Transposon identification using profile HMMs

BACKGROUND: Transposons are "jumping genes" that account for large quantities of repetitive content in genomes. They are known to affect transcriptional regulation in several different ways, and are implicated in many human diseases. Transposons are related to microRNAs and viruses, and ma...

Bibliographic Details
Main Authors: Edlefsen, Paul T; Liu, Jun S
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2822524/
https://www.ncbi.nlm.nih.gov/pubmed/20158867
http://dx.doi.org/10.1186/1471-2164-11-S1-S10
_version_ 1782177532854927360
author Edlefsen, Paul T
Liu, Jun S
author_facet Edlefsen, Paul T
Liu, Jun S
author_sort Edlefsen, Paul T
collection PubMed
description BACKGROUND: Transposons are "jumping genes" that account for large quantities of repetitive content in genomes. They are known to affect transcriptional regulation in several different ways, and are implicated in many human diseases. Transposons are related to microRNAs and viruses, and many genes, pseudogenes, and gene promoters are derived from transposons or have origins in transposon-induced duplication. Modeling transposon-derived genomic content is difficult because they are poorly conserved. Profile hidden Markov models (profile HMMs), widely used for protein sequence family modeling, are rarely used for modeling DNA sequence families. The algorithm commonly used to estimate the parameters of profile HMMs, Baum-Welch, is prone to prematurely converge to local optima. The DNA domain is especially problematic for the Baum-Welch algorithm, since it has only four letters as opposed to the twenty residues of the amino acid alphabet. RESULTS: We demonstrate with a simulation study and with an application to modeling the MIR family of transposons that two recently introduced methods, Conditional Baum-Welch and Dynamic Model Surgery, achieve better estimates of the parameters of profile HMMs across a range of conditions. CONCLUSIONS: We argue that these new algorithms expand the range of potential applications of profile HMMs to many important DNA sequence family modeling problems, including that of searching for and modeling the virus-like transposons that are found in all known genomes.
format Text
id pubmed-2822524
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-28225242010-02-17 Transposon identification using profile HMMs Edlefsen, Paul T Liu, Jun S BMC Genomics Research BACKGROUND: Transposons are "jumping genes" that account for large quantities of repetitive content in genomes. They are known to affect transcriptional regulation in several different ways, and are implicated in many human diseases. Transposons are related to microRNAs and viruses, and many genes, pseudogenes, and gene promoters are derived from transposons or have origins in transposon-induced duplication. Modeling transposon-derived genomic content is difficult because they are poorly conserved. Profile hidden Markov models (profile HMMs), widely used for protein sequence family modeling, are rarely used for modeling DNA sequence families. The algorithm commonly used to estimate the parameters of profile HMMs, Baum-Welch, is prone to prematurely converge to local optima. The DNA domain is especially problematic for the Baum-Welch algorithm, since it has only four letters as opposed to the twenty residues of the amino acid alphabet. RESULTS: We demonstrate with a simulation study and with an application to modeling the MIR family of transposons that two recently introduced methods, Conditional Baum-Welch and Dynamic Model Surgery, achieve better estimates of the parameters of profile HMMs across a range of conditions. CONCLUSIONS: We argue that these new algorithms expand the range of potential applications of profile HMMs to many important DNA sequence family modeling problems, including that of searching for and modeling the virus-like transposons that are found in all known genomes. BioMed Central 2010-02-10 /pmc/articles/PMC2822524/ /pubmed/20158867 http://dx.doi.org/10.1186/1471-2164-11-S1-S10 Text en Copyright ©2010 Edlefsen and Liu; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Edlefsen, Paul T
Liu, Jun S
Transposon identification using profile HMMs
title Transposon identification using profile HMMs
title_full Transposon identification using profile HMMs
title_fullStr Transposon identification using profile HMMs
title_full_unstemmed Transposon identification using profile HMMs
title_short Transposon identification using profile HMMs
title_sort transposon identification using profile hmms
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2822524/
https://www.ncbi.nlm.nih.gov/pubmed/20158867
http://dx.doi.org/10.1186/1471-2164-11-S1-S10
work_keys_str_mv AT edlefsenpault transposonidentificationusingprofilehmms
AT liujuns transposonidentificationusingprofilehmms