Dfam: a database of repetitive DNA based on profile hidden Markov models

We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on...

Bibliographic Details
Main authors: Wheeler, Travis J., Clements, Jody, Eddy, Sean R., Hubley, Robert, Jones, Thomas A., Jurka, Jerzy, Smit, Arian F. A., Finn, Robert D.
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531169/
https://www.ncbi.nlm.nih.gov/pubmed/23203985
http://dx.doi.org/10.1093/nar/gks1265
_version_ 1782254126250328064
author Wheeler, Travis J.
Clements, Jody
Eddy, Sean R.
Hubley, Robert
Jones, Thomas A.
Jurka, Jerzy
Smit, Arian F. A.
Finn, Robert D.
author_sort Wheeler, Travis J.
collection PubMed
description We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.
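The coverage figures quoted in the description can be read as the fraction of bases in a sequence that fall inside at least one annotated repeat. The following is a minimal sketch of that calculation, not the benchmark procedure used by the authors: the function names `merge_intervals` and `coverage_fraction`, the half-open coordinate convention, and the example intervals are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed convention: half-open intervals [start, end) in base coordinates.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coverage_fraction(intervals: Iterable[Interval], sequence_length: int) -> float:
    """Return the fraction of the sequence covered by at least one interval."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / sequence_length


if __name__ == "__main__":
    # Hypothetical annotations on a hypothetical 100-base sequence.
    consensus_hits = [(0, 10), (30, 40)]
    profile_hits = [(0, 10), (5, 20), (30, 40)]
    print("consensus coverage:", coverage_fraction(consensus_hits, 100))
    print("profile coverage:", coverage_fraction(profile_hits, 100))
```

Merging before summing matters because hits from different TE families can overlap; summing raw hit lengths would overstate coverage.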
format Online
Article
Text
id pubmed-3531169
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3531169 2013-03-07 Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. Articles. Oxford University Press. 2013-01; 2012-11-30. /pmc/articles/PMC3531169/ /pubmed/23203985 http://dx.doi.org/10.1093/nar/gks1265 Text en © The Author(s) 2012. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
title Dfam: a database of repetitive DNA based on profile hidden Markov models
topic Articles