The Dfam database of repetitive DNA families

Bibliographic Details
Main Authors: Hubley, Robert, Finn, Robert D., Clements, Jody, Eddy, Sean R., Jones, Thomas A., Bao, Weidong, Smit, Arian F.A., Wheeler, Travis J.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702899/
https://www.ncbi.nlm.nih.gov/pubmed/26612867
http://dx.doi.org/10.1093/nar/gkv1272
collection PubMed
description Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download.
id pubmed-4702899
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Nucleic Acids Res, Database Issue (Oxford University Press)
Dates: 2016-01-04 (issue); 2015-11-26 (published online)
Record: pubmed-4702899, 2016-01-07
Copyright: © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Database Issue