
REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads

Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this prob...


Bibliographic Details
Main authors: Chu, Chong; Nielsen, Rasmus; Wu, Yufeng
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792456/
https://www.ncbi.nlm.nih.gov/pubmed/26977803
http://dx.doi.org/10.1371/journal.pone.0150719
_version_ 1782421247560253440
author Chu, Chong
Nielsen, Rasmus
Wu, Yufeng
author_facet Chu, Chong
Nielsen, Rasmus
Wu, Yufeng
author_sort Chu, Chong
collection PubMed
description Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem, we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods in terms of both the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.
format Online
Article
Text
id pubmed-4792456
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4792456 2016-03-23 REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads Chu, Chong Nielsen, Rasmus Wu, Yufeng PLoS One Research Article
Public Library of Science 2016-03-15 /pmc/articles/PMC4792456/ /pubmed/26977803 http://dx.doi.org/10.1371/journal.pone.0150719 Text en © 2016 Chu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Chu, Chong
Nielsen, Rasmus
Wu, Yufeng
REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads
title REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads
title_full REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads
title_fullStr REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads
title_full_unstemmed REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads
title_short REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads
title_sort repdenovo: inferring de novo repeat motifs from short sequence reads
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792456/
https://www.ncbi.nlm.nih.gov/pubmed/26977803
http://dx.doi.org/10.1371/journal.pone.0150719
work_keys_str_mv AT chuchong repdenovoinferringdenovorepeatmotifsfromshortsequencereads
AT nielsenrasmus repdenovoinferringdenovorepeatmotifsfromshortsequencereads
AT wuyufeng repdenovoinferringdenovorepeatmotifsfromshortsequencereads