RepARK—de novo creation of repeat libraries from whole-genome NGS reads

Generation of repeat libraries is a critical step for analysis of complex genomes. In the era of next-generation sequencing (NGS), such libraries are usually produced using a whole-genome shotgun (WGS) derived reference sequence whose completeness greatly influences the quality of derived repeat lib...

Bibliographic Details
Main Authors: Koch, Philipp; Platzer, Matthias; Downie, Bryan R.
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4027187/
https://www.ncbi.nlm.nih.gov/pubmed/24634442
http://dx.doi.org/10.1093/nar/gku210
author Koch, Philipp
Platzer, Matthias
Downie, Bryan R.
collection PubMed
description Generation of repeat libraries is a critical step for analysis of complex genomes. In the era of next-generation sequencing (NGS), such libraries are usually produced using a whole-genome shotgun (WGS) derived reference sequence whose completeness greatly influences the quality of derived repeat libraries. We describe here a de novo repeat assembly method—RepARK (Repetitive motif detection by Assembly of Repetitive K-mers)—which avoids potential biases by using abundant k-mers of NGS WGS reads without requiring a reference genome. For validation, repeat consensuses derived from simulated and real Drosophila melanogaster NGS WGS reads were compared to repeat libraries generated by four established methods. RepARK is orders of magnitude faster than the other methods and generates libraries that are: (i) composed almost entirely of repetitive motifs, (ii) more comprehensive and (iii) almost completely annotated by TEclass. Additionally, we show that the RepARK method is applicable to complex genomes like human and can even serve as a diagnostic tool to identify repetitive sequences contaminating NGS datasets.
format Online
Article
Text
id pubmed-4027187
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4027187 2014-05-28 Nucleic Acids Res Methods Online Oxford University Press 2014-05-01 2014-03-14 /pmc/articles/PMC4027187/ /pubmed/24634442 http://dx.doi.org/10.1093/nar/gku210 Text en © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title RepARK—de novo creation of repeat libraries from whole-genome NGS reads
topic Methods Online