Cargando…

Resolving repeat families with long reads

BACKGROUND: Draft quality genomes for a multitude of organisms have become common due to the advancement of genome assemblers using long-read technologies with high error rates. Although current assemblies are substantially more contiguous than assemblies based on short reads, complete chromosomal a...

Descripción completa

Detalles Bibliográficos
Autor principal: Bongartz, Philipp
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6506941/
https://www.ncbi.nlm.nih.gov/pubmed/31072311
http://dx.doi.org/10.1186/s12859-019-2807-4
Descripción
Sumario:BACKGROUND: Draft quality genomes for a multitude of organisms have become common due to the advancement of genome assemblers using long-read technologies with high error rates. Although current assemblies are substantially more contiguous than assemblies based on short reads, complete chromosomal assemblies are still challenging. Interspersed repeat families with multiple copy versions dominate the contig and scaffold ends of current long-read assemblies for complex genomes. These repeat families generally remain unresolved, as existing algorithmic solutions either do not scale to large copy numbers or can not handle the current high read error rates. RESULTS: We propose novel repeat resolution methods for large interspersed repeat families and assess their accuracy on simulated data sets with various distinct repeat structures and on drosophila melanogaster transposons. Additionally, we compare our methods to an existing long read repeat resolution tool and show the improved accuracy of our method. CONCLUSIONS: Our results demonstrate the applicability of our methods for the improvement of the contiguity of genome assemblies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2807-4) contains supplementary material, which is available to authorized users.