
Resolving repeat families with long reads

Bibliographic Details
Main author: Bongartz, Philipp
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6506941/
https://www.ncbi.nlm.nih.gov/pubmed/31072311
http://dx.doi.org/10.1186/s12859-019-2807-4
_version_ 1783416938907566080
author Bongartz, Philipp
author_facet Bongartz, Philipp
author_sort Bongartz, Philipp
collection PubMed
description BACKGROUND: Draft-quality genomes for a multitude of organisms have become common due to the advancement of genome assemblers using long-read technologies with high error rates. Although current assemblies are substantially more contiguous than assemblies based on short reads, complete chromosomal assemblies are still challenging. Interspersed repeat families with multiple copy versions dominate the contig and scaffold ends of current long-read assemblies for complex genomes. These repeat families generally remain unresolved, as existing algorithmic solutions either do not scale to large copy numbers or cannot handle the current high read error rates. RESULTS: We propose novel repeat resolution methods for large interspersed repeat families and assess their accuracy on simulated data sets with various distinct repeat structures and on Drosophila melanogaster transposons. Additionally, we compare our methods to an existing long-read repeat resolution tool and show the improved accuracy of our method. CONCLUSIONS: Our results demonstrate the applicability of our methods for the improvement of the contiguity of genome assemblies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2807-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6506941
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6506941 2019-05-13 Resolving repeat families with long reads Bongartz, Philipp BMC Bioinformatics Methodology Article BioMed Central 2019-05-09 /pmc/articles/PMC6506941/ /pubmed/31072311 http://dx.doi.org/10.1186/s12859-019-2807-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Bongartz, Philipp
Resolving repeat families with long reads
title Resolving repeat families with long reads
title_full Resolving repeat families with long reads
title_fullStr Resolving repeat families with long reads
title_full_unstemmed Resolving repeat families with long reads
title_short Resolving repeat families with long reads
title_sort resolving repeat families with long reads
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6506941/
https://www.ncbi.nlm.nih.gov/pubmed/31072311
http://dx.doi.org/10.1186/s12859-019-2807-4
work_keys_str_mv AT bongartzphilipp resolvingrepeatfamilieswithlongreads