Resolving repeat families with long reads
BACKGROUND: Draft quality genomes for a multitude of organisms have become common due to the advancement of genome assemblers using long-read technologies with high error rates. Although current assemblies are substantially more contiguous than assemblies based on short reads, complete chromosomal a...
Main author: | Bongartz, Philipp |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6506941/ https://www.ncbi.nlm.nih.gov/pubmed/31072311 http://dx.doi.org/10.1186/s12859-019-2807-4 |
_version_ | 1783416938907566080 |
---|---|
author | Bongartz, Philipp |
author_facet | Bongartz, Philipp |
author_sort | Bongartz, Philipp |
collection | PubMed |
description | BACKGROUND: Draft-quality genomes for a multitude of organisms have become common due to the advancement of genome assemblers using long-read technologies with high error rates. Although current assemblies are substantially more contiguous than assemblies based on short reads, complete chromosomal assemblies are still challenging. Interspersed repeat families with multiple copy versions dominate the contig and scaffold ends of current long-read assemblies for complex genomes. These repeat families generally remain unresolved, as existing algorithmic solutions either do not scale to large copy numbers or cannot handle the current high read error rates. RESULTS: We propose novel repeat resolution methods for large interspersed repeat families and assess their accuracy on simulated data sets with various distinct repeat structures and on Drosophila melanogaster transposons. Additionally, we compare our methods to an existing long-read repeat resolution tool and show the improved accuracy of our method. CONCLUSIONS: Our results demonstrate the applicability of our methods for the improvement of the contiguity of genome assemblies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2807-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6506941 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-65069412019-05-13 Resolving repeat families with long reads Bongartz, Philipp BMC Bioinformatics Methodology Article BACKGROUND: Draft-quality genomes for a multitude of organisms have become common due to the advancement of genome assemblers using long-read technologies with high error rates. Although current assemblies are substantially more contiguous than assemblies based on short reads, complete chromosomal assemblies are still challenging. Interspersed repeat families with multiple copy versions dominate the contig and scaffold ends of current long-read assemblies for complex genomes. These repeat families generally remain unresolved, as existing algorithmic solutions either do not scale to large copy numbers or cannot handle the current high read error rates. RESULTS: We propose novel repeat resolution methods for large interspersed repeat families and assess their accuracy on simulated data sets with various distinct repeat structures and on Drosophila melanogaster transposons. Additionally, we compare our methods to an existing long-read repeat resolution tool and show the improved accuracy of our method. CONCLUSIONS: Our results demonstrate the applicability of our methods for the improvement of the contiguity of genome assemblies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2807-4) contains supplementary material, which is available to authorized users. BioMed Central 2019-05-09 /pmc/articles/PMC6506941/ /pubmed/31072311 http://dx.doi.org/10.1186/s12859-019-2807-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Bongartz, Philipp Resolving repeat families with long reads |
title | Resolving repeat families with long reads |
title_full | Resolving repeat families with long reads |
title_fullStr | Resolving repeat families with long reads |
title_full_unstemmed | Resolving repeat families with long reads |
title_short | Resolving repeat families with long reads |
title_sort | resolving repeat families with long reads |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6506941/ https://www.ncbi.nlm.nih.gov/pubmed/31072311 http://dx.doi.org/10.1186/s12859-019-2807-4 |
work_keys_str_mv | AT bongartzphilipp resolvingrepeatfamilieswithlongreads |