Deep repeat resolution—the assembly of the Drosophila Histone Complex

Though the advent of long-read sequencing technologies has led to a leap in contiguity of de novo genome assemblies, current reference genomes of higher organisms still do not provide unbroken sequences of complete chromosomes. Despite reads in excess of 30 000 base pairs, there are still repetitive...

Bibliographic Details
Main authors: Bongartz, Philipp; Schloissnig, Siegfried
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380962/
https://www.ncbi.nlm.nih.gov/pubmed/30476267
http://dx.doi.org/10.1093/nar/gky1194
_version_ 1783396389517000704
author Bongartz, Philipp
Schloissnig, Siegfried
author_facet Bongartz, Philipp
Schloissnig, Siegfried
author_sort Bongartz, Philipp
collection PubMed
description Though the advent of long-read sequencing technologies has led to a leap in contiguity of de novo genome assemblies, current reference genomes of higher organisms still do not provide unbroken sequences of complete chromosomes. Despite reads in excess of 30 000 base pairs, there are still repetitive structures that cannot be resolved by current state-of-the-art assemblers. The most challenging of these structures are tandemly arrayed repeats, which occur in the genomes of all eukaryotes. Untangling tandem repeat clusters is exceptionally difficult, since the rare differences between repeat copies are obscured by the high error rate of long reads. Solving this problem would constitute a major step towards computing fully assembled genomes. Here, we demonstrate by example of the Drosophila Histone Complex that via machine learning algorithms, it is possible to exploit the underlying distinguishing patterns of single nucleotide variants of repeats from very noisy data to resolve a large and highly conserved repeat cluster. The ideas explored in this paper are a first step towards the automated assembly of complex repeat structures and promise to be applicable to a wide range of eukaryotic genomes.
format Online
Article
Text
id pubmed-6380962
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-63809622019-02-22 Deep repeat resolution—the assembly of the Drosophila Histone Complex Bongartz, Philipp Schloissnig, Siegfried Nucleic Acids Res Methods Online Though the advent of long-read sequencing technologies has led to a leap in contiguity of de novo genome assemblies, current reference genomes of higher organisms still do not provide unbroken sequences of complete chromosomes. Despite reads in excess of 30 000 base pairs, there are still repetitive structures that cannot be resolved by current state-of-the-art assemblers. The most challenging of these structures are tandemly arrayed repeats, which occur in the genomes of all eukaryotes. Untangling tandem repeat clusters is exceptionally difficult, since the rare differences between repeat copies are obscured by the high error rate of long reads. Solving this problem would constitute a major step towards computing fully assembled genomes. Here, we demonstrate by example of the Drosophila Histone Complex that via machine learning algorithms, it is possible to exploit the underlying distinguishing patterns of single nucleotide variants of repeats from very noisy data to resolve a large and highly conserved repeat cluster. The ideas explored in this paper are a first step towards the automated assembly of complex repeat structures and promise to be applicable to a wide range of eukaryotic genomes. Oxford University Press 2019-02-20 2018-11-26 /pmc/articles/PMC6380962/ /pubmed/30476267 http://dx.doi.org/10.1093/nar/gky1194 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Online
Bongartz, Philipp
Schloissnig, Siegfried
Deep repeat resolution—the assembly of the Drosophila Histone Complex
title Deep repeat resolution—the assembly of the Drosophila Histone Complex
title_full Deep repeat resolution—the assembly of the Drosophila Histone Complex
title_fullStr Deep repeat resolution—the assembly of the Drosophila Histone Complex
title_full_unstemmed Deep repeat resolution—the assembly of the Drosophila Histone Complex
title_short Deep repeat resolution—the assembly of the Drosophila Histone Complex
title_sort deep repeat resolution—the assembly of the drosophila histone complex
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380962/
https://www.ncbi.nlm.nih.gov/pubmed/30476267
http://dx.doi.org/10.1093/nar/gky1194
work_keys_str_mv AT bongartzphilipp deeprepeatresolutiontheassemblyofthedrosophilahistonecomplex
AT schloissnigsiegfried deeprepeatresolutiontheassemblyofthedrosophilahistonecomplex