Deep repeat resolution—the assembly of the Drosophila Histone Complex
Main authors: | Bongartz, Philipp; Schloissnig, Siegfried |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380962/ https://www.ncbi.nlm.nih.gov/pubmed/30476267 http://dx.doi.org/10.1093/nar/gky1194 |
_version_ | 1783396389517000704 |
---|---|
author | Bongartz, Philipp; Schloissnig, Siegfried |
collection | PubMed |
description | Though the advent of long-read sequencing technologies has led to a leap in contiguity of de novo genome assemblies, current reference genomes of higher organisms still do not provide unbroken sequences of complete chromosomes. Despite reads in excess of 30 000 base pairs, there are still repetitive structures that cannot be resolved by current state-of-the-art assemblers. The most challenging of these structures are tandemly arrayed repeats, which occur in the genomes of all eukaryotes. Untangling tandem repeat clusters is exceptionally difficult, since the rare differences between repeat copies are obscured by the high error rate of long reads. Solving this problem would constitute a major step towards computing fully assembled genomes. Here, we demonstrate by example of the Drosophila Histone Complex that via machine learning algorithms, it is possible to exploit the underlying distinguishing patterns of single nucleotide variants of repeats from very noisy data to resolve a large and highly conserved repeat cluster. The ideas explored in this paper are a first step towards the automated assembly of complex repeat structures and promise to be applicable to a wide range of eukaryotic genomes. |
format | Online Article Text |
id | pubmed-6380962 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6380962 2019-02-22; Nucleic Acids Res, Methods Online; Oxford University Press; 2019-02-20; 2018-11-26; Text; en; © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Deep repeat resolution—the assembly of the Drosophila Histone Complex |
topic | Methods Online |