Chromosome assembly of large and complex genomes using multiple references

Despite the rapid development of sequencing technologies, the assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout 2, a reference-assisted assembly tool that works for large...

Bibliographic Details
Main authors: Kolmogorov, Mikhail; Armstrong, Joel; Raney, Brian J.; Streeter, Ian; Dunn, Matthew; Yang, Fengtang; Odom, Duncan; Flicek, Paul; Keane, Thomas M.; Thybert, David; Paten, Benedict; Pham, Son
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6211643/
https://www.ncbi.nlm.nih.gov/pubmed/30341161
http://dx.doi.org/10.1101/gr.236273.118
author Kolmogorov, Mikhail
Armstrong, Joel
Raney, Brian J.
Streeter, Ian
Dunn, Matthew
Yang, Fengtang
Odom, Duncan
Flicek, Paul
Keane, Thomas M.
Thybert, David
Paten, Benedict
Pham, Son
collection PubMed
description Despite the rapid development of sequencing technologies, the assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout 2, a reference-assisted assembly tool that works for large and complex genomes. By taking one or more target assemblies (generated from an NGS assembler) and one or multiple related reference genomes, Ragout 2 infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. By using Ragout 2, we transformed NGS assemblies of 16 laboratory mouse strains into sets of complete chromosomes, leaving <5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realigning of long Pacific Biosciences (PacBio) reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. We applied Ragout 2 to the Mus caroli and Mus pahari genomes, which exhibit karyotype-scale variations compared with other genomes from the Muridae family. Chromosome painting maps confirmed most large-scale rearrangements that Ragout 2 detected. We applied Ragout 2 to improve draft sequences of three ape genomes that have recently been published. Ragout 2 transformed three sets of contigs (generated using PacBio reads only) into chromosome-scale assemblies with accuracy comparable to chromosome assemblies generated in the original study using BioNano maps, Hi-C, BAC clones, and FISH.
format Online
Article
Text
id pubmed-6211643
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-6211643 2018-11-13 Genome Res Method
Cold Spring Harbor Laboratory Press 2018-11 /pmc/articles/PMC6211643/ /pubmed/30341161 http://dx.doi.org/10.1101/gr.236273.118 Text en © 2018 Kolmogorov et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
title Chromosome assembly of large and complex genomes using multiple references
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6211643/
https://www.ncbi.nlm.nih.gov/pubmed/30341161
http://dx.doi.org/10.1101/gr.236273.118