Tigmint: correcting assembly errors using linked reads from large molecules

BACKGROUND: Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the und...

Bibliographic Details
Main Authors: Jackman, Shaun D., Coombe, Lauren, Chu, Justin, Warren, Rene L., Vandervalk, Benjamin P., Yeo, Sarah, Xue, Zhuyi, Mohamadi, Hamid, Bohlmann, Joerg, Jones, Steven J.M., Birol, Inanc
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6204047/
https://www.ncbi.nlm.nih.gov/pubmed/30367597
http://dx.doi.org/10.1186/s12859-018-2425-6
author Jackman, Shaun D.
Coombe, Lauren
Chu, Justin
Warren, Rene L.
Vandervalk, Benjamin P.
Yeo, Sarah
Xue, Zhuyi
Mohamadi, Hamid
Bohlmann, Joerg
Jones, Steven J.M.
Birol, Inanc
author_sort Jackman, Shaun D.
collection PubMed
description BACKGROUND: Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity. As a result, assembly errors are common. In the absence of a reference genome, these misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembled sequence. Although tools exist to identify and correct misassemblies using Illumina paired-end and mate-pair sequencing, no such tool yet exists that makes use of the long distance information of the large molecules provided by linked reads, such as those offered by the 10x Genomics Chromium platform. We have developed the tool Tigmint to address this gap. RESULTS: To demonstrate the effectiveness of Tigmint, we applied it to assemblies of a human genome using short reads assembled with ABySS 2.0 and other assemblers. Tigmint reduced the number of misassemblies identified by QUAST in the ABySS assembly by 216 (27%). While scaffolding with ARCS alone more than doubled the scaffold NGA50 of the assembly from 3 to 8 Mbp, the combination of Tigmint and ARCS improved the scaffold NGA50 of the assembly over five-fold to 16.4 Mbp. This notable improvement in contiguity highlights the utility of assembly correction in refining assemblies. We demonstrate the utility of Tigmint in correcting the assemblies of multiple tools, as well as in using Chromium reads to correct and scaffold assemblies of long single-molecule sequencing. CONCLUSIONS: Scaffolding an assembly that has been corrected with Tigmint yields a final assembly that is both more correct and substantially more contiguous than an assembly that has not been corrected. 
Using single-molecule sequencing in combination with linked reads enables a genome sequence assembly that achieves both a high sequence contiguity as well as high scaffold contiguity, a feat not currently achievable with either technology alone.
format Online
Article
Text
id pubmed-6204047
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6204047 2018-11-01 Tigmint: correcting assembly errors using linked reads from large molecules BMC Bioinformatics Software BioMed Central 2018-10-26 /pmc/articles/PMC6204047/ /pubmed/30367597 http://dx.doi.org/10.1186/s12859-018-2425-6 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Tigmint: correcting assembly errors using linked reads from large molecules
topic Software