Tigmint: correcting assembly errors using linked reads from large molecules
BACKGROUND: Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the und...
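Main authors: | Jackman, Shaun D.; Coombe, Lauren; Chu, Justin; Warren, Rene L.; Vandervalk, Benjamin P.; Yeo, Sarah; Xue, Zhuyi; Mohamadi, Hamid; Bohlmann, Joerg; Jones, Steven J.M.; Birol, Inanc |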
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6204047/ https://www.ncbi.nlm.nih.gov/pubmed/30367597 http://dx.doi.org/10.1186/s12859-018-2425-6 |
_version_ | 1783365988922687488 |
---|---|
author | Jackman, Shaun D. Coombe, Lauren Chu, Justin Warren, Rene L. Vandervalk, Benjamin P. Yeo, Sarah Xue, Zhuyi Mohamadi, Hamid Bohlmann, Joerg Jones, Steven J.M. Birol, Inanc |
author_sort | Jackman, Shaun D. |
collection | PubMed |
description | BACKGROUND: Genome sequencing yields the sequence of many short snippets of DNA (reads) from a genome. Genome assembly attempts to reconstruct the original genome from which these reads were derived. This task is difficult due to gaps and errors in the sequencing data, repetitive sequence in the underlying genome, and heterozygosity. As a result, assembly errors are common. In the absence of a reference genome, these misassemblies may be identified by comparing the sequencing data to the assembly and looking for discrepancies between the two. Once identified, these misassemblies may be corrected, improving the quality of the assembled sequence. Although tools exist to identify and correct misassemblies using Illumina paired-end and mate-pair sequencing, no such tool yet exists that makes use of the long distance information of the large molecules provided by linked reads, such as those offered by the 10x Genomics Chromium platform. We have developed the tool Tigmint to address this gap. RESULTS: To demonstrate the effectiveness of Tigmint, we applied it to assemblies of a human genome using short reads assembled with ABySS 2.0 and other assemblers. Tigmint reduced the number of misassemblies identified by QUAST in the ABySS assembly by 216 (27%). While scaffolding with ARCS alone more than doubled the scaffold NGA50 of the assembly from 3 to 8 Mbp, the combination of Tigmint and ARCS improved the scaffold NGA50 of the assembly over five-fold to 16.4 Mbp. This notable improvement in contiguity highlights the utility of assembly correction in refining assemblies. We demonstrate the utility of Tigmint in correcting the assemblies of multiple tools, as well as in using Chromium reads to correct and scaffold assemblies of long single-molecule sequencing. CONCLUSIONS: Scaffolding an assembly that has been corrected with Tigmint yields a final assembly that is both more correct and substantially more contiguous than an assembly that has not been corrected. Using single-molecule sequencing in combination with linked reads enables a genome sequence assembly that achieves both a high sequence contiguity as well as high scaffold contiguity, a feat not currently achievable with either technology alone. |
format | Online Article Text |
id | pubmed-6204047 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6204047 2018-11-01 Tigmint: correcting assembly errors using linked reads from large molecules Jackman, Shaun D. Coombe, Lauren Chu, Justin Warren, Rene L. Vandervalk, Benjamin P. Yeo, Sarah Xue, Zhuyi Mohamadi, Hamid Bohlmann, Joerg Jones, Steven J.M. Birol, Inanc BMC Bioinformatics Software BioMed Central 2018-10-26 /pmc/articles/PMC6204047/ /pubmed/30367597 http://dx.doi.org/10.1186/s12859-018-2425-6 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Tigmint: correcting assembly errors using linked reads from large molecules |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6204047/ https://www.ncbi.nlm.nih.gov/pubmed/30367597 http://dx.doi.org/10.1186/s12859-018-2425-6 |