Evaluation of the impact of Illumina error correction tools on de novo genome assembly

BACKGROUND: Recently, many standalone applications have been proposed to correct sequencing errors in Illumina data. The key idea is that downstream analysis tools such as de novo genome assemblers benefit from a reduced error rate in the input data. Surprisingly, a systematic validation of this ass...

Bibliographic Details
Main authors: Heydari, Mahdi; Miclotte, Giles; Demeester, Piet; Van de Peer, Yves; Fostier, Jan
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5563063/
https://www.ncbi.nlm.nih.gov/pubmed/28821237
http://dx.doi.org/10.1186/s12859-017-1784-8
author Heydari, Mahdi
Miclotte, Giles
Demeester, Piet
Van de Peer, Yves
Fostier, Jan
collection PubMed
description BACKGROUND: Recently, many standalone applications have been proposed to correct sequencing errors in Illumina data. The key idea is that downstream analysis tools such as de novo genome assemblers benefit from a reduced error rate in the input data. Surprisingly, a systematic validation of this assumption using state-of-the-art assembly methods is lacking, even for recently published methods. RESULTS: For twelve recent Illumina error correction tools (EC tools) we evaluated both their ability to correct sequencing errors and their ability to improve de novo genome assembly in terms of contig size and accuracy. CONCLUSIONS: We confirm that most EC tools reduce the number of errors in sequencing data without introducing many new errors. However, we found that many EC tools suffer from poor performance in certain sequence contexts such as regions with low coverage or regions that contain short repeated or low-complexity sequences. Reads overlapping such regions are often ill-corrected in an inconsistent manner, leading to breakpoints in the resulting assemblies that are not present in assemblies obtained from uncorrected data. Resolving this systematic flaw in future EC tools could greatly improve the applicability of such tools. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1784-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5563063
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5563063 2017-08-21
Evaluation of the impact of Illumina error correction tools on de novo genome assembly
Heydari, Mahdi; Miclotte, Giles; Demeester, Piet; Van de Peer, Yves; Fostier, Jan
BMC Bioinformatics
Research Article
BioMed Central 2017-08-18
/pmc/articles/PMC5563063/
/pubmed/28821237
http://dx.doi.org/10.1186/s12859-017-1784-8
Text
en
© The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Evaluation of the impact of Illumina error correction tools on de novo genome assembly
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5563063/
https://www.ncbi.nlm.nih.gov/pubmed/28821237
http://dx.doi.org/10.1186/s12859-017-1784-8