
Effects of error-correction of heterozygous next-generation sequencing data

BACKGROUND: Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for or assume homozygosity of an input dataset. This assumpt...


Bibliographic Details
Main authors: Fujimoto, M Stanley; Bodily, Paul M; Okuda, Nozomu; Clement, Mark J; Snell, Quinn
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4110727/
https://www.ncbi.nlm.nih.gov/pubmed/25077414
http://dx.doi.org/10.1186/1471-2105-15-S7-S3
_version_ 1782328022040313856
author Fujimoto, M Stanley
Bodily, Paul M
Okuda, Nozomu
Clement, Mark J
Snell, Quinn
author_sort Fujimoto, M Stanley
collection PubMed
description BACKGROUND: Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for or assume homozygosity of an input dataset. This assumption ignores the true genomic composition of many organisms that are diploid or polyploid. In this survey, two different error correction packages, Quake and ECHO, are examined to see how they perform on next-generation sequence data from heterozygous genomes. RESULTS: Quake and ECHO perform well and were able to correct many errors found within the data. However, errors that occur at heterozygous positions had unique trends. Errors at these positions were sometimes corrected incorrectly, introducing errors into the dataset with the possibility of creating a chimeric read. Quake was much less likely to create chimeric reads. Quake's read trimming removed a large portion of the original data and often left reads with few heterozygous markers. ECHO resulted in more chimeric reads and introduced more errors than Quake but preserved heterozygous markers. Using real E. coli sequencing data and their assemblies after error correction, the assembly statistics improved. It was also found that segregating reads by haplotype can improve the quality of an assembly. CONCLUSIONS: These findings suggest that Quake and ECHO both have strengths and weaknesses when applied to heterozygous data. With the increased interest in haplotype specific analysis, new tools that are designed to be haplotype-aware are necessary that do not have the weaknesses of Quake and ECHO.
format Online
Article
Text
id pubmed-4110727
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4110727 2014-08-05 Effects of error-correction of heterozygous next-generation sequencing data Fujimoto, M Stanley Bodily, Paul M Okuda, Nozomu Clement, Mark J Snell, Quinn BMC Bioinformatics Research
BioMed Central 2014-05-28 /pmc/articles/PMC4110727/ /pubmed/25077414 http://dx.doi.org/10.1186/1471-2105-15-S7-S3 Text en Copyright © 2014 Fujimoto et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Effects of error-correction of heterozygous next-generation sequencing data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4110727/
https://www.ncbi.nlm.nih.gov/pubmed/25077414
http://dx.doi.org/10.1186/1471-2105-15-S7-S3