Effects of error-correction of heterozygous next-generation sequencing data
BACKGROUND: Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for or assume homozygosity of an input dataset. This assumpt...
Main authors: | Fujimoto, M Stanley; Bodily, Paul M; Okuda, Nozomu; Clement, Mark J; Snell, Quinn |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4110727/ https://www.ncbi.nlm.nih.gov/pubmed/25077414 http://dx.doi.org/10.1186/1471-2105-15-S7-S3 |
_version_ | 1782328022040313856 |
---|---|
author | Fujimoto, M Stanley; Bodily, Paul M; Okuda, Nozomu; Clement, Mark J; Snell, Quinn |
author_sort | Fujimoto, M Stanley |
collection | PubMed |
description | BACKGROUND: Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for or assume homozygosity of an input dataset. This assumption ignores the true genomic composition of many organisms that are diploid or polyploid. In this survey, two different error correction packages, Quake and ECHO, are examined to see how they perform on next-generation sequence data from heterozygous genomes. RESULTS: Quake and ECHO perform well and were able to correct many errors found within the data. However, errors that occur at heterozygous positions had unique trends. Errors at these positions were sometimes corrected incorrectly, introducing errors into the dataset with the possibility of creating a chimeric read. Quake was much less likely to create chimeric reads. Quake's read trimming removed a large portion of the original data and often left reads with few heterozygous markers. ECHO resulted in more chimeric reads and introduced more errors than Quake but preserved heterozygous markers. Using real E. coli sequencing data and their assemblies after error correction, the assembly statistics improved. It was also found that segregating reads by haplotype can improve the quality of an assembly. CONCLUSIONS: These findings suggest that Quake and ECHO both have strengths and weaknesses when applied to heterozygous data. With the increased interest in haplotype specific analysis, new tools that are designed to be haplotype-aware are necessary that do not have the weaknesses of Quake and ECHO. |
format | Online Article Text |
id | pubmed-4110727 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4110727 2014-08-05 Effects of error-correction of heterozygous next-generation sequencing data Fujimoto, M Stanley; Bodily, Paul M; Okuda, Nozomu; Clement, Mark J; Snell, Quinn BMC Bioinformatics Research
BioMed Central 2014-05-28 /pmc/articles/PMC4110727/ /pubmed/25077414 http://dx.doi.org/10.1186/1471-2105-15-S7-S3 Text en Copyright © 2014 Fujimoto et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Effects of error-correction of heterozygous next-generation sequencing data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4110727/ https://www.ncbi.nlm.nih.gov/pubmed/25077414 http://dx.doi.org/10.1186/1471-2105-15-S7-S3 |