Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies
Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and...
Main authors: | Zagordi, Osvaldo; Klein, Rolf; Däumer, Martin; Beerenwinkel, Niko |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2010 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995073/ https://www.ncbi.nlm.nih.gov/pubmed/20671025 http://dx.doi.org/10.1093/nar/gkq655 |
_version_ | 1782193044285554688 |
---|---|
author | Zagordi, Osvaldo Klein, Rolf Däumer, Martin Beerenwinkel, Niko |
author_facet | Zagordi, Osvaldo Klein, Rolf Däumer, Martin Beerenwinkel, Niko |
author_sort | Zagordi, Osvaldo |
collection | PubMed |
description | Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and result in inflated estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb-fragment of the HIV-1 gag/pol gene in two control and two clinical samples. The effect of PCR amplification was analysed. Error correction resulted in a two- and five-fold decrease of the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. Genetic diversity observed within and between two clinical samples resulted in various patterns of phenotypic drug resistance and suggests a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated. |
format | Text |
id | pubmed-2995073 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2995073 2010-12-01 Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies Zagordi, Osvaldo Klein, Rolf Däumer, Martin Beerenwinkel, Niko Nucleic Acids Res Computational Biology Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and result in inflated estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb-fragment of the HIV-1 gag/pol gene in two control and two clinical samples. The effect of PCR amplification was analysed. Error correction resulted in a two- and five-fold decrease of the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. Genetic diversity observed within and between two clinical samples resulted in various patterns of phenotypic drug resistance and suggests a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated. Oxford University Press 2010-11 2010-07-29 /pmc/articles/PMC2995073/ /pubmed/20671025 http://dx.doi.org/10.1093/nar/gkq655 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Computational Biology Zagordi, Osvaldo Klein, Rolf Däumer, Martin Beerenwinkel, Niko Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies |
title | Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies |
title_sort | error correction of next-generation sequencing data and reliable estimation of hiv quasispecies |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995073/ https://www.ncbi.nlm.nih.gov/pubmed/20671025 http://dx.doi.org/10.1093/nar/gkq655 |
work_keys_str_mv | AT zagordiosvaldo errorcorrectionofnextgenerationsequencingdataandreliableestimationofhivquasispecies AT kleinrolf errorcorrectionofnextgenerationsequencingdataandreliableestimationofhivquasispecies AT daumermartin errorcorrectionofnextgenerationsequencingdataandreliableestimationofhivquasispecies AT beerenwinkelniko errorcorrectionofnextgenerationsequencingdataandreliableestimationofhivquasispecies |