Accurate reconstruction of viral quasispecies spectra through improved estimation of strain richness

BACKGROUND: Estimating the number of different species (richness) in a mixed microbial population has been a main focus in metagenomic research. Existing methods of species richness estimation ride on the assumption that the reads in each assembled contig correspond to only one of the microbial geno...

Bibliographic Details
Main Authors: Jayasundara, Duleepa, Saeed, I, Chang, BC, Tang, Sen-Lin, Halgamuge, Saman K
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4682401/
https://www.ncbi.nlm.nih.gov/pubmed/26678073
http://dx.doi.org/10.1186/1471-2105-16-S18-S3
_version_ 1782405883025686528
author Jayasundara, Duleepa
Saeed, I
Chang, BC
Tang, Sen-Lin
Halgamuge, Saman K
author_sort Jayasundara, Duleepa
collection PubMed
description BACKGROUND: Estimating the number of different species (richness) in a mixed microbial population has been a main focus in metagenomic research. Existing methods of species richness estimation ride on the assumption that the reads in each assembled contig correspond to only one of the microbial genomes in the population. This assumption and the underlying probabilistic formulations of existing methods are not useful for quasispecies populations where the strains are highly genetically related. The lack of knowledge on the number of different strains in a quasispecies population is observed to hinder the precision of existing Viral Quasispecies Spectrum Reconstruction (QSR) methods due to the uncontrolled reconstruction of a large number of in silico false positives. In this work, we formulated a novel probabilistic method for strain richness estimation specifically targeting viral quasispecies. By using this approach we improved our recently proposed spectrum reconstruction pipeline ViQuaS to achieve higher levels of precision in reconstructed quasispecies spectra without compromising the recall rates. We also discuss how one other existing popular QSR method named ShoRAH can be improved using this new approach. RESULTS: On benchmark data sets, our estimation method provided accurate richness estimates (< 0.2 median estimation error) and improved the precision of ViQuaS by 2%-13% and F-score by 1%-9% without compromising the recall rates. We also demonstrate that our estimation method can be used to improve the precision and F-score of ShoRAH by 0%-7% and 0%-5% respectively. CONCLUSIONS: The proposed probabilistic estimation method can be used to estimate the richness of viral populations with a quasispecies behavior and to improve the accuracy of the quasispecies spectra reconstructed by the existing methods ViQuaS and ShoRAH in the presence of a moderate level of technical sequencing errors. AVAILABILITY: http://sourceforge.net/projects/viquas/
format Online
Article
Text
id pubmed-4682401
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4682401 2015-12-21 Accurate reconstruction of viral quasispecies spectra through improved estimation of strain richness Jayasundara, Duleepa Saeed, I Chang, BC Tang, Sen-Lin Halgamuge, Saman K BMC Bioinformatics Research BioMed Central 2015-12-09 /pmc/articles/PMC4682401/ /pubmed/26678073 http://dx.doi.org/10.1186/1471-2105-16-S18-S3 Text en Copyright © 2015 Jayasundara et al.; http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Accurate reconstruction of viral quasispecies spectra through improved estimation of strain richness
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4682401/
https://www.ncbi.nlm.nih.gov/pubmed/26678073
http://dx.doi.org/10.1186/1471-2105-16-S18-S3