Accurate single nucleotide variant detection in viral populations by combining probabilistic clustering with a statistical test of strand bias

BACKGROUND: Deep sequencing is a powerful tool for assessing viral genetic diversity. Such experiments harness the high coverage afforded by next generation sequencing protocols by treating sequencing reads as a population sample. Distinguishing true single nucleotide variants (SNVs) from sequencing...

Bibliographic Details
Main authors:	McElroy, Kerensa; Zagordi, Osvaldo; Bull, Rowena; Luciani, Fabio; Beerenwinkel, Niko
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3848937/ https://www.ncbi.nlm.nih.gov/pubmed/23879730 http://dx.doi.org/10.1186/1471-2164-14-501

_version_	1782293850161676288
author	McElroy, Kerensa; Zagordi, Osvaldo; Bull, Rowena; Luciani, Fabio; Beerenwinkel, Niko
author_sort	McElroy, Kerensa
collection	PubMed
description	BACKGROUND: Deep sequencing is a powerful tool for assessing viral genetic diversity. Such experiments harness the high coverage afforded by next generation sequencing protocols by treating sequencing reads as a population sample. Distinguishing true single nucleotide variants (SNVs) from sequencing errors remains challenging, however. Current protocols are characterised by high false positive rates, with results requiring time consuming manual checking. RESULTS: By statistical modelling, we show that if multiple variant sites are considered at once, SNVs can be called reliably from high coverage viral deep sequencing data at frequencies lower than the error rate of the sequencing technology, and that SNV calling accuracy increases as true sequence diversity within a read length increases. We demonstrate these findings on two control data sets, showing that SNV detection is more reliable on a high diversity human immunodeficiency virus sample as compared to a moderate diversity sample of hepatitis C virus. Finally, we show that in situations where probabilistic clustering retains false positive SNVs (for instance due to insufficient sample diversity or systematic errors), applying a strand bias test based on a beta-binomial model of forward read distribution can improve precision, with negligible cost to true positive recall. CONCLUSIONS: By combining probabilistic clustering (implemented in the program ShoRAH) with a statistical test of strand bias, SNVs may be called from deeply sequenced viral populations with high accuracy.
format	Online Article Text
id	pubmed-3848937
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3848937 2013-12-06 BMC Genomics Methodology Article BioMed Central 2013-07-24 /pmc/articles/PMC3848937/ /pubmed/23879730 http://dx.doi.org/10.1186/1471-2164-14-501 Text en Copyright © 2013 McElroy et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Accurate single nucleotide variant detection in viral populations by combining probabilistic clustering with a statistical test of strand bias
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3848937/ https://www.ncbi.nlm.nih.gov/pubmed/23879730 http://dx.doi.org/10.1186/1471-2164-14-501
