Evaluating the performance of tools used to call minority variants from whole genome short-read data

Background: High-throughput whole genome sequencing facilitates investigation of minority virus sub-populations from virus positive samples. Minority variants are useful in understanding within and between host diversity, population dynamics and can potentially assist in elucidating person-person tr...

Bibliographic Details
Main Authors: Said Mohammed, Khadija; Kibinge, Nelson; Prins, Pjotr; Agoti, Charles N.; Cotten, Matthew; Nokes, D.J.; Brand, Samuel; Githinji, George
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6234735/
https://www.ncbi.nlm.nih.gov/pubmed/30483597
http://dx.doi.org/10.12688/wellcomeopenres.13538.2
author Said Mohammed, Khadija
Kibinge, Nelson
Prins, Pjotr
Agoti, Charles N.
Cotten, Matthew
Nokes, D.J.
Brand, Samuel
Githinji, George
collection PubMed
description Background: High-throughput whole genome sequencing facilitates investigation of minority virus sub-populations from virus-positive samples. Minority variants are useful in understanding within- and between-host diversity and population dynamics, and can potentially assist in elucidating person-to-person transmission pathways. Several minority variant callers have been developed to describe low-frequency sub-populations from whole genome sequence data. These callers differ in the bioinformatics and statistical methods used to discriminate sequencing errors from low-frequency variants. Methods: We evaluated the diagnostic performance and concordance of published minority variant callers used to identify minority variants in whole-genome sequence data from virus samples. We used the ART-Illumina read simulation tool to generate three artificial short-read datasets of varying coverage and error profiles from an RSV reference genome. The datasets were spiked with nucleotide variants at predetermined positions and frequencies. Variants were called using FreeBayes, LoFreq, VarDict, and VarScan2. The variant callers' agreement in identifying known variants was quantified using two measures: concordance accuracy and inter-caller concordance. Results: The variant callers differed in identifying minority variants from the datasets. Concordance accuracy and inter-caller concordance were positively correlated with sample coverage. FreeBayes identified the majority of variants, although it was characterised by variable sensitivity and precision and by a high false positive rate relative to the other minority variant callers, which varied with sample coverage. LoFreq was the most conservative caller. Conclusions: We conducted a performance and concordance evaluation of four minority variant calling tools used to identify and quantify low-frequency variants. Inconsistency in the quality of sequenced samples affects the sensitivity and accuracy of minority variant callers. Our study suggests that combining at least three tools when identifying minority variants is useful for filtering errors when calling low-frequency variants.
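The conclusion about combining at least three tools can be illustrated with a minimal sketch. It is not the authors' pipeline: the variant representation as `(position, alt_base)` tuples, the function names `consensus_calls` and `sensitivity_precision`, and the toy call sets are all hypothetical, and the record does not give the study's exact definitions of concordance accuracy or inter-caller concordance, so only plain sensitivity and precision against a known (spiked) truth set are shown.

```python
from collections import Counter


def consensus_calls(calls_by_caller, min_callers=3):
    """Return variants reported by at least `min_callers` callers.

    `calls_by_caller` maps a caller name to a set of variants.
    """
    counts = Counter(v for calls in calls_by_caller.values() for v in set(calls))
    return {v for v, n in counts.items() if n >= min_callers}


def sensitivity_precision(called, truth):
    """Sensitivity and precision of a call set against known variants."""
    true_positives = len(called & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(called) if called else 0.0
    return sensitivity, precision


# Hypothetical toy data: spiked variants and made-up per-caller call sets.
truth = {(100, "A"), (250, "T"), (900, "G")}
calls_by_caller = {
    "FreeBayes": {(100, "A"), (250, "T"), (900, "G"), (40, "C"), (777, "A")},
    "LoFreq": {(100, "A"), (250, "T")},
    "VarDict": {(100, "A"), (250, "T"), (900, "G"), (40, "C")},
    "VarScan2": {(100, "A"), (900, "G"), (777, "A")},
}

# Keep only variants supported by three or more callers.
consensus = consensus_calls(calls_by_caller, min_callers=3)
```

In this toy example the two false positives are each supported by only two callers, so requiring three callers removes them while keeping every spiked variant. Real data would need variants parsed from each caller's output and normalised to a common representation first.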
format Online
Article
Text
id pubmed-6234735
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-6234735 2018-11-26 Evaluating the performance of tools used to call minority variants from whole genome short-read data Wellcome Open Res Research Article F1000 Research Limited 2018-09-13 /pmc/articles/PMC6234735/ /pubmed/30483597 http://dx.doi.org/10.12688/wellcomeopenres.13538.2 Text en Copyright: © 2018 Said Mohammed K et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evaluating the performance of tools used to call minority variants from whole genome short-read data
topic Research Article