Generalizable characteristics of false-positive bacterial variant calls

Minimizing false positives is a critical issue when variant calling as no method is without error. It is common practice to post-process a variant-call file (VCF) using hard filter criteria intended to discriminate true-positive (TP) from false-positive (FP) calls. These are applied on the simple pr...

Bibliographic Details
Main author: Bush, Stephen J.
Format: Online Article Text
Language: English
Published: Microbiology Society 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8549357/
https://www.ncbi.nlm.nih.gov/pubmed/34346861
http://dx.doi.org/10.1099/mgen.0.000615
_version_ 1784590767041282048
author Bush, Stephen J.
author_facet Bush, Stephen J.
author_sort Bush, Stephen J.
collection PubMed
description Minimizing false positives is a critical issue when variant calling as no method is without error. It is common practice to post-process a variant-call file (VCF) using hard filter criteria intended to discriminate true-positive (TP) from false-positive (FP) calls. These are applied on the simple principle that certain characteristics are disproportionately represented among the set of FP calls and that a user-chosen threshold can maximize the number detected. To provide guidance on this issue, this study empirically characterized all false SNP and indel calls made using real Illumina sequencing data from six disparate species and 166 variant-calling pipelines (the combination of 14 read aligners with up to 13 different variant callers, plus four ‘all-in-one’ pipelines). We did not seek to optimize filter thresholds but instead to draw attention to those filters of greatest efficacy and the pipelines to which they may most usefully be applied. In this respect, this study acts as a coda to our previous benchmarking evaluation of bacterial variant callers, and provides general recommendations for effective practice. The results suggest that, of the pipelines analysed in this study, the most straightforward way of minimizing false positives would simply be to use Snippy. We also find that a disproportionate number of false calls, irrespective of the variant-calling pipeline, are located in the vicinity of indels, and highlight this as an issue for future development.
format Online
Article
Text
id pubmed-8549357
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-8549357 2021-10-27 Generalizable characteristics of false-positive bacterial variant calls Bush, Stephen J. Microb Genom Research Articles Microbiology Society 2021-08-04 /pmc/articles/PMC8549357/ /pubmed/34346861 http://dx.doi.org/10.1099/mgen.0.000615 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License.
This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
spellingShingle Research Articles
Bush, Stephen J.
Generalizable characteristics of false-positive bacterial variant calls
title Generalizable characteristics of false-positive bacterial variant calls
title_full Generalizable characteristics of false-positive bacterial variant calls
title_fullStr Generalizable characteristics of false-positive bacterial variant calls
title_full_unstemmed Generalizable characteristics of false-positive bacterial variant calls
title_short Generalizable characteristics of false-positive bacterial variant calls
title_sort generalizable characteristics of false-positive bacterial variant calls
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8549357/
https://www.ncbi.nlm.nih.gov/pubmed/34346861
http://dx.doi.org/10.1099/mgen.0.000615
work_keys_str_mv AT bushstephenj generalizablecharacteristicsoffalsepositivebacterialvariantcalls