
VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering


Bibliographic Details
Main authors: Gézsi, András; Bolgár, Bence; Marx, Péter; Sarkozy, Peter; Szalai, Csaba; Antal, Péter
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4625715/
https://www.ncbi.nlm.nih.gov/pubmed/26510841
http://dx.doi.org/10.1186/s12864-015-2050-y
collection PubMed
description BACKGROUND: The low concordance between different variant calling methods still poses a challenge for the widespread application of next-generation sequencing in research and clinical practice. A wide range of variant annotations can be used for filtering call sets to improve the precision of the variant calls, but the choice of appropriate filtering thresholds is not straightforward. Variant quality score recalibration provides an alternative to hard filtering, but it requires large-scale genomic data. RESULTS: We evaluated germline variant calling pipelines based on the BWA and Bowtie 2 aligners in combination with the GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes and SAMtools variant callers, using simulated and real benchmark sequencing data (NA12878 with Illumina Platinum Genomes). We argue that these pipelines are not merely discordant but extract complementary, useful information. We introduce VariantMetaCaller to test the hypothesis that the automated fusion of measurement-related information yields better performance than the recommended hard-filtering settings, recalibration, or the fusion of the individual call sets without annotations. VariantMetaCaller uses Support Vector Machines to combine multiple information sources generated by variant calling pipelines and estimates the probability of each variant. This novel method had significantly higher sensitivity and precision than the individual variant callers across all target region sizes, ranging from a few hundred kilobases to whole exomes. We also demonstrated that VariantMetaCaller supports quantitative, precision-based filtering of variants under a wider range of conditions. Specifically, the computed probabilities can be used to rank the variants and, for a given threshold, to estimate precision. Precision can then be translated directly into the number of true called variants or, equivalently, the number of false calls, which allows a problem-specific balance between sensitivity and precision to be found. CONCLUSIONS: VariantMetaCaller can be applied to both small target regions and whole exomes, and it can be used for organisms for which highly accurate variant call sets are not yet available; therefore, it can be a viable alternative to hard filtering where variant quality score recalibration cannot be used. VariantMetaCaller is freely available at http://bioinformatics.mit.bme.hu/VariantMetaCaller. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2050-y) contains supplementary material, which is available to authorized users.
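The precision-based filtering described in the abstract can be illustrated with a minimal sketch. It assumes, as an illustration rather than VariantMetaCaller's documented behaviour, that the per-variant probabilities are calibrated: the expected number of true calls among the kept variants is then the sum of their probabilities, the expected number of false calls is the remainder, and the estimated precision is their mean. The function names and the example probabilities are hypothetical and are not part of the tool.

```python
from typing import List, Tuple


def precision_at_threshold(probs: List[float], threshold: float) -> Tuple[float, float, float, int]:
    """Estimate precision, expected true calls and expected false calls
    among the calls whose probability is >= threshold.

    Assumes (hypothetically) that each probability is a calibrated
    estimate of P(call is a true variant).
    """
    kept = [p for p in probs if p >= threshold]
    n = len(kept)
    if n == 0:
        # No calls kept: precision is undefined; reported as 0.0 by convention here.
        return 0.0, 0.0, 0.0, 0
    expected_true = sum(kept)            # expected number of true calls
    expected_false = n - expected_true   # expected number of false calls
    precision = expected_true / n        # mean probability of the kept calls
    return precision, expected_true, expected_false, n


def largest_set_with_precision(probs: List[float], target: float) -> int:
    """Rank calls by probability and return how many top-ranked calls can be
    kept while the estimated precision stays >= target.

    Because the calls are sorted in descending order, the running mean can
    only decrease as lower-probability calls are added, so the scan can stop
    at the first prefix that falls below the target.
    """
    ordered = sorted(probs, reverse=True)
    kept = 0
    running_sum = 0.0
    for i, p in enumerate(ordered, start=1):
        running_sum += p
        if running_sum / i >= target:
            kept = i
        else:
            break
    return kept


if __name__ == "__main__":
    example = [0.99, 0.95, 0.9, 0.6, 0.3]  # hypothetical probabilities
    print(precision_at_threshold(example, 0.9))
    print(largest_set_with_precision(example, 0.9))
```

With the hypothetical probabilities above, a threshold of 0.9 keeps three calls with an estimated precision of about 0.947, which corresponds to about 2.84 expected true calls and 0.16 expected false calls. Ranking first and then choosing the cut-off from a target precision is one way to trade sensitivity against precision for a specific problem.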
id pubmed-4625715
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
BMC Genomics, Methodology Article. BioMed Central, 2015-10-28. pubmed-4625715 2015-10-30.
© Gézsi et al. 2015. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article