VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering
BACKGROUND: The low concordance between different variant calling methods still poses a challenge for the wide-spread application of next-generation sequencing in research and clinical practice. A wide range of variant annotations can be used for filtering call sets in order to improve the precision...
Main authors: | Gézsi, András; Bolgár, Bence; Marx, Péter; Sarkozy, Peter; Szalai, Csaba; Antal, Péter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4625715/ https://www.ncbi.nlm.nih.gov/pubmed/26510841 http://dx.doi.org/10.1186/s12864-015-2050-y |
_version_ | 1782398022133481472 |
---|---|
author | Gézsi, András Bolgár, Bence Marx, Péter Sarkozy, Peter Szalai, Csaba Antal, Péter |
author_sort | Gézsi, András |
collection | PubMed |
description | BACKGROUND: The low concordance between different variant calling methods still poses a challenge for the widespread application of next-generation sequencing in research and clinical practice. A wide range of variant annotations can be used for filtering call sets in order to improve the precision of the variant calls, but the choice of the appropriate filtering thresholds is not straightforward. Variant quality score recalibration provides an alternative solution to hard filtering, but it requires large-scale genomic data. RESULTS: We evaluated germline variant calling pipelines based on BWA and Bowtie 2 aligners in combination with GATK UnifiedGenotyper, GATK HaplotypeCaller, FreeBayes and SAMtools variant callers, using simulated and real benchmark sequencing data (NA12878 with Illumina Platinum Genomes). We argue that these pipelines are not merely discordant, but that they extract complementary useful information. We introduce VariantMetaCaller to test the hypothesis that the automated fusion of measurement-related information allows better performance than the recommended hard-filtering settings or recalibration and the fusion of the individual call sets without using annotations. VariantMetaCaller uses Support Vector Machines to combine multiple information sources generated by variant calling pipelines and estimates probabilities of variants. This novel method had significantly higher sensitivity and precision than the individual variant callers in all target region sizes, ranging from a few hundred kilobases to whole exomes. We also demonstrated that VariantMetaCaller supports a quantitative, precision-based filtering of variants under wider conditions. Specifically, the computed probabilities of the variants can be used to order the variants, and for a given threshold, probabilities can be used to estimate precision. Precision can then be directly translated to the number of true called variants, or equivalently, to the number of false calls, which allows finding a problem-specific balance between sensitivity and precision. CONCLUSIONS: VariantMetaCaller can be applied to small target regions and whole exomes as well, and it can be used in cases of organisms for which highly accurate variant call sets are not yet available; it can therefore be a viable alternative to hard filtering in cases where variant quality score recalibration cannot be used. VariantMetaCaller is freely available at http://bioinformatics.mit.bme.hu/VariantMetaCaller. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2050-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4625715 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4625715 2015-10-30 VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering Gézsi, András Bolgár, Bence Marx, Péter Sarkozy, Peter Szalai, Csaba Antal, Péter BMC Genomics Methodology Article BioMed Central 2015-10-28 /pmc/articles/PMC4625715/ /pubmed/26510841 http://dx.doi.org/10.1186/s12864-015-2050-y Text en © Gézsi et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Gézsi, András Bolgár, Bence Marx, Péter Sarkozy, Peter Szalai, Csaba Antal, Péter VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering |
title | VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering |
topic | Methodology Article |
work_keys_str_mv | AT gezsiandras variantmetacallerautomatedfusionofvariantcallingpipelinesforquantitativeprecisionbasedfiltering AT bolgarbence variantmetacallerautomatedfusionofvariantcallingpipelinesforquantitativeprecisionbasedfiltering AT marxpeter variantmetacallerautomatedfusionofvariantcallingpipelinesforquantitativeprecisionbasedfiltering AT sarkozypeter variantmetacallerautomatedfusionofvariantcallingpipelinesforquantitativeprecisionbasedfiltering AT szalaicsaba variantmetacallerautomatedfusionofvariantcallingpipelinesforquantitativeprecisionbasedfiltering AT antalpeter variantmetacallerautomatedfusionofvariantcallingpipelinesforquantitativeprecisionbasedfiltering |
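
The description in this record states that per-variant probabilities can be used to order variants and, for a given threshold, to estimate precision and the number of false calls. The record does not give the estimator itself, so the following is only a minimal sketch of one common way to do this, assuming the probabilities are well calibrated: the expected number of true calls is the sum of the probabilities of the retained variants, and the expected precision is their mean. The function name `estimate_precision` and the example probabilities are hypothetical and are not taken from VariantMetaCaller.

```python
from typing import List, Tuple


def estimate_precision(probabilities: List[float], threshold: float) -> Tuple[int, float, float]:
    """Estimate precision for variants whose probability is >= threshold.

    Assumption (not from the record): each probability is a calibrated
    estimate that the corresponding variant call is true.

    Returns (number of retained calls, expected precision, expected false calls).
    """
    kept = [p for p in probabilities if p >= threshold]
    if not kept:
        # Nothing passes the threshold, so precision is undefined; report zeros.
        return 0, 0.0, 0.0
    expected_true = sum(kept)                 # expected number of true calls
    expected_false = len(kept) - expected_true  # expected number of false calls
    return len(kept), expected_true / len(kept), expected_false


if __name__ == "__main__":
    # Hypothetical probabilities for four candidate variants.
    probs = [0.9, 0.8, 0.5, 0.2]
    n_kept, precision, n_false = estimate_precision(probs, threshold=0.5)
    print(f"kept={n_kept} precision={precision:.3f} expected_false={n_false:.2f}")
```

Raising the threshold keeps fewer variants but raises the expected precision, which is the sensitivity and precision trade-off the description refers to.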