geck: trio-based comparative benchmarking of variant calls

MOTIVATION: Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes are previously determined with high confidence. An alternative way of performing benchmarking is based on Mendelian constraints between related individuals. Stat...

Bibliographic Details
Main authors: Kómár, Péter; Kural, Deniz
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6184596/
https://www.ncbi.nlm.nih.gov/pubmed/29850774
http://dx.doi.org/10.1093/bioinformatics/bty415
author Kómár, Péter
Kural, Deniz
collection PubMed
description MOTIVATION: Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes are previously determined with high confidence. An alternative way of performing benchmarking is based on Mendelian constraints between related individuals. Statistical analysis of Mendelian violations can provide truth set-independent benchmarking information, and enable benchmarking less-studied variants and diverse populations. RESULTS: We introduce a statistical mixture model for comparing two variant calling pipelines from genotype data they produce after running on individual members of a trio. We determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method is able to estimate differential precision and recall between the two pipelines with [Formula: see text] uncertainty. AVAILABILITY AND IMPLEMENTATION: The Python library geck, and usage examples are available at the following URL: https://github.com/sbg/geck, under the GNU General Public License v3. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
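The Mendelian constraint that the abstract relies on can be illustrated with a minimal sketch. This is not the geck API and not the authors' mixture model; the function names `is_mendelian_consistent` and `count_violations` are hypothetical, and the sketch assumes biallelic autosomal sites, genotypes encoded as the number of alternate alleles (0, 1 or 2), and no de novo mutations.

```python
from itertools import product

# Alleles a parent can transmit, keyed by genotype
# (0 = hom-ref, 1 = het, 2 = hom-alt; values are alt-allele counts per gamete).
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """Return True if the child's genotype can arise from the parents' genotypes.

    Assumption (illustrative only): biallelic autosomal site, no de novo mutation.
    """
    possible_children = {
        a + b for a, b in product(GAMETES[father], GAMETES[mother])
    }
    return child in possible_children


def count_violations(trios) -> int:
    """Count (father, mother, child) genotype triples that violate Mendelian inheritance."""
    return sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)


if __name__ == "__main__":
    calls = [(0, 0, 1), (1, 1, 2), (0, 2, 1), (2, 2, 1)]
    print(count_violations(calls))
```

A raw violation count like this only flags trios in which at least one call must be wrong; it does not say which pipeline or which family member erred. According to the abstract, the statistical mixture model is what estimates differential precision and recall between the two pipelines.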
format Online
Article
Text
id pubmed-6184596
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6184596 2018-10-18
geck: trio-based comparative benchmarking of variant calls
Kómár, Péter; Kural, Deniz
Bioinformatics, Original Papers
Oxford University Press 2018-10-15 2018-05-29
/pmc/articles/PMC6184596/ /pubmed/29850774 http://dx.doi.org/10.1093/bioinformatics/bty415
Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title geck: trio-based comparative benchmarking of variant calls
topic Original Papers