Evaluating the accuracy of variant calling methods using the frequency of parent‐offspring genotype mismatch

The use of next‐generation sequencing (NGS) data sets has increased dramatically over the last decade, but there have been few systematic analyses quantifying the accuracy of the commonly used variant caller programs. Here we used a familial design consisting of diploid tissue from a single lodgepol...

Bibliographic Details
Main authors:	Jasper, Russ J., McDonald, Tegan Krista, Singh, Pooja, Lu, Mengmeng, Rougeux, Clément, Lind, Brandon M., Yeaman, Sam
Format:	Online Article Text
Language:	English
Published:	John Wiley and Sons Inc. 2022
Subjects:	RESOURCE ARTICLES
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9544674/ https://www.ncbi.nlm.nih.gov/pubmed/35510784 http://dx.doi.org/10.1111/1755-0998.13628

author	Jasper, Russ J. McDonald, Tegan Krista Singh, Pooja Lu, Mengmeng Rougeux, Clément Lind, Brandon M. Yeaman, Sam
collection	PubMed
description	The use of next‐generation sequencing (NGS) data sets has increased dramatically over the last decade, but there have been few systematic analyses quantifying the accuracy of the commonly used variant caller programs. Here we used a familial design consisting of diploid tissue from a single lodgepole pine (Pinus contorta) parent and the maternally derived haploid tissue from 106 full‐sibling offspring, where mismatches could only arise due to mutation or bioinformatic error. Given the rarity of mutation, we used the rate of mismatches between parent and offspring genotype calls to infer the single nucleotide polymorphism (SNP) genotyping error rates of FreeBayes, HaplotypeCaller, SAMtools, UnifiedGenotyper, and VarScan. With baseline filtering HaplotypeCaller and UnifiedGenotyper yielded more SNPs and higher error rates by one to two orders of magnitude, whereas FreeBayes, SAMtools and VarScan yielded lower numbers of SNPs and more modest error rates. To facilitate comparison between variant callers we standardized each SNP set to the same number of SNPs using additional filtering, where UnifiedGenotyper consistently produced the smallest proportion of genotype errors, followed by HaplotypeCaller, VarScan, SAMtools, and FreeBayes. Additionally, we found that error rates were minimized for SNPs called by more than one variant caller. Finally, we evaluated the performance of various commonly used filtering metrics on SNP calling. Our analysis provides a quantitative assessment of the accuracy of five widely used variant calling programs and offers valuable insights into both the choice of variant caller program and the choice of filtering metrics, especially for researchers using non‐model study systems.
format	Online Article Text
id	pubmed-9544674
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	John Wiley and Sons Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-9544674 2022-10-14 Mol Ecol Resour RESOURCE ARTICLES John Wiley and Sons Inc. 2022-05-22 2022-10 /pmc/articles/PMC9544674/ /pubmed/35510784 http://dx.doi.org/10.1111/1755-0998.13628 Text en © 2022 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title	Evaluating the accuracy of variant calling methods using the frequency of parent‐offspring genotype mismatch
topic	RESOURCE ARTICLES
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9544674/ https://www.ncbi.nlm.nih.gov/pubmed/35510784 http://dx.doi.org/10.1111/1755-0998.13628