A Novel Quality-Control Procedure to Improve the Accuracy of Rare Variant Calling in SNP Arrays

Background: Single-nucleotide polymorphism (SNP) arrays are an ideal technology for genotyping genetic variants in mass screening. However, using SNP arrays to detect rare variants [with a minor allele frequency (MAF) of <1%] is still a challenge because of noise signals and batch effects. An app...

Bibliographic Details
Main Authors: Sun, Ting-Hsuan, Shao, Yu-Hsuan Joni, Mao, Chien-Lin, Hung, Miao-Neng, Lo, Yi-Yun, Ko, Tai-Ming, Hsiao, Tzu-Hung
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8577504/
https://www.ncbi.nlm.nih.gov/pubmed/34764980
http://dx.doi.org/10.3389/fgene.2021.736390
_version_ 1784596073068625920
author Sun, Ting-Hsuan
Shao, Yu-Hsuan Joni
Mao, Chien-Lin
Hung, Miao-Neng
Lo, Yi-Yun
Ko, Tai-Ming
Hsiao, Tzu-Hung
collection PubMed
description Background: Single-nucleotide polymorphism (SNP) arrays are an ideal technology for genotyping genetic variants in mass screening. However, using SNP arrays to detect rare variants [with a minor allele frequency (MAF) of <1%] is still a challenge because of noise signals and batch effects. An approach that improves the genotyping quality is needed for clinical applications. Methods: We developed a quality-control procedure for rare variants which integrates different algorithms, filters, and experiments to increase the accuracy of variant calling. Using data from the TWB 2.0 custom Axiom array, we adopted an advanced normalization adjustment to prevent false calls caused by splitting the cluster and a rare het adjustment which decreases false calls in rare variants. The concordance of allelic frequencies from array data was compared to those from sequencing datasets of Taiwanese. Finally, genotyping results were used to detect familial hypercholesterolemia (FH), thrombophilia (TH), and maturity-onset diabetes of the young (MODY) to assess the performance in disease screening. All heterozygous calls were verified by Sanger sequencing or qPCR. The positive predictive value (PPV) of each step was estimated to evaluate the performance of our procedure. Results: We analyzed SNP array data from 43,433 individuals, which interrogated 267,247 rare variants. The advanced normalization and rare het adjustment methods adjusted genotyping calling of 168,134 variants (96.49%). We further removed 3916 probesets which were discordant in MAFs between the SNP array and sequencing data. The PPV for detecting pathogenic variants with 0.01%<MAF≤1% exceeded 99.37%. PPVs for those with an MAF of ≤0.01% improved from 95% to 100% for FH, 42.11% to 85.19% for TH, and 18.24% to 72.22% for MODY after adopting our rare variant quality-control procedure and experimental verification. 
Conclusion: Adopting our quality-control procedure, SNP arrays can adequately detect variants with MAF values ranging 0.01%∼0.1%. For variants with MAF values of ≤0.01%, experimental validation is needed unless sequencing data from a homogeneous population of >10,000 are available. The results demonstrated our procedure could perform correct genotype calling of rare variants. It provides a solution of pathogenic variant detection through SNP array. The approach brings tremendous promise for implementing precision medicine in medical practice.
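The abstract reports performance as a positive predictive value (PPV), that is, the share of array calls that experimental verification confirmed. A minimal sketch of that arithmetic follows; the function name and the counts in the usage note are illustrative assumptions and are not taken from the article.

```python
def positive_predictive_value(confirmed_calls: int, total_calls: int) -> float:
    """Return PPV = TP / (TP + FP) as a percentage.

    confirmed_calls: array calls confirmed by verification (true positives).
    total_calls: all array calls submitted for verification (TP + FP).
    Both names are hypothetical; they do not come from the article.
    """
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    if not 0 <= confirmed_calls <= total_calls:
        raise ValueError("confirmed_calls must lie between 0 and total_calls")
    return 100.0 * confirmed_calls / total_calls
```

For example, with made-up counts, `positive_predictive_value(90, 100)` evaluates to 90.0, meaning 90 of 100 heterozygous calls were confirmed.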
format Online
Article
Text
id pubmed-8577504
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8577504 2021-11-10 A Novel Quality-Control Procedure to Improve the Accuracy of Rare Variant Calling in SNP Arrays Sun, Ting-Hsuan Shao, Yu-Hsuan Joni Mao, Chien-Lin Hung, Miao-Neng Lo, Yi-Yun Ko, Tai-Ming Hsiao, Tzu-Hung Front Genet Genetics Frontiers Media S.A. 2021-10-26 /pmc/articles/PMC8577504/ /pubmed/34764980 http://dx.doi.org/10.3389/fgene.2021.736390 Text en Copyright © 2021 Sun, Shao, Mao, Hung, Lo, Ko and Hsiao. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Novel Quality-Control Procedure to Improve the Accuracy of Rare Variant Calling in SNP Arrays
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8577504/
https://www.ncbi.nlm.nih.gov/pubmed/34764980
http://dx.doi.org/10.3389/fgene.2021.736390