A Novel Quality-Control Procedure to Improve the Accuracy of Rare Variant Calling in SNP Arrays
Background: Single-nucleotide polymorphism (SNP) arrays are an ideal technology for genotyping genetic variants in mass screening. However, using SNP arrays to detect rare variants [with a minor allele frequency (MAF) of <1%] is still a challenge because of noise signals and batch effects. An app...
Main authors: | Sun, Ting-Hsuan; Shao, Yu-Hsuan Joni; Mao, Chien-Lin; Hung, Miao-Neng; Lo, Yi-Yun; Ko, Tai-Ming; Hsiao, Tzu-Hung |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8577504/ https://www.ncbi.nlm.nih.gov/pubmed/34764980 http://dx.doi.org/10.3389/fgene.2021.736390 |
_version_ | 1784596073068625920 |
---|---|
author | Sun, Ting-Hsuan Shao, Yu-Hsuan Joni Mao, Chien-Lin Hung, Miao-Neng Lo, Yi-Yun Ko, Tai-Ming Hsiao, Tzu-Hung |
author_sort | Sun, Ting-Hsuan |
collection | PubMed |
description | Background: Single-nucleotide polymorphism (SNP) arrays are an ideal technology for genotyping genetic variants in mass screening. However, using SNP arrays to detect rare variants [with a minor allele frequency (MAF) of <1%] is still a challenge because of noise signals and batch effects. An approach that improves the genotyping quality is needed for clinical applications. Methods: We developed a quality-control procedure for rare variants which integrates different algorithms, filters, and experiments to increase the accuracy of variant calling. Using data from the TWB 2.0 custom Axiom array, we adopted an advanced normalization adjustment to prevent false calls caused by splitting the cluster and a rare het adjustment which decreases false calls in rare variants. The concordance of allelic frequencies from array data was compared to those from sequencing datasets of Taiwanese. Finally, genotyping results were used to detect familial hypercholesterolemia (FH), thrombophilia (TH), and maturity-onset diabetes of the young (MODY) to assess the performance in disease screening. All heterozygous calls were verified by Sanger sequencing or qPCR. The positive predictive value (PPV) of each step was estimated to evaluate the performance of our procedure. Results: We analyzed SNP array data from 43,433 individuals, which interrogated 267,247 rare variants. The advanced normalization and rare het adjustment methods adjusted genotyping calling of 168,134 variants (96.49%). We further removed 3916 probesets which were discordant in MAFs between the SNP array and sequencing data. The PPV for detecting pathogenic variants with 0.01%<MAF≤1% exceeded 99.37%. PPVs for those with an MAF of ≤0.01% improved from 95% to 100% for FH, 42.11% to 85.19% for TH, and 18.24% to 72.22% for MODY after adopting our rare variant quality-control procedure and experimental verification. 
Conclusion: Adopting our quality-control procedure, SNP arrays can adequately detect variants with MAF values ranging 0.01%∼0.1%. For variants with MAF values of ≤0.01%, experimental validation is needed unless sequencing data from a homogeneous population of >10,000 are available. The results demonstrated our procedure could perform correct genotype calling of rare variants. It provides a solution of pathogenic variant detection through SNP array. The approach brings tremendous promise for implementing precision medicine in medical practice. |
format | Online Article Text |
id | pubmed-8577504 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8577504 2021-11-10 A Novel Quality-Control Procedure to Improve the Accuracy of Rare Variant Calling in SNP Arrays Sun, Ting-Hsuan Shao, Yu-Hsuan Joni Mao, Chien-Lin Hung, Miao-Neng Lo, Yi-Yun Ko, Tai-Ming Hsiao, Tzu-Hung Front Genet Genetics Frontiers Media S.A. 2021-10-26 /pmc/articles/PMC8577504/ /pubmed/34764980 http://dx.doi.org/10.3389/fgene.2021.736390 Text en Copyright © 2021 Sun, Shao, Mao, Hung, Lo, Ko and Hsiao. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | A Novel Quality-Control Procedure to Improve the Accuracy of Rare Variant Calling in SNP Arrays |
title_sort | novel quality-control procedure to improve the accuracy of rare variant calling in snp arrays |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8577504/ https://www.ncbi.nlm.nih.gov/pubmed/34764980 http://dx.doi.org/10.3389/fgene.2021.736390 |
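
The description field above relies on two quantities, minor allele frequency (MAF) and positive predictive value (PPV), and on a MAF cutoff of 0.01% below which experimental validation is recommended. The following is a minimal Python sketch of how these might be computed. It is not the authors' pipeline: the function names, the genotype-count inputs, and the example counts are hypothetical, and PPV is assumed here to be the fraction of array calls confirmed by an orthogonal method such as Sanger sequencing or qPCR.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Return the minor allele frequency from diploid genotype counts."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotyped samples")
    alt_freq = (n_het + 2 * n_hom_alt) / n_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def positive_predictive_value(confirmed: int, unconfirmed: int) -> float:
    """Return the fraction of variant calls that were confirmed (assumed PPV definition)."""
    total_calls = confirmed + unconfirmed
    if total_calls == 0:
        raise ValueError("no calls to evaluate")
    return confirmed / total_calls


def needs_experimental_validation(maf: float, threshold: float = 1e-4) -> bool:
    """Flag variants at or below the 0.01% MAF cutoff mentioned in the abstract."""
    return maf <= threshold


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    maf = minor_allele_frequency(n_hom_ref=9_990, n_het=10, n_hom_alt=0)
    ppv = positive_predictive_value(confirmed=9, unconfirmed=1)
    print(f"MAF = {maf:.4%}, validate = {needs_experimental_validation(maf)}")
    print(f"PPV = {ppv:.2%}")
```

The threshold is left as a parameter because the cutoff appropriate for a given array and reference population may differ from the 0.01% value reported in this record.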