SAQC: SNP Array Quality Control

BACKGROUND: Genome-wide single-nucleotide polymorphism (SNP) arrays containing hundreds of thousands of SNPs from the human genome have proven useful for studying important human genome questions. Data quality of SNP arrays plays a key role in the accuracy and precision of downstream data analyses....

Bibliographic Details
Main authors: Yang, Hsin-Chou, Lin, Hsin-Chi, Kang, Meijyh, Chen, Chun-Houh, Lin, Chien-Wei, Li, Ling-Hui, Wu, Jer-Yuarn, Chen, Yuan-Tsong, Pan, Wen-Harn
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3101186/
https://www.ncbi.nlm.nih.gov/pubmed/21501472
http://dx.doi.org/10.1186/1471-2105-12-100
author Yang, Hsin-Chou
Lin, Hsin-Chi
Kang, Meijyh
Chen, Chun-Houh
Lin, Chien-Wei
Li, Ling-Hui
Wu, Jer-Yuarn
Chen, Yuan-Tsong
Pan, Wen-Harn
collection PubMed
description BACKGROUND: Genome-wide single-nucleotide polymorphism (SNP) arrays containing hundreds of thousands of SNPs from the human genome have proven useful for studying important human genome questions. Data quality of SNP arrays plays a key role in the accuracy and precision of downstream data analyses. However, good indices for assessing data quality of SNP arrays have not yet been developed. RESULTS: We developed new quality indices to measure the quality of SNP arrays and/or DNA samples and investigated their statistical properties. The indices quantify a departure of estimated individual-level allele frequencies (AFs) from expected frequencies via standardized distances. The proposed quality indices followed lognormal distributions in several large genomic studies that we empirically evaluated. AF reference data and quality index reference data for different SNP array platforms were established based on samples from various reference populations. Furthermore, a confidence interval method based on the underlying empirical distributions of quality indices was developed to identify poor-quality SNP arrays and/or DNA samples. Analyses of authentic biological data and simulated data show that this new method is sensitive and specific for the detection of poor-quality SNP arrays and/or DNA samples. CONCLUSIONS: This study introduces new quality indices, establishes references for AFs and quality indices, and develops a detection method for poor-quality SNP arrays and/or DNA samples. We have developed a new computer program that utilizes these methods called SNP Array Quality Control (SAQC). SAQC software is written in R and R-GUI and was developed as a user-friendly tool for the visualization and evaluation of data quality of genome-wide SNP arrays. The program is available online (http://www.stat.sinica.edu.tw/hsinchou/genetics/quality/SAQC.htm).
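The abstract describes the method only at a high level: a per-sample index built from standardized distances between estimated and expected allele frequencies, a lognormal model for that index, and a confidence-interval cutoff for flagging poor-quality arrays or DNA samples. The Python sketch below illustrates that general idea under explicit assumptions. The index formula (mean squared deviation scaled by the binomial variance p(1 - p)), the one-sided 99% lognormal bound, and the function names `quality_index`, `lognormal_upper_bound` and `flag_poor_quality` are hypothetical choices made for illustration; they are not the definitions used in the article or in the SAQC R program.

```python
import math
import statistics
from statistics import NormalDist


def quality_index(estimated_af, reference_af, eps=1e-6):
    """Hypothetical standardized-distance index (NOT SAQC's exact formula).

    Averages, over SNPs, the squared deviation of each estimated allele
    frequency from its reference frequency, scaled by p * (1 - p).
    """
    if len(estimated_af) != len(reference_af) or not estimated_af:
        raise ValueError("need two non-empty AF vectors of equal length")
    total = 0.0
    for x, p in zip(estimated_af, reference_af):
        var = max(p * (1.0 - p), eps)  # guard: p * (1 - p) is 0 for monomorphic SNPs
        total += (x - p) ** 2 / var
    return total / len(estimated_af)


def lognormal_upper_bound(reference_indices, level=0.99):
    """One-sided upper limit, assuming reference indices are lognormal."""
    logs = [math.log(q) for q in reference_indices]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)  # needs at least two reference values
    z = NormalDist().inv_cdf(level)  # standard normal quantile
    return math.exp(mu + z * sigma)


def flag_poor_quality(sample_indices, reference_indices, level=0.99):
    """Return the names of samples whose index exceeds the upper bound."""
    bound = lognormal_upper_bound(reference_indices, level)
    return [name for name, q in sample_indices.items() if q > bound]


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    ref_af = [0.10, 0.25, 0.50, 0.70, 0.90]
    samples = {
        "S1": quality_index([0.12, 0.24, 0.52, 0.69, 0.88], ref_af),
        "S2": quality_index([0.60, 0.80, 0.10, 0.20, 0.30], ref_af),
    }
    reference = [0.010, 0.012, 0.015, 0.011, 0.013, 0.014, 0.009]
    print(flag_poor_quality(samples, reference))  # prints ['S2']
```

Working on the log scale turns the lognormal assumption into a normal one, so the cutoff is simply the mean plus a normal quantile times the standard deviation of the log-indices, transformed back with `exp`. Only large indices are flagged, because in this sketch a large departure from the reference frequencies is what signals a poor-quality sample.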
format Text
id pubmed-3101186
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3101186 2011-05-25 SAQC: SNP Array Quality Control. BMC Bioinformatics, Methodology Article. BioMed Central 2011-04-18. Copyright ©2011 Yang et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SAQC: SNP Array Quality Control
topic Methodology Article