Dynamic variable selection in SNP genotype autocalling from APEX microarray data
BACKGROUND: Single nucleotide polymorphisms (SNPs) are DNA sequence variations, occurring when a single nucleotide – adenine (A), thymine (T), cytosine (C) or guanine (G) – is altered. Arguably, SNPs account for more than 90% of human genetic variation. Our laboratory has developed a highly redundan...
Main authors: | Podder, Mohua; Welch, William J; Zamar, Ruben H; Tebbutt, Scott J |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2006 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1702553/ https://www.ncbi.nlm.nih.gov/pubmed/17137502 http://dx.doi.org/10.1186/1471-2105-7-521 |
_version_ | 1782131265943633920 |
---|---|
author | Podder, Mohua Welch, William J Zamar, Ruben H Tebbutt, Scott J |
author_sort | Podder, Mohua |
collection | PubMed |
description | BACKGROUND: Single nucleotide polymorphisms (SNPs) are DNA sequence variations, occurring when a single nucleotide – adenine (A), thymine (T), cytosine (C) or guanine (G) – is altered. Arguably, SNPs account for more than 90% of human genetic variation. Our laboratory has developed a highly redundant SNP genotyping assay consisting of multiple probes with signals from multiple channels for a single SNP, based on arrayed primer extension (APEX). This mini-sequencing method is a powerful combination of a highly parallel microarray with distinctive Sanger-based dideoxy terminator sequencing chemistry. Using this microarray platform, our current genotype calling system (known as SNP Chart) is capable of calling single SNP genotypes by manual inspection of the APEX data, which is time-consuming and exposed to user subjectivity bias. RESULTS: Using a set of 32 Coriell DNA samples plus three negative PCR controls as a training data set, we have developed a fully-automated genotyping algorithm based on simple linear discriminant analysis (LDA) using dynamic variable selection. The algorithm combines separate analyses based on the multiple probe sets to give a final posterior probability for each candidate genotype. We have tested our algorithm on a completely independent data set of 270 DNA samples, with validated genotypes, from patients admitted to the intensive care unit (ICU) of St. Paul's Hospital (plus one negative PCR control sample). Our method achieves a concordance rate of 98.9% with a 99.6% call rate for a set of 96 SNPs. By adjusting the threshold value for the final posterior probability of the called genotype, the call rate reduces to 94.9% with a higher concordance rate of 99.6%. We also reversed the two independent data sets in their training and testing roles, achieving a concordance rate up to 99.8%. CONCLUSION: The strength of this APEX chemistry-based platform is its unique redundancy having multiple probes for a single SNP. Our model-based genotype calling algorithm captures the redundancy in the system considering all the underlying probe features of a particular SNP, automatically down-weighting any 'bad data' corresponding to image artifacts on the microarray slide or failure of a specific chemistry. In this regard, our method is able to automatically select the probes which work well and reduce the effect of other so-called bad performing probes in a sample-specific manner, for any number of SNPs. |
format | Text |
id | pubmed-1702553 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1702553 2006-12-19 BMC Bioinformatics Research Article BioMed Central 2006-11-30 /pmc/articles/PMC1702553/ /pubmed/17137502 http://dx.doi.org/10.1186/1471-2105-7-521 Text en Copyright © 2006 Podder et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Dynamic variable selection in SNP genotype autocalling from APEX microarray data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1702553/ https://www.ncbi.nlm.nih.gov/pubmed/17137502 http://dx.doi.org/10.1186/1471-2105-7-521 |
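
The abstract reports that raising the threshold on the final posterior probability lowers the call rate (99.6% to 94.9%) while raising concordance (98.9% to 99.6%). The following is a minimal sketch of how such a trade-off can be computed, not the authors' implementation: the function names, the genotype labels `AA`/`AB`/`BB`, the threshold values, and the toy posterior values are all assumptions made for illustration, and the posteriors are taken as given rather than produced by LDA.

```python
from typing import Dict, List, Optional, Tuple


def call_genotype(posteriors: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most probable genotype, or None ("no call") when its
    posterior probability falls below the threshold."""
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] >= threshold else None


def call_rate_and_concordance(
    all_posteriors: List[Dict[str, float]],
    validated: List[str],
    threshold: float,
) -> Tuple[float, float]:
    """Call rate = fraction of samples that receive a call.
    Concordance = fraction of called samples that match the validated genotype."""
    calls = [call_genotype(p, threshold) for p in all_posteriors]
    called = [(c, v) for c, v in zip(calls, validated) if c is not None]
    call_rate = len(called) / len(calls)
    concordance = (
        sum(c == v for c, v in called) / len(called) if called else float("nan")
    )
    return call_rate, concordance


# Hypothetical posteriors for one SNP across four samples (illustrative only).
toy_posteriors = [
    {"AA": 0.99, "AB": 0.01, "BB": 0.00},
    {"AA": 0.05, "AB": 0.90, "BB": 0.05},
    {"AA": 0.55, "AB": 0.45, "BB": 0.00},  # ambiguous sample
    {"AA": 0.00, "AB": 0.02, "BB": 0.98},
]
toy_validated = ["AA", "AB", "AB", "BB"]

loose = call_rate_and_concordance(toy_posteriors, toy_validated, threshold=0.5)
strict = call_rate_and_concordance(toy_posteriors, toy_validated, threshold=0.8)
```

On this toy data the stricter threshold leaves the ambiguous third sample uncalled, so the call rate falls while the concordance among the remaining calls rises, which is the same direction of trade-off the abstract describes.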