Silhouette scores for assessment of SNP genotype clusters
BACKGROUND: High-throughput genotyping of single nucleotide polymorphisms (SNPs) generates large amounts of data. In many SNP genotyping assays, the genotype assignment is based on scatter plots of signals corresponding to the two SNP alleles. In a robust assay the three clusters that define the gen...
Main Authors: | Lovmar, Lovisa; Ahlford, Annika; Jonsson, Mats; Syvänen, Ann-Christine |
---|---|
Format: | Text |
Language: | English |
Published: |
BioMed Central
2005
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC555759/ https://www.ncbi.nlm.nih.gov/pubmed/15760469 http://dx.doi.org/10.1186/1471-2164-6-35 |
_version_ | 1782122557350084608 |
---|---|
author | Lovmar, Lovisa Ahlford, Annika Jonsson, Mats Syvänen, Ann-Christine |
author_sort | Lovmar, Lovisa |
collection | PubMed |
description | BACKGROUND: High-throughput genotyping of single nucleotide polymorphisms (SNPs) generates large amounts of data. In many SNP genotyping assays, the genotype assignment is based on scatter plots of signals corresponding to the two SNP alleles. In a robust assay the three clusters that define the genotypes are well separated and the distances between the data points within a cluster are short. "Silhouettes" is a graphical aid for interpretation and validation of data clusters that provides a measure of how well a data point was classified when it was assigned to a cluster. Thus "Silhouettes" can potentially be used as a quality measure for SNP genotyping results and for objective comparison of the performance of SNP assays under different circumstances. RESULTS: We created a program (ClusterA) for calculating "Silhouette scores", and applied it to assess the quality of SNP genotype clusters obtained by single nucleotide primer extension ("minisequencing") in the Tag-microarray format. A Silhouette score condenses the quality of the genotype assignment for each SNP assay into a single numeric value, which ranges from 1.0, when the genotype assignment is unequivocal, down to -1.0, when the genotype assignment has been arbitrary. In the present study we applied Silhouette scores to compare the performance of four DNA polymerases in our minisequencing system by analyzing 26 SNPs in both DNA polarities in 16 DNA samples. We found Silhouettes to provide a relevant measure for the quality of SNP assays under different reaction conditions, illustrated here by the four DNA polymerases. According to our results, the genotypes can be unequivocally assigned without manual inspection when the Silhouette score for a SNP assay is > 0.65. All four DNA polymerases performed satisfactorily in our Tag-array minisequencing system. CONCLUSION: Using "Silhouette scores" to assess SNP genotyping clusters is a convenient way to evaluate the quality of SNP genotype assignment, and provides an objective, numeric measure for comparing the performance of SNP assays. The program we created for calculating Silhouette scores is freely available, and can be used for quality assessment of the results from any genotyping system in which the genotypes are assigned by cluster analysis using scatter plots. |
format | Text |
id | pubmed-555759 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-555759 2005-04-01 Silhouette scores for assessment of SNP genotype clusters Lovmar, Lovisa Ahlford, Annika Jonsson, Mats Syvänen, Ann-Christine BMC Genomics Methodology Article BioMed Central 2005-03-10 /pmc/articles/PMC555759/ /pubmed/15760469 http://dx.doi.org/10.1186/1471-2164-6-35 Text en Copyright © 2005 Lovmar et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Silhouette scores for assessment of SNP genotype clusters |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC555759/ https://www.ncbi.nlm.nih.gov/pubmed/15760469 http://dx.doi.org/10.1186/1471-2164-6-35 |
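
The abstract describes a Silhouette score as a single value per SNP assay, ranging from 1.0 down to -1.0, with 0.65 as the level above which genotypes could be assigned without manual inspection. The record does not show how ClusterA computes it, so the following is only a minimal sketch under stated assumptions: it uses the standard silhouette definition, s(i) = (b - a) / max(a, b), with Euclidean distance on the two allele signals, and it assumes the per-assay score is the mean of s(i) over all samples. The function names, the sample data, and the `THRESHOLD` constant are hypothetical and are not taken from ClusterA.

```python
from math import dist
from statistics import mean

# Cut-off quoted in the abstract; used here only for illustration.
THRESHOLD = 0.65


def silhouette_values(points, labels):
    """Return the silhouette value s(i) for each point.

    points: list of (x, y) allele-signal pairs (hypothetical input format).
    labels: genotype call for each point, e.g. "AA", "AB", "BB".
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouettes need at least two clusters")

    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so divide by len(own) - 1).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def assay_score(points, labels):
    """Assumed per-assay score: the mean silhouette value over all samples."""
    return mean(silhouette_values(points, labels))


# Made-up signals for six samples forming three tight genotype clusters.
points = [(0.9, 0.1), (1.0, 0.0), (0.5, 0.5), (0.6, 0.5), (0.0, 1.0), (0.1, 0.9)]
good_labels = ["AA", "AA", "AB", "AB", "BB", "BB"]
bad_labels = ["AA", "AB", "BB", "AA", "AB", "BB"]  # deliberately scrambled calls

good = assay_score(points, good_labels)
bad = assay_score(points, bad_labels)
print(f"well-separated calls: {good:.2f}, passes cut-off: {good > THRESHOLD}")
print(f"scrambled calls:      {bad:.2f}, passes cut-off: {bad > THRESHOLD}")
```

With well-separated clusters the score approaches 1.0, while scrambling the genotype calls on the same signals pulls it down, which is the behaviour the abstract relies on when it uses the score to compare assay conditions.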