Silhouette scores for assessment of SNP genotype clusters

BACKGROUND: High-throughput genotyping of single nucleotide polymorphisms (SNPs) generates large amounts of data. In many SNP genotyping assays, the genotype assignment is based on scatter plots of signals corresponding to the two SNP alleles. In a robust assay the three clusters that define the gen...

Bibliographic Details
Main Authors: Lovmar, Lovisa, Ahlford, Annika, Jonsson, Mats, Syvänen, Ann-Christine
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC555759/
https://www.ncbi.nlm.nih.gov/pubmed/15760469
http://dx.doi.org/10.1186/1471-2164-6-35
author Lovmar, Lovisa
Ahlford, Annika
Jonsson, Mats
Syvänen, Ann-Christine
collection PubMed
description BACKGROUND: High-throughput genotyping of single nucleotide polymorphisms (SNPs) generates large amounts of data. In many SNP genotyping assays, the genotype assignment is based on scatter plots of signals corresponding to the two SNP alleles. In a robust assay the three clusters that define the genotypes are well separated and the distances between the data points within a cluster are short. "Silhouettes" is a graphical aid for interpretation and validation of data clusters that provides a measure of how well a data point was classified when it was assigned to a cluster. Thus "Silhouettes" can potentially be used as a quality measure for SNP genotyping results and for objective comparison of the performance of SNP assays under different circumstances.
RESULTS: We created a program (ClusterA) for calculating "Silhouette scores", and applied it to assess the quality of SNP genotype clusters obtained by single nucleotide primer extension ("minisequencing") in the Tag-microarray format. A Silhouette score condenses the quality of the genotype assignment for each SNP assay into a single numeric value, which ranges from 1.0, when the genotype assignment is unequivocal, down to -1.0, when the genotype assignment has been arbitrary. In the present study we applied Silhouette scores to compare the performance of four DNA polymerases in our minisequencing system by analyzing 26 SNPs in both DNA polarities in 16 DNA samples. We found that Silhouettes provide a relevant measure of the quality of SNP assays under different reaction conditions, illustrated here by the four DNA polymerases. According to our results, the genotypes can be unequivocally assigned without manual inspection when the Silhouette score for a SNP assay is > 0.65. All four DNA polymerases performed satisfactorily in our Tag-array minisequencing system.
CONCLUSION: "Silhouette scores" are convenient for evaluating the quality of SNP genotype assignment, and provide an objective, numeric measure for comparing the performance of SNP assays. The program we created for calculating Silhouette scores is freely available, and can be used for quality assessment of the results from any genotyping system in which the genotypes are assigned by cluster analysis of scatter plots.
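The description above states the range of a Silhouette score (1.0 down to -1.0) and the 0.65 cut-off, but not how the score is computed. The following is a minimal sketch, not the ClusterA program itself. It assumes the standard silhouette width of a data point, s(i) = (b - a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the smallest mean distance to the points of any other cluster. It also assumes, since the abstract does not say so, that the score of a SNP assay is the mean of s(i) over all samples and that distances are Euclidean. The function names, the signal coordinates, and the genotype labels are hypothetical illustration data.

```python
import math
from collections import defaultdict


def silhouette_widths(points, labels):
    """Return the silhouette width s(i) = (b - a) / max(a, b) for each point."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    def mean_dist(i, members):
        # Mean Euclidean distance from point i to the given member indices.
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        others = [m for lab, m in clusters.items() if lab != label]
        if not own or not others:
            # Convention: a singleton cluster, or a single cluster overall, gets 0.
            widths.append(0.0)
            continue
        a = mean_dist(i, own)                      # cohesion within own cluster
        b = min(mean_dist(i, m) for m in others)   # distance to nearest other cluster
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def assay_score(points, labels):
    """Assumed per-assay score: mean silhouette width over all samples."""
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)


# Hypothetical scatter-plot signals for six samples and their called genotypes.
points = [(0.05, 1.00), (0.07, 1.02),   # cluster called AA
          (0.50, 1.10), (0.52, 1.08),   # cluster called AB
          (0.94, 1.00), (0.96, 1.03)]   # cluster called BB
labels = ["AA", "AA", "AB", "AB", "BB", "BB"]

THRESHOLD = 0.65  # cut-off reported in the abstract
score = assay_score(points, labels)
needs_manual_inspection = score <= THRESHOLD
print(f"score={score:.2f}, manual inspection needed: {needs_manual_inspection}")
```

With tight, well-separated clusters the score approaches 1.0; assigning the same points to genotypes that cut across the clusters drives it toward or below zero, which is the behaviour the abstract relies on when it uses the score to flag assays for manual inspection.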
format Text
id pubmed-555759
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-555759 2005-04-01 Silhouette scores for assessment of SNP genotype clusters Lovmar, Lovisa Ahlford, Annika Jonsson, Mats Syvänen, Ann-Christine BMC Genomics Methodology Article BioMed Central 2005-03-10 /pmc/articles/PMC555759/ /pubmed/15760469 http://dx.doi.org/10.1186/1471-2164-6-35 Text en Copyright © 2005 Lovmar et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Silhouette scores for assessment of SNP genotype clusters
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC555759/
https://www.ncbi.nlm.nih.gov/pubmed/15760469
http://dx.doi.org/10.1186/1471-2164-6-35