
ACPA: automated cluster plot analysis of genotype data

Genome-wide association studies have become standard in genetic epidemiology. Analyzing hundreds of thousands of markers simultaneously imposes some challenges for statisticians. One issue is the problem of multiplicity, which has been compared with the search for the needle in a haystack. To reduce...


Bibliographic Details
Main Authors: Schillert, Arne, Schwarz, Daniel F, Vens, Maren, Szymczak, Silke, König, Inke R, Ziegler, Andreas
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795958/
https://www.ncbi.nlm.nih.gov/pubmed/20018051
_version_ 1782175479520821248
author Schillert, Arne
Schwarz, Daniel F
Vens, Maren
Szymczak, Silke
König, Inke R
Ziegler, Andreas
author_facet Schillert, Arne
Schwarz, Daniel F
Vens, Maren
Szymczak, Silke
König, Inke R
Ziegler, Andreas
author_sort Schillert, Arne
collection PubMed
description Genome-wide association studies have become standard in genetic epidemiology. Analyzing hundreds of thousands of markers simultaneously imposes some challenges for statisticians. One issue is the problem of multiplicity, which has been compared with the search for the needle in a haystack. To reduce the number of false-positive findings, a number of quality filters such as exclusion of single-nucleotide polymorphisms (SNPs) with a high missing fraction are employed. Another filter is exclusion of SNPs for which the calling algorithm had difficulties in assigning the genotypes. The only way to do this is the visual inspection of the cluster plots, also termed signal intensity plots, but this approach is often neglected. We developed an algorithm ACPA (automated cluster plot analysis), which performs this task automatically for autosomal SNPs. It is based on counting samples that lie too close to the cluster of a different genotype; SNPs are excluded when a certain threshold is exceeded. We evaluated ACPA using 1,000 randomly selected quality controlled SNPs from the Framingham Heart Study data that were provided for the Genetic Analysis Workshop 16. We compared the decision of ACPA with the decision made by two independent readers. We achieved a sensitivity of 88% (95% CI: 81%-93%) and a specificity of 86% (95% CI: 83%-89%). In a screening setting in which one aims at not losing any good SNP, we achieved 99% (95% CI: 98%-100%) specificity and still detected every second low-quality SNP.
format Text
id pubmed-2795958
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-27959582009-12-18 ACPA: automated cluster plot analysis of genotype data Schillert, Arne Schwarz, Daniel F Vens, Maren Szymczak, Silke König, Inke R Ziegler, Andreas BMC Proc Proceedings Genome-wide association studies have become standard in genetic epidemiology. Analyzing hundreds of thousands of markers simultaneously imposes some challenges for statisticians. One issue is the problem of multiplicity, which has been compared with the search for the needle in a haystack. To reduce the number of false-positive findings, a number of quality filters such as exclusion of single-nucleotide polymorphisms (SNPs) with a high missing fraction are employed. Another filter is exclusion of SNPs for which the calling algorithm had difficulties in assigning the genotypes. The only way to do this is the visual inspection of the cluster plots, also termed signal intensity plots, but this approach is often neglected. We developed an algorithm ACPA (automated cluster plot analysis), which performs this task automatically for autosomal SNPs. It is based on counting samples that lie too close to the cluster of a different genotype; SNPs are excluded when a certain threshold is exceeded. We evaluated ACPA using 1,000 randomly selected quality controlled SNPs from the Framingham Heart Study data that were provided for the Genetic Analysis Workshop 16. We compared the decision of ACPA with the decision made by two independent readers. We achieved a sensitivity of 88% (95% CI: 81%-93%) and a specificity of 86% (95% CI: 83%-89%). In a screening setting in which one aims at not losing any good SNP, we achieved 99% (95% CI: 98%-100%) specificity and still detected every second low-quality SNP. BioMed Central 2009-12-15 /pmc/articles/PMC2795958/ /pubmed/20018051 Text en Copyright ©2009 Schillert et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Schillert, Arne
Schwarz, Daniel F
Vens, Maren
Szymczak, Silke
König, Inke R
Ziegler, Andreas
ACPA: automated cluster plot analysis of genotype data
title ACPA: automated cluster plot analysis of genotype data
title_full ACPA: automated cluster plot analysis of genotype data
title_fullStr ACPA: automated cluster plot analysis of genotype data
title_full_unstemmed ACPA: automated cluster plot analysis of genotype data
title_short ACPA: automated cluster plot analysis of genotype data
title_sort acpa: automated cluster plot analysis of genotype data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2795958/
https://www.ncbi.nlm.nih.gov/pubmed/20018051
work_keys_str_mv AT schillertarne acpaautomatedclusterplotanalysisofgenotypedata
AT schwarzdanielf acpaautomatedclusterplotanalysisofgenotypedata
AT vensmaren acpaautomatedclusterplotanalysisofgenotypedata
AT szymczaksilke acpaautomatedclusterplotanalysisofgenotypedata
AT koniginker acpaautomatedclusterplotanalysisofgenotypedata
AT zieglerandreas acpaautomatedclusterplotanalysisofgenotypedata
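
The abstract in this record describes ACPA as counting samples that lie too close to the cluster of a different genotype and excluding a SNP once a threshold is exceeded. The record does not give the actual distance criterion or threshold, so the following is only a minimal sketch of that general idea under stated assumptions: clusters are summarized by their centroids, a sample counts as "too close" when its distance to a foreign centroid is less than `margin` times its distance to its own centroid, and the names `count_ambiguous`, `exclude_snp`, `margin`, and `max_ambiguous` are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from math import dist


def centroids(points, calls):
    """Return the mean (x, y) signal intensity of each called genotype cluster."""
    groups = defaultdict(list)
    for point, genotype in zip(points, calls):
        groups[genotype].append(point)
    return {
        genotype: (
            sum(x for x, _ in members) / len(members),
            sum(y for _, y in members) / len(members),
        )
        for genotype, members in groups.items()
    }


def count_ambiguous(points, calls, margin=1.5):
    """Count samples lying close to a cluster other than the one they were called into.

    Assumed rule (not from the paper): a sample is ambiguous when the nearest
    foreign centroid is closer than `margin` times the distance to its own centroid.
    """
    cents = centroids(points, calls)
    ambiguous = 0
    for point, genotype in zip(points, calls):
        own = dist(point, cents[genotype])
        foreign = [dist(point, c) for g, c in cents.items() if g != genotype]
        if foreign and min(foreign) < margin * own:
            ambiguous += 1
    return ambiguous


def exclude_snp(points, calls, max_ambiguous=2, margin=1.5):
    """Flag the SNP for exclusion when the ambiguous count exceeds the threshold."""
    return count_ambiguous(points, calls, margin) > max_ambiguous
```

Here `points` would hold one pair of signal intensities per sample and `calls` the genotype label assigned by the calling algorithm (for example `"AA"`, `"AB"`, `"BB"`). Plain centroids are a crude stand-in for a cluster model, since miscalled samples pull the centroid toward the neighboring cluster; a robust center such as the median may behave better, but that choice is also an assumption.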