Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform

Bibliographic Details
Main authors: Eckel-Passow, Jeanette E; Atkinson, Elizabeth J; Maharjan, Sooraj; Kardia, Sharon LR; de Andrade, Mariza
Format: Online, Article, Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3146450/
https://www.ncbi.nlm.nih.gov/pubmed/21627824
http://dx.doi.org/10.1186/1471-2105-12-220
collection PubMed
description BACKGROUND: Copy number data are routinely extracted from genome-wide association study (GWAS) chips using a variety of software. We empirically evaluated and compared four freely available software packages designed to estimate copy number from Affymetrix SNP chips: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance among regions of copy number gains/deletions, and the false-positive rate among deleted segments.
RESULTS: APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of the packages evaluated, only PennCNV provides copy-number-specific quality-control metrics; it identified 136 poor-quality CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce: the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce detected, as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate for deletions was similar for PennCNV and CRLMM/VanillaIce.
CONCLUSIONS: If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data, PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias and small variability, and it detects more regions while maintaining an estimated false-positive rate similar to that of CRLMM/VanillaIce. More generally, we advocate that software developers provide guidance on evaluating and choosing settings to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating concordance/discordance and subsequently considering the union of regions for downstream association tests.
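The conclusions recommend running several algorithms, evaluating their concordance and then taking the union of regions. The abstract does not say how concordance was computed, so what follows is only a minimal sketch under stated assumptions: each CNV call is a hypothetical (chromosome, start, end) tuple with half-open base-pair coordinates; per-sample concordance is assumed to be the base pairs called by both callers divided by the base pairs called by either (a Jaccard-style overlap); and the names merge_regions, covered_bp, shared_bp, concordance and median_concordance are illustrative, not part of PennCNV or CRLMM/VanillaIce. Duplications and deletions would be passed in separately to mirror the per-type medians reported in the abstract.

```python
import math
import statistics
from typing import Dict, List, Tuple

# Hypothetical representation of one CNV call: (chromosome, start, end),
# using half-open [start, end) base-pair coordinates (an assumption).
Region = Tuple[str, int, int]


def merge_regions(regions: List[Region]) -> List[Region]:
    """Union of regions: sort, then merge intervals that overlap or touch."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bp(regions: List[Region]) -> int:
    """Base pairs covered by a call set, counting overlapping calls once."""
    return sum(end - start for _, start, end in merge_regions(regions))


def shared_bp(calls_a: List[Region], calls_b: List[Region]) -> int:
    """Base pairs covered by both call sets."""
    # Merging first makes each set non-overlapping, so summing pairwise
    # overlaps never counts the same base pair twice.
    merged_a = merge_regions(calls_a)
    merged_b = merge_regions(calls_b)
    shared = 0
    for chrom_a, start_a, end_a in merged_a:
        for chrom_b, start_b, end_b in merged_b:
            if chrom_a == chrom_b:
                shared += max(0, min(end_a, end_b) - max(start_a, start_b))
    return shared


def concordance(calls_a: List[Region], calls_b: List[Region]) -> float:
    """Assumed definition: shared bp / bp called by either caller.

    Returns NaN when neither caller made a call, since the ratio is undefined.
    """
    either = covered_bp(calls_a + calls_b)
    if either == 0:
        return math.nan
    return shared_bp(calls_a, calls_b) / either


def median_concordance(calls_a: Dict[str, List[Region]],
                       calls_b: Dict[str, List[Region]]) -> float:
    """Median per-sample concordance over samples with at least one call."""
    samples = sorted(set(calls_a) | set(calls_b))
    values = [concordance(calls_a.get(s, []), calls_b.get(s, [])) for s in samples]
    defined = [v for v in values if not math.isnan(v)]
    return statistics.median(defined)  # raises StatisticsError if none defined


if __name__ == "__main__":
    # Toy calls for three samples from two hypothetical callers.
    caller_a = {"s1": [("chr1", 100, 300)], "s2": [("chr2", 0, 100)], "s3": [("chr3", 0, 100)]}
    caller_b = {"s1": [("chr1", 200, 400)], "s2": [("chr2", 0, 100)], "s3": [("chr3", 0, 50)]}
    # Per-sample concordance is 1/3, 1.0 and 0.5, so this prints 0.5.
    print(median_concordance(caller_a, caller_b))
    # Union of all regions from both callers, e.g. for downstream association tests.
    all_calls = [r for calls in (caller_a, caller_b) for regs in calls.values() for r in regs]
    print(merge_regions(all_calls))
```

Under this definition, the union set for downstream tests is simply merge_regions applied to every call from both callers. A reciprocal-overlap rule (for example, requiring each call to overlap the other by at least 50%) is another common way to match CNV calls and would generally give different concordance values.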
id pubmed-3146450
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3146450; 2011-07-30; BMC Bioinformatics; Research Article; BioMed Central 2011-05-31; Copyright ©2011 Eckel-Passow et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article