Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform
Main authors: | Eckel-Passow, Jeanette E; Atkinson, Elizabeth J; Maharjan, Sooraj; Kardia, Sharon LR; de Andrade, Mariza |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2011 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3146450/ https://www.ncbi.nlm.nih.gov/pubmed/21627824 http://dx.doi.org/10.1186/1471-2105-12-220 |
Field | Value |
---|---|
author | Eckel-Passow, Jeanette E Atkinson, Elizabeth J Maharjan, Sooraj Kardia, Sharon LR de Andrade, Mariza |
collection | PubMed |
description | BACKGROUND: Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance amongst regions of copy number gains/deletions and the false-positive rate amongst deleted segments. RESULTS: APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of those evaluated, only PennCNV provides copy-number-specific quality-control metrics; it identified 136 poor-quality CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce did, as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate associated with deletions was similar for PennCNV and CRLMM/VanillaIce. CONCLUSIONS: If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data, then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias and small variability, and it detects more regions while maintaining an estimated false-positive rate similar to that of CRLMM/VanillaIce. More generally, we advocate that software developers provide guidance on evaluating and choosing settings in order to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating concordance/discordance and subsequently considering the union of regions for downstream association tests. |
format | Online Article Text |
id | pubmed-3146450 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics (Research Article) |
published | 2011-05-31 |
rights | Copyright ©2011 Eckel-Passow et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Research Article |
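The abstract reports a median per-sample concordance between PennCNV and CRLMM/VanillaIce calls and recommends taking the union of regions from multiple algorithms for downstream association tests. The Python sketch below illustrates one way to compute both quantities. It is a hedged illustration under stated assumptions, not the authors' method: the `(chromosome, start, end)` region representation, the base-pair overlap-over-union definition of concordance, the helper names (`merge_regions`, `concordance`, `median_concordance`, `union_regions`) and the toy calls are all hypothetical, and neither package's actual output format is assumed.

```python
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical representation of one CNV call: (chromosome, start, end),
# using half-open base-pair coordinates [start, end).
Region = Tuple[str, int, int]


def merge_regions(regions: List[Region]) -> List[Region]:
    """Collapse overlapping or touching regions on the same chromosome."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region instead of starting a new one.
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bases(regions: List[Region]) -> int:
    """Number of base pairs covered by at least one region."""
    return sum(end - start for _, start, end in merge_regions(regions))


def overlap_bases(a: List[Region], b: List[Region]) -> int:
    """Base pairs covered by both call sets (each set is merged first)."""
    merged_b = merge_regions(b)
    total = 0
    for chrom_a, start_a, end_a in merge_regions(a):
        for chrom_b, start_b, end_b in merged_b:
            if chrom_a == chrom_b:
                total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total


def concordance(a: List[Region], b: List[Region]) -> float:
    """Assumed metric: bases called by both callers / bases called by either."""
    union = covered_bases(a + b)
    if union == 0:
        return 1.0  # neither caller reported a CNV: treat as full agreement
    return overlap_bases(a, b) / union


def median_concordance(calls_a: Dict[str, List[Region]],
                       calls_b: Dict[str, List[Region]]) -> float:
    """Median per-sample concordance over every sample seen by either caller."""
    samples = sorted(set(calls_a) | set(calls_b))
    return median(concordance(calls_a.get(s, []), calls_b.get(s, []))
                  for s in samples)


def union_regions(a: List[Region], b: List[Region]) -> List[Region]:
    """Regions called by either caller, merged for downstream testing."""
    return merge_regions(a + b)


# Toy calls for two samples (made-up coordinates, not data from the study).
caller_a = {"s1": [("chr1", 100, 200), ("chr2", 50, 80)], "s2": [("chr3", 0, 100)]}
caller_b = {"s1": [("chr1", 150, 250)], "s2": [("chr3", 0, 100)]}

print(round(median_concordance(caller_a, caller_b), 3))  # 0.639
print(union_regions(caller_a["s1"], caller_b["s1"]))  # [('chr1', 100, 250), ('chr2', 50, 80)]
```

In the toy data, sample `s1` shares 50 of 180 called bases (about 0.278) and `s2` agrees completely (1.0), so the median is about 0.639. Base-pair overlap over union is only one reasonable choice; counting calls that meet a reciprocal-overlap threshold is another common way to define concordance, and the two can give different numbers on the same calls.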