Getting DNA copy numbers without control samples
BACKGROUND: The selection of the reference used to scale the data in a copy number analysis is of paramount importance for achieving accurate estimates. Usually this reference is generated from control samples included in the study. However, these control samples are not always available and in these cases,...
Main authors: | Ortiz-Estevez, Maria; Aramburu, Ander; Rubio, Angel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3512512/ https://www.ncbi.nlm.nih.gov/pubmed/22898240 http://dx.doi.org/10.1186/1748-7188-7-19 |
_version_ | 1782251743242878976 |
---|---|
author | Ortiz-Estevez, Maria Aramburu, Ander Rubio, Angel |
author_facet | Ortiz-Estevez, Maria Aramburu, Ander Rubio, Angel |
author_sort | Ortiz-Estevez, Maria |
collection | PubMed |
description | BACKGROUND: The selection of the reference used to scale the data in a copy number analysis is of paramount importance for achieving accurate estimates. Usually this reference is generated from control samples included in the study. However, these control samples are not always available, and in these cases an artificial reference must be created. Proper generation of this signal is crucial in terms of both noise and bias. We propose NSA (Normality Search Algorithm), a scaling method that works with and without control samples. It is based on the assumption that genomic regions enriched in SNPs with identical copy numbers in both alleles are likely to be normal. These normal regions are predicted for each sample individually and used to calculate the final reference signal. NSA can be applied to any CN data regardless of the microarray technology and preprocessing method. It also finds an optimal weighting of the samples that minimizes possible batch effects. RESULTS: Five human datasets (a subset of HapMap samples, Glioblastoma Multiforme (GBM), Ovarian, Prostate and Lung Cancer experiments) were analyzed. The results show that, using only tumor samples, NSA is able to remove the bias in the copy number estimation, to reduce the noise and, therefore, to increase the ability to detect copy number aberrations (CNAs). These improvements allow NSA to also detect recurrent aberrations more accurately than other state-of-the-art methods. CONCLUSIONS: NSA provides a robust and accurate reference for scaling probe signal data to CN values without the need for control samples. It minimizes the problems of bias, noise and batch effects in the estimation of CNs. Therefore, the NSA scaling approach helps to detect recurrent CNAs better than current methods. The automatic selection of references makes it useful for bulk analysis of many GEO or ArrayExpress experiments without the need to develop a parser to find the normal samples or possible batches within the data. The method is available in the open-source R package NSA, which is an add-on to the aroma.cn framework. http://www.aroma-project.org/addons. |
format | Online Article Text |
id | pubmed-3512512 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3512512 2012-12-04 Getting DNA copy numbers without control samples Ortiz-Estevez, Maria Aramburu, Ander Rubio, Angel Algorithms Mol Biol Software Article BACKGROUND: The selection of the reference used to scale the data in a copy number analysis is of paramount importance for achieving accurate estimates. Usually this reference is generated from control samples included in the study. However, these control samples are not always available, and in these cases an artificial reference must be created. Proper generation of this signal is crucial in terms of both noise and bias. We propose NSA (Normality Search Algorithm), a scaling method that works with and without control samples. It is based on the assumption that genomic regions enriched in SNPs with identical copy numbers in both alleles are likely to be normal. These normal regions are predicted for each sample individually and used to calculate the final reference signal. NSA can be applied to any CN data regardless of the microarray technology and preprocessing method. It also finds an optimal weighting of the samples that minimizes possible batch effects. RESULTS: Five human datasets (a subset of HapMap samples, Glioblastoma Multiforme (GBM), Ovarian, Prostate and Lung Cancer experiments) were analyzed. The results show that, using only tumor samples, NSA is able to remove the bias in the copy number estimation, to reduce the noise and, therefore, to increase the ability to detect copy number aberrations (CNAs). These improvements allow NSA to also detect recurrent aberrations more accurately than other state-of-the-art methods. CONCLUSIONS: NSA provides a robust and accurate reference for scaling probe signal data to CN values without the need for control samples. It minimizes the problems of bias, noise and batch effects in the estimation of CNs. Therefore, the NSA scaling approach helps to detect recurrent CNAs better than current methods. The automatic selection of references makes it useful for bulk analysis of many GEO or ArrayExpress experiments without the need to develop a parser to find the normal samples or possible batches within the data. The method is available in the open-source R package NSA, which is an add-on to the aroma.cn framework. http://www.aroma-project.org/addons. BioMed Central 2012-08-16 /pmc/articles/PMC3512512/ /pubmed/22898240 http://dx.doi.org/10.1186/1748-7188-7-19 Text en Copyright ©2012 Ortiz-Estevez et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Article Ortiz-Estevez, Maria Aramburu, Ander Rubio, Angel Getting DNA copy numbers without control samples |
title | Getting DNA copy numbers without control samples |
title_full | Getting DNA copy numbers without control samples |
title_fullStr | Getting DNA copy numbers without control samples |
title_full_unstemmed | Getting DNA copy numbers without control samples |
title_short | Getting DNA copy numbers without control samples |
title_sort | getting dna copy numbers without control samples |
topic | Software Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3512512/ https://www.ncbi.nlm.nih.gov/pubmed/22898240 http://dx.doi.org/10.1186/1748-7188-7-19 |
work_keys_str_mv | AT ortizestevezmaria gettingdnacopynumberswithoutcontrolsamples AT aramburuander gettingdnacopynumberswithoutcontrolsamples AT rubioangel gettingdnacopynumberswithoutcontrolsamples |
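
The idea summarized in the description field (predict normal regions per sample from SNPs with equal signal in both alleles, then build the reference from those regions only) can be illustrated with a small toy sketch. This is not the NSA R package or its API: the function names, the sliding window, the thresholds `tol` and `min_fraction`, and the median-based reference are all assumptions made for illustration, and the sample weighting against batch effects mentioned in the abstract is not modeled.

```python
from statistics import median


def predict_normal_mask(allele_a, allele_b, window=5, tol=0.3, min_fraction=0.35):
    """Flag SNPs that lie in windows enriched in allele-balanced SNPs.

    A SNP is called balanced when its two allele signals differ by at most
    `tol` relative to their mean. Window size and thresholds are
    illustrative assumptions, not values from the article.
    """
    balanced = []
    for a, b in zip(allele_a, allele_b):
        mean = (a + b) / 2
        balanced.append(mean > 0 and abs(a - b) <= tol * mean)
    half = window // 2
    n = len(balanced)
    mask = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        mask.append(sum(balanced[lo:hi]) / (hi - lo) >= min_fraction)
    return mask


def reference_signal(totals, masks):
    """Per-SNP median of the total signal over samples predicted normal there."""
    reference = []
    for j in range(len(totals[0])):
        values = [t[j] for t, m in zip(totals, masks) if m[j]]
        if not values:  # no sample predicted normal here: fall back to all samples
            values = [t[j] for t in totals]
        reference.append(median(values))
    return reference


def scale_to_copy_number(total, reference):
    """Scale raw total signal so that the reference corresponds to CN = 2."""
    return [2 * x / r for x, r in zip(total, reference)]


# Toy data (hypothetical): raw signal = probe effect * allele copy number.
probe_effect = [100, 200, 150, 120, 180, 90]
copies = {
    "normal_like": ([1, 2, 1, 0, 1, 1], [1, 0, 1, 2, 1, 1]),
    "tumor": ([1, 1, 2, 2, 2, 3], [1, 1, 0, 1, 1, 0]),  # gain at last 3 SNPs
}

masks, totals = {}, {}
for name, (a, b) in copies.items():
    raw_a = [e * x for e, x in zip(probe_effect, a)]
    raw_b = [e * x for e, x in zip(probe_effect, b)]
    masks[name] = predict_normal_mask(raw_a, raw_b)
    totals[name] = [x + y for x, y in zip(raw_a, raw_b)]

reference = reference_signal(list(totals.values()), list(masks.values()))
cn_tumor = scale_to_copy_number(totals["tumor"], reference)
print(masks["tumor"])
print(cn_tumor)
```

In this toy case the gained region of the tumor sample is excluded from the reference, so the scaled values recover a copy number of 3 there; a reference taken as the plain median over both samples would instead pull that estimate toward 2.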