
Getting DNA copy numbers without control samples



Bibliographic Details
Main authors: Ortiz-Estevez, Maria; Aramburu, Ander; Rubio, Angel
Format: Online, Article, Text
Language: English
Published: BioMed Central 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3512512/
https://www.ncbi.nlm.nih.gov/pubmed/22898240
http://dx.doi.org/10.1186/1748-7188-7-19
Collection: PubMed
Description:
BACKGROUND: Choosing the reference used to scale the data in a copy number (CN) analysis is critical for obtaining accurate estimates. This reference is usually generated from control samples included in the study. However, control samples are not always available, and in those cases an artificial reference must be created. How this signal is generated is crucial in terms of both noise and bias. We propose NSA (Normality Search Algorithm), a scaling method that works with or without control samples. It is based on the assumption that genomic regions enriched in SNPs with identical copy numbers in both alleles are likely to be normal. These normal regions are predicted for each sample individually and used to calculate the final reference signal. NSA can be applied to any CN data regardless of the microarray technology and preprocessing method. It also finds an optimal weighting of the samples that minimizes possible batch effects.

RESULTS: Five human datasets (a subset of HapMap samples, and Glioblastoma Multiforme (GBM), ovarian, prostate and lung cancer experiments) have been analyzed. It is shown that, using only tumor samples, NSA is able to remove the bias in the copy number estimates and reduce the noise, and therefore increases the ability to detect copy number aberrations (CNAs). These improvements also allow NSA to detect recurrent aberrations more accurately than other state-of-the-art methods.

CONCLUSIONS: NSA provides a robust and accurate reference for scaling probe signal data to CN values without the need for control samples. It minimizes the problems of bias, noise and batch effects in the estimation of CNs. Therefore, the NSA scaling approach helps detect recurrent CNAs better than current methods. Because references are selected automatically, NSA is well suited to bulk analysis of many GEO or ArrayExpress experiments without having to develop a parser that identifies the normal samples or possible batches in the data.
The method is available in the open-source R package NSA, an add-on to the aroma.cn framework: http://www.aroma-project.org/addons.
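The core scaling idea described above (call a region likely normal when its SNPs show equal copy numbers on both alleles, then build the reference only from samples that look normal there) can be illustrated with a small sketch. This is not the NSA package's code or API (NSA is an R add-on to aroma.cn); it is a Python illustration under simplifying assumptions. Signals are already averaged per region. B-allele frequencies (BAFs) strictly between 0.1 and 0.9 mark heterozygous SNPs. A region counts as likely normal when it has at least three heterozygous SNPs and at least 80% of them have a BAF within 0.1 of 0.5. The reference is the median over samples called normal at that region, and a reference level is taken to mean two copies. All thresholds and function names are hypothetical, and the sample weighting NSA uses against batch effects is not modelled.

```python
from statistics import median

# Hypothetical thresholds chosen for illustration only (not taken from NSA).
HET_LOW, HET_HIGH = 0.1, 0.9   # BAFs strictly inside this band are treated as heterozygous
BALANCE_TOL = 0.1              # |BAF - 0.5| <= tol counts as "both alleles equal"
MIN_HET = 3                    # minimum heterozygous SNPs needed to call a region normal
MIN_BALANCED = 0.8             # required fraction of allele-balanced heterozygous SNPs


def is_likely_normal(bafs):
    """Call a region likely normal if it is enriched in allele-balanced SNPs."""
    het = [b for b in bafs if HET_LOW < b < HET_HIGH]
    if len(het) < MIN_HET:
        # Too few heterozygous SNPs (e.g. loss of heterozygosity): no evidence of normality.
        return False
    balanced = sum(1 for b in het if abs(b - 0.5) <= BALANCE_TOL)
    return balanced / len(het) >= MIN_BALANCED


def build_reference(signals, bafs):
    """signals[s][r]: mean signal of sample s in region r; bafs[s][r]: that region's BAFs."""
    n_samples, n_regions = len(signals), len(signals[0])
    reference = []
    for r in range(n_regions):
        # Normality is decided per sample, so each region pools a different subset of samples.
        normal = [signals[s][r] for s in range(n_samples) if is_likely_normal(bafs[s][r])]
        # Fallback (an assumption of this sketch): use all samples if none looks normal.
        pool = normal or [signals[s][r] for s in range(n_samples)]
        reference.append(median(pool))
    return reference


def copy_numbers(signals, reference):
    """Scale each signal to a copy number, assuming the reference level equals 2 copies."""
    return [[2.0 * x / ref for x, ref in zip(row, reference)] for row in signals]


if __name__ == "__main__":
    # Toy data: two tumor samples (A, B), two regions each.
    signals = [[100.0, 150.0],   # sample A
               [60.0, 110.0]]    # sample B
    bafs = [[[0.0, 0.5, 0.48, 0.52, 1.0], [0.33, 0.67, 0.35, 0.0]],  # A: balanced, then imbalanced
            [[0.0, 1.0, 0.0, 1.0], [0.5, 0.45, 0.55, 1.0]]]          # B: LOH, then balanced
    ref = build_reference(signals, bafs)
    print(ref)
    print(copy_numbers(signals, ref))
```

With these made-up numbers, sample A is the only sample called normal in region 0 and sample B the only one in region 1, so the reference is `[100.0, 110.0]`. The normal cells scale to exactly 2.0 copies, while sample B's LOH region gets 1.2 and sample A's allele-imbalanced region gets about 2.73. Allele balance only makes normality likely, not certain: a region with two copies of each allele is also balanced, which is why the abstract phrases the assumption as "likely to be normal".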
ID: pubmed-3512512
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Algorithms Mol Biol (Software Article), BioMed Central, published 2012-08-16 (record pubmed-3512512, dated 2012-12-04)
Copyright ©2012 Ortiz-Estevez et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.