
Getting DNA copy numbers without control samples

BACKGROUND: The selection of the reference to scale the data in a copy number analysis is of paramount importance for achieving accurate estimates. Usually this reference is generated using control samples included in the study. However, control samples are not always available, and in these cases...


Bibliographic Details
Main authors:	Ortiz-Estevez, Maria; Aramburu, Ander; Rubio, Angel
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Software Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3512512/ https://www.ncbi.nlm.nih.gov/pubmed/22898240 http://dx.doi.org/10.1186/1748-7188-7-19

author	Ortiz-Estevez, Maria; Aramburu, Ander; Rubio, Angel
collection	PubMed
description	BACKGROUND: The selection of the reference to scale the data in a copy number analysis is of paramount importance for achieving accurate estimates. Usually this reference is generated using control samples included in the study. However, control samples are not always available, and in these cases an artificial reference must be created. Proper generation of this signal is crucial in terms of both noise and bias. We propose NSA (Normality Search Algorithm), a scaling method that works with and without control samples. It is based on the assumption that genomic regions enriched in SNPs with identical copy numbers in both alleles are likely to be normal. These normal regions are predicted for each sample individually and used to calculate the final reference signal. NSA can be applied to any CN data regardless of the microarray technology and preprocessing method. It also finds an optimal weighting of the samples that minimizes possible batch effects. RESULTS: Five human datasets (a subset of HapMap samples, Glioblastoma Multiforme (GBM), Ovarian, Prostate and Lung Cancer experiments) have been analyzed. Using only tumor samples, NSA is shown to remove the bias in the copy number estimation, reduce the noise and, therefore, increase the ability to detect copy number aberrations (CNAs). These improvements allow NSA to also detect recurrent aberrations more accurately than other state-of-the-art methods. CONCLUSIONS: NSA provides a robust and accurate reference for scaling probe signal data to CN values without the need for control samples. It minimizes the problems of bias, noise and batch effects in the estimation of CNs. Therefore, the NSA scaling approach detects recurrent CNAs better than current methods. The automatic selection of references makes it useful for bulk analysis of many GEO or ArrayExpress experiments without the need to develop a parser to find the normal samples or possible batches within the data. The method is available in the open-source R package NSA, which is an add-on to the aroma.cn framework. http://www.aroma-project.org/addons.
format	Online Article Text
id	pubmed-3512512
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3512512 2012-12-04 Getting DNA copy numbers without control samples Ortiz-Estevez, Maria; Aramburu, Ander; Rubio, Angel Algorithms Mol Biol Software Article BioMed Central 2012-08-16 /pmc/articles/PMC3512512/ /pubmed/22898240 http://dx.doi.org/10.1186/1748-7188-7-19 Text en Copyright ©2012 Ortiz-Estevez et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Getting DNA copy numbers without control samples
topic	Software Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3512512/ https://www.ncbi.nlm.nih.gov/pubmed/22898240 http://dx.doi.org/10.1186/1748-7188-7-19
