Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data

BACKGROUND: The detection of genomic copy number alterations (CNA) in cancer based on SNP arrays requires methods that take into account tumour-specific factors such as normal cell contamination and tumour heterogeneity. A number of tools have recently been developed but their performance needs yet...

Bibliographic Details
Main authors: Mosén-Ansorena, David, Aransay, Ana María, Rodríguez-Ezpeleta, Naiara
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3472297/
https://www.ncbi.nlm.nih.gov/pubmed/22870940
http://dx.doi.org/10.1186/1471-2105-13-192
_version_ 1782246575656927232
author Mosén-Ansorena, David
Aransay, Ana María
Rodríguez-Ezpeleta, Naiara
author_facet Mosén-Ansorena, David
Aransay, Ana María
Rodríguez-Ezpeleta, Naiara
author_sort Mosén-Ansorena, David
collection PubMed
description BACKGROUND: The detection of genomic copy number alterations (CNA) in cancer based on SNP arrays requires methods that take into account tumour-specific factors such as normal cell contamination and tumour heterogeneity. A number of tools have recently been developed, but their performance has yet to be thoroughly assessed. To this end, a comprehensive model that integrates the factors of normal cell contamination and intra-tumour heterogeneity, and that can be translated into synthetic data on which to perform benchmarks, is indispensable. RESULTS: We propose such a model and implement it in an R package called CnaGen to synthetically generate a wide range of alterations under different normal cell contamination levels. Six recently published methods for CNA and loss of heterozygosity (LOH) detection on tumour samples were assessed on these synthetic data and on a dilution series of a breast cancer cell-line: ASCAT, GAP, GenoCNA, GPHMM, MixHMM and OncoSNP. We report the recall rates in terms of normal cell contamination levels and alteration characteristics (length, copy number and LOH state), as well as the false discovery rate distribution for each copy number under different normal cell contamination levels. The assessed methods are in general better at detecting alterations with low copy number and under low levels of normal cell contamination. All methods except GPHMM, which failed to recognize the alteration pattern in the cell-line samples, provided similar results for the synthetic and cell-line sample sets. MixHMM and GenoCNA are the poorest-performing methods, while GAP generally performed better. This supports the viability of approaches other than the common hidden Markov model (HMM)-based ones. CONCLUSIONS: We devised and implemented a comprehensive model to generate data that simulate tumoural samples genotyped using SNP arrays. The validity of the model is supported by the similarity of the results obtained with synthetic and real data.
Based on these results and on the software implementation of the methods, we recommend GAP for advanced users and GPHMM for a fully driven analysis.
format Online
Article
Text
id pubmed-3472297
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3472297 2012-10-23 Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data Mosén-Ansorena, David Aransay, Ana María Rodríguez-Ezpeleta, Naiara BMC Bioinformatics Research Article BACKGROUND: The detection of genomic copy number alterations (CNA) in cancer based on SNP arrays requires methods that take into account tumour-specific factors such as normal cell contamination and tumour heterogeneity. A number of tools have recently been developed, but their performance has yet to be thoroughly assessed. To this end, a comprehensive model that integrates the factors of normal cell contamination and intra-tumour heterogeneity, and that can be translated into synthetic data on which to perform benchmarks, is indispensable. RESULTS: We propose such a model and implement it in an R package called CnaGen to synthetically generate a wide range of alterations under different normal cell contamination levels. Six recently published methods for CNA and loss of heterozygosity (LOH) detection on tumour samples were assessed on these synthetic data and on a dilution series of a breast cancer cell-line: ASCAT, GAP, GenoCNA, GPHMM, MixHMM and OncoSNP. We report the recall rates in terms of normal cell contamination levels and alteration characteristics (length, copy number and LOH state), as well as the false discovery rate distribution for each copy number under different normal cell contamination levels. The assessed methods are in general better at detecting alterations with low copy number and under low levels of normal cell contamination. All methods except GPHMM, which failed to recognize the alteration pattern in the cell-line samples, provided similar results for the synthetic and cell-line sample sets. MixHMM and GenoCNA are the poorest-performing methods, while GAP generally performed better. This supports the viability of approaches other than the common hidden Markov model (HMM)-based ones.
CONCLUSIONS: We devised and implemented a comprehensive model to generate data that simulate tumoural samples genotyped using SNP arrays. The validity of the model is supported by the similarity of the results obtained with synthetic and real data. Based on these results and on the software implementation of the methods, we recommend GAP for advanced users and GPHMM for a fully driven analysis. BioMed Central 2012-08-07 /pmc/articles/PMC3472297/ /pubmed/22870940 http://dx.doi.org/10.1186/1471-2105-13-192 Text en Copyright ©2012 Mosén-Ansorena et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Mosén-Ansorena, David
Aransay, Ana María
Rodríguez-Ezpeleta, Naiara
Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data
title Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data
title_full Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data
title_fullStr Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data
title_full_unstemmed Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data
title_short Comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data
title_sort comparison of methods to detect copy number alterations in cancer using simulated and real genotyping data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3472297/
https://www.ncbi.nlm.nih.gov/pubmed/22870940
http://dx.doi.org/10.1186/1471-2105-13-192
work_keys_str_mv AT mosenansorenadavid comparisonofmethodstodetectcopynumberalterationsincancerusingsimulatedandrealgenotypingdata
AT aransayanamaria comparisonofmethodstodetectcopynumberalterationsincancerusingsimulatedandrealgenotypingdata
AT rodriguezezpeletanaiara comparisonofmethodstodetectcopynumberalterationsincancerusingsimulatedandrealgenotypingdata