Classification of arrayCGH data using fused SVM


Bibliographic Details
Main Authors: Rapaport, Franck; Barillot, Emmanuel; Vert, Jean-Philippe
Format: Text
Language: English
Published: Oxford University Press 2008
Subjects: Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718663/
https://www.ncbi.nlm.nih.gov/pubmed/18586737
http://dx.doi.org/10.1093/bioinformatics/btn188
_version_ 1782170011227389952
author Rapaport, Franck
Barillot, Emmanuel
Vert, Jean-Philippe
collection PubMed
description Motivation: Array-based comparative genomic hybridization (arrayCGH) has recently become a popular tool to identify DNA copy number variations along the genome. These profiles are starting to be used as markers to improve prognosis or diagnosis of cancer, which implies that methods for automated supervised classification of arrayCGH data are needed. Like gene expression profiles, arrayCGH profiles are characterized by a large number of variables usually measured on a limited number of samples. However, arrayCGH profiles have a particular structure of correlations between variables, due to the spatial organization of bacterial artificial chromosomes along the genome. This suggests that classical classification methods, often based on the selection of a small number of discriminative features, may not be the most accurate methods and may not produce easily interpretable prediction rules. Results: We propose a new method for supervised classification of arrayCGH data. The method is a variant of support vector machine that incorporates the biological specificities of DNA copy number variations along the genome as prior knowledge. The resulting classifier is a sparse linear classifier based on a limited number of regions automatically selected on the chromosomes, leading to easy interpretation and identification of discriminative regions of the genome. We test this method on three classification problems for bladder and uveal cancer, involving both diagnosis and prognosis. We demonstrate that the introduction of the new prior on the classifier leads not only to more accurate predictions, but also to the identification of known and new regions of interest in the genome. Availability: All data and algorithms are publicly available. Contact: franck.rapaport@curie.fr
format Text
id pubmed-2718663
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2718663 2009-07-31 Classification of arrayCGH data using fused SVM Rapaport, Franck Barillot, Emmanuel Vert, Jean-Philippe Bioinformatics Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto Oxford University Press 2008-07-01 /pmc/articles/PMC2718663/ /pubmed/18586737 http://dx.doi.org/10.1093/bioinformatics/btn188 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Classification of arrayCGH data using fused SVM
topic Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718663/
https://www.ncbi.nlm.nih.gov/pubmed/18586737
http://dx.doi.org/10.1093/bioinformatics/btn188