Multi-Class Clustering of Cancer Subtypes through SVM Based Ensemble of Pareto-Optimal Solutions for Gene Marker Identification

With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are being used for classification...

Bibliographic Details
Main Authors: Mukhopadhyay, Anirban, Bandyopadhyay, Sanghamitra, Maulik, Ujjwal
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2980474/
https://www.ncbi.nlm.nih.gov/pubmed/21103052
http://dx.doi.org/10.1371/journal.pone.0013803
author Mukhopadhyay, Anirban
Bandyopadhyay, Sanghamitra
Maulik, Ujjwal
collection PubMed
description With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are being used to classify tissue samples as benign or malignant, or into their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which helps in the successful diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. A real-coded encoding of the cluster centers is used, and cluster compactness and separation are optimized simultaneously. The resulting set of near-Pareto-optimal solutions contains a number of non-dominated solutions. A novel approach is proposed that combines the clustering information possessed by the non-dominated solutions through a Support Vector Machine (SVM) classifier. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method has been compared with that of several other microarray clustering algorithms on three publicly available benchmark cancer datasets. Moreover, statistical significance tests have been conducted to establish the statistical superiority of the proposed clustering method. Furthermore, relevant gene markers have been identified using the clustering result produced by the proposed method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results are promising and may have an important impact in the area of unsupervised cancer classification as well as gene marker identification for multiple cancer subtypes.
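The idea of a non-dominated set in the description above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy 2-D samples, the candidate center sets, and the two objective definitions (sum of squared distances to the nearest center for compactness, minimum distance between centers for separation) are assumptions chosen only for illustration, and the genetic search and the SVM ensemble step are not shown.

```python
from math import sqrt

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def compactness(samples, centers):
    """Assumed objective 1 (minimise): sum of squared distances
    from each sample to its nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in samples)

def separation(centers):
    """Assumed objective 2 (maximise): smallest distance between two centers."""
    return min(sqrt(sq_dist(a, b))
               for i, a in enumerate(centers) for b in centers[i + 1:])

def dominates(p, q):
    """True if score p = (compactness, separation) is no worse than q
    on both objectives and strictly better on at least one."""
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    strictly_better = p[0] < q[0] or p[1] > q[1]
    return no_worse and strictly_better

def non_dominated(scores):
    """Indices of the scores that no other score dominates."""
    return [i for i, p in enumerate(scores)
            if not any(dominates(q, p) for j, q in enumerate(scores) if j != i)]

# Hypothetical toy data: four samples and four candidate sets of two centers,
# standing in for the real-coded chromosomes of a genetic search.
samples = [(0, 0), (0, 1), (5, 5), (5, 6)]
candidates = [
    [(0, 0.5), (5, 5.5)],
    [(0, 0), (5, 6)],
    [(1, 1), (4, 4)],
    [(-1, -1), (6, 7)],
]

scores = [(compactness(samples, c), separation(c)) for c in candidates]
front = non_dominated(scores)
print(front)  # [0, 1, 3]
```

In this toy run the third candidate is dominated by the first, which is better on both objectives, while the remaining three trade compactness against separation and so all stay in the non-dominated set. A trade-off set of this kind is what the described method then combines into a single clustering.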
format Text
id pubmed-2980474
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2980474 2010-11-22 Multi-Class Clustering of Cancer Subtypes through SVM Based Ensemble of Pareto-Optimal Solutions for Gene Marker Identification Mukhopadhyay, Anirban Bandyopadhyay, Sanghamitra Maulik, Ujjwal PLoS One Research Article Public Library of Science 2010-11-12 /pmc/articles/PMC2980474/ /pubmed/21103052 http://dx.doi.org/10.1371/journal.pone.0013803 Text en Mukhopadhyay et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Multi-Class Clustering of Cancer Subtypes through SVM Based Ensemble of Pareto-Optimal Solutions for Gene Marker Identification
topic Research Article