
Classification and feature selection algorithms for multi-class CGH data


Bibliographic details
Main authors: Liu, Jun; Ranka, Sanjay; Kahveci, Tamer
Format: Text
Language: English
Published: Oxford University Press, 2008
Subjects: ISMB 2008 Conference Proceedings, 19–23 July 2008, Toronto
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718623/
https://www.ncbi.nlm.nih.gov/pubmed/18586749
http://dx.doi.org/10.1093/bioinformatics/btn145
collection PubMed
description Recurrent chromosomal alterations provide cytological and molecular positions for the diagnosis and prognosis of cancer. Comparative genomic hybridization (CGH) has been useful in understanding these alterations in cancerous cells. CGH datasets consist of samples that are represented by high-dimensional arrays of intervals. Each sample consists of long runs of intervals with losses and gains. In this article, we develop novel SVM-based methods for classification and feature selection of CGH data. For classification, we develop a novel similarity kernel that is shown to be more effective than the standard linear kernel used in SVM. For feature selection, we propose a novel method based on the new kernel that iteratively selects the features that provide the maximum benefit for classification. We compared our methods against the best wrapper-based and filter-based approaches that have been used for feature selection of high-dimensional biological data. Our results on datasets generated from the Progenetix database suggest that our methods are considerably superior to existing methods. Availability: All software developed in this article can be downloaded from http://plaza.ufl.edu/junliu/feature.tar.gz Contact: juliu@cise.ufl.edu
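The abstract gives neither the kernel's formula nor the exact selection criterion. Purely as an illustration, the Python sketch below assumes that each CGH sample is encoded as a list with one value per genomic interval (-1 loss, 0 no change, +1 gain). It uses a hypothetical similarity kernel that counts the intervals where both samples carry the same aberration, and then runs greedy forward feature selection. To stay dependency-free, it scores candidate feature sets by leave-one-out 1-nearest-neighbour accuracy under that kernel instead of training an SVM. It is therefore a stand-in for the authors' method, not a reproduction of it. The names `similarity_kernel`, `loo_accuracy` and `greedy_select` are hypothetical.

```python
from typing import List, Sequence

# Assumed encoding (not specified in the abstract): one value per genomic
# interval, -1 = loss, 0 = no change, +1 = gain.
Sample = Sequence[int]


def similarity_kernel(x: Sample, y: Sample, features: Sequence[int]) -> int:
    """Hypothetical kernel: number of selected intervals where both samples
    carry the same non-zero aberration (both gains or both losses)."""
    return sum(1 for i in features if x[i] != 0 and x[i] == y[i])


def loo_accuracy(samples: List[Sample], labels: List[str],
                 features: Sequence[int]) -> float:
    """Leave-one-out 1-nearest-neighbour accuracy, using the kernel above as
    the similarity. A simple stand-in for an SVM-based score."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if not features:
        return 0.0
    correct = 0
    for i, x in enumerate(samples):
        best_j, best_sim = -1, -1
        for j, y in enumerate(samples):
            if j != i:
                sim = similarity_kernel(x, y, features)
                if sim > best_sim:  # first most-similar sample wins ties
                    best_j, best_sim = j, sim
        if labels[best_j] == labels[i]:
            correct += 1
    return correct / len(samples)


def greedy_select(samples: List[Sample], labels: List[str], k: int) -> List[int]:
    """Greedy forward selection: repeatedly add the interval that yields the
    highest score; stop after k intervals or when no candidate improves it."""
    selected: List[int] = []
    best_score = 0.0
    for _ in range(k):
        candidates = [f for f in range(len(samples[0])) if f not in selected]
        if not candidates:
            break
        scores = {f: loo_accuracy(samples, labels, selected + [f]) for f in candidates}
        feat = max(candidates, key=lambda f: scores[f])  # lowest index wins ties
        if selected and scores[feat] <= best_score:
            break
        selected.append(feat)
        best_score = scores[feat]
    return selected
```

Consider a toy set in which interval 2 is a gain in every class-A sample and a loss in every class-B sample. There, `greedy_select` picks `[2]` and then stops, because no additional interval can improve on perfect leave-one-out accuracy.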
id pubmed-2718623
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
record pubmed-2718623 (2009-07-31)
source Bioinformatics, ISMB 2008 Conference Proceedings, 19–23 July 2008, Toronto; Oxford University Press, 2008-07-01
rights © 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.