Classification and feature selection algorithms for multi-class CGH data
Recurrent chromosomal alterations provide cytological and molecular positions for the diagnosis and prognosis of cancer. Comparative genomic hybridization (CGH) has been useful in understanding these alterations in cancerous cells. CGH datasets consist of samples that are represented by large dimens...
Main authors: | Liu, Jun; Ranka, Sanjay; Kahveci, Tamer |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718623/ https://www.ncbi.nlm.nih.gov/pubmed/18586749 http://dx.doi.org/10.1093/bioinformatics/btn145 |
_version_ | 1782170001688494080 |
---|---|
author | Liu, Jun Ranka, Sanjay Kahveci, Tamer |
author_facet | Liu, Jun Ranka, Sanjay Kahveci, Tamer |
author_sort | Liu, Jun |
collection | PubMed |
description | Recurrent chromosomal alterations provide cytological and molecular positions for the diagnosis and prognosis of cancer. Comparative genomic hybridization (CGH) has been useful in understanding these alterations in cancerous cells. CGH datasets consist of samples that are represented by large dimensional arrays of intervals. Each sample consists of long runs of intervals with losses and gains. In this article, we develop novel SVM-based methods for classification and feature selection of CGH data. For classification, we developed a novel similarity kernel that is shown to be more effective than the standard linear kernel used in SVM. For feature selection, we propose a novel method based on the new kernel that iteratively selects features that provide the maximum benefit for classification. We compared our methods against the best wrapper-based and filter-based approaches that have been used for feature selection of large dimensional biological data. Our results on datasets generated from the Progenetix database suggest that our methods are considerably superior to existing methods. Availability: All software developed in this article can be downloaded from http://plaza.ufl.edu/junliu/feature.tar.gz Contact: juliu@cise.ufl.edu |
format | Text |
id | pubmed-2718623 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2718623 2009-07-31 Classification and feature selection algorithms for multi-class CGH data Liu, Jun Ranka, Sanjay Kahveci, Tamer Bioinformatics Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto Recurrent chromosomal alterations provide cytological and molecular positions for the diagnosis and prognosis of cancer. Comparative genomic hybridization (CGH) has been useful in understanding these alterations in cancerous cells. CGH datasets consist of samples that are represented by large dimensional arrays of intervals. Each sample consists of long runs of intervals with losses and gains. In this article, we develop novel SVM-based methods for classification and feature selection of CGH data. For classification, we developed a novel similarity kernel that is shown to be more effective than the standard linear kernel used in SVM. For feature selection, we propose a novel method based on the new kernel that iteratively selects features that provide the maximum benefit for classification. We compared our methods against the best wrapper-based and filter-based approaches that have been used for feature selection of large dimensional biological data. Our results on datasets generated from the Progenetix database suggest that our methods are considerably superior to existing methods. Availability: All software developed in this article can be downloaded from http://plaza.ufl.edu/junliu/feature.tar.gz Contact: juliu@cise.ufl.edu Oxford University Press 2008-07-01 /pmc/articles/PMC2718623/ /pubmed/18586749 http://dx.doi.org/10.1093/bioinformatics/btn145 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto Liu, Jun Ranka, Sanjay Kahveci, Tamer Classification and feature selection algorithms for multi-class CGH data |
title | Classification and feature selection algorithms for multi-class CGH data |
title_full | Classification and feature selection algorithms for multi-class CGH data |
title_fullStr | Classification and feature selection algorithms for multi-class CGH data |
title_full_unstemmed | Classification and feature selection algorithms for multi-class CGH data |
title_short | Classification and feature selection algorithms for multi-class CGH data |
title_sort | classification and feature selection algorithms for multi-class cgh data |
topic | Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718623/ https://www.ncbi.nlm.nih.gov/pubmed/18586749 http://dx.doi.org/10.1093/bioinformatics/btn145 |
work_keys_str_mv | AT liujun classificationandfeatureselectionalgorithmsformulticlasscghdata AT rankasanjay classificationandfeatureselectionalgorithmsformulticlasscghdata AT kahvecitamer classificationandfeatureselectionalgorithmsformulticlasscghdata |