
Efficient Feature Selection and Multiclass Classification with Integrated Instance and Model Based Learning

Bibliographic Details
Main authors: Liu, Zhenqiu; Bensmail, Halima; Tan, Ming
Format: Online Article Text
Language: English
Published: Libertas Academica, 2012
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3347893/
https://www.ncbi.nlm.nih.gov/pubmed/22577297
http://dx.doi.org/10.4137/EBO.S9407
author Liu, Zhenqiu
Bensmail, Halima
Tan, Ming
collection PubMed
description Multiclass classification and feature (variable) selection are commonly encountered in many biological and medical applications. However, extending binary classification approaches to multiclass problems is not trivial. Instance-based methods such as the K nearest neighbor (KNN) method extend naturally to multiclass problems and usually perform well with unbalanced data, but they suffer from the curse of dimensionality: their performance degrades when applied to high-dimensional data. On the other hand, model-based methods such as logistic regression require decomposing the multiclass problem into several binary problems with one-vs.-one or one-vs.-rest schemes. Even though they can be applied to high-dimensional data with L1 or Lp penalized methods, such approaches can only select independent features, and the features selected for different binary problems are usually different. The one-vs.-rest scheme also produces unbalanced binary classification problems even when the original multiclass problem is balanced. By combining instance-based and model-based learning, we propose an efficient learning method that integrates KNN and constrained logistic regression (KNNLog) for simultaneous multiclass classification and feature selection. Our proposed method simultaneously minimizes the intra-class distance and maximizes the inter-class distance with fewer estimated parameters. It is very efficient for problems with small sample sizes and unbalanced classes, a case common in many real applications. In addition, our model-based feature selection methods can identify highly correlated features simultaneously, avoiding the multiplicity problem that arises from multiple tests. The proposed method is evaluated with simulated and real data, including one unbalanced microRNA dataset for leukemia and one multiclass metagenomic dataset from the Human Microbiome Project (HMP). It performs well in limited computational experiments.
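As a rough illustration of the contrast drawn in the abstract (this is not the authors' KNNLog method), the Python sketch below shows a plain KNN classifier voting directly over three class labels, and then counts the positive and negative examples each binary problem would receive under a one-vs.-rest decomposition of the same balanced data. The function names `knn_predict` and `one_vs_rest_counts` and the toy data are hypothetical, chosen only for this example.

```python
import math
from collections import Counter

# Hypothetical toy data (illustration only): three balanced classes, four 2-D points each.
TRAIN = [
    ((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"), ((0.1, 0.0), "A"),
    ((5.0, 5.0), "B"), ((5.1, 5.2), "B"), ((4.9, 5.1), "B"), ((5.2, 4.8), "B"),
    ((0.0, 5.0), "C"), ((0.2, 5.1), "C"), ((0.1, 4.9), "C"), ((-0.1, 5.0), "C"),
]


def knn_predict(train, query, k=3):
    """Plain (unweighted) KNN: majority vote among the k training points closest
    to `query` in Euclidean distance. Any number of classes is handled directly,
    with no binary decomposition."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def one_vs_rest_counts(labels):
    """For each class c, return (positives, negatives) for the binary problem
    'c vs. all other classes' that a one-vs.-rest scheme would create."""
    counts = Counter(labels)
    n = len(labels)
    return {c: (counts[c], n - counts[c]) for c in sorted(counts)}


if __name__ == "__main__":
    print(knn_predict(TRAIN, (0.15, 0.1)))  # -> A
    print(knn_predict(TRAIN, (4.8, 5.0)))   # -> B
    print(one_vs_rest_counts([label for _, label in TRAIN]))
    # -> {'A': (4, 8), 'B': (4, 8), 'C': (4, 8)}
```

With three balanced classes of four samples each, every one-vs.-rest problem pits 4 positives against 8 negatives; in general, k balanced classes yield a 1:(k-1) ratio in each binary problem, which is the imbalance the abstract refers to.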
format Online
Article
Text
id pubmed-3347893
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-3347893 2012-05-10 Efficient Feature Selection and Multiclass Classification with Integrated Instance and Model Based Learning Liu, Zhenqiu Bensmail, Halima Tan, Ming Evol Bioinform Online Methodology Libertas Academica 2012-04-30 /pmc/articles/PMC3347893/ /pubmed/22577297 http://dx.doi.org/10.4137/EBO.S9407 Text en © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
title Efficient Feature Selection and Multiclass Classification with Integrated Instance and Model Based Learning
topic Methodology