Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection

In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simplistic machine learning pipelines. Earlier studies have shown that for certain cases, the accurac...

Bibliographic Details
Main authors: Hassan, S. Sakira, Ruusuvuori, Pekka, Latonen, Leena, Huttunen, Heikki
Format: Online Article Text
Language: English
Published: Libertas Academica 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4827794/
https://www.ncbi.nlm.nih.gov/pubmed/27081305
http://dx.doi.org/10.4137/CIN.S30795
_version_ 1782426510204862464
author Hassan, S. Sakira
Ruusuvuori, Pekka
Latonen, Leena
Huttunen, Heikki
author_facet Hassan, S. Sakira
Ruusuvuori, Pekka
Latonen, Leena
Huttunen, Heikki
author_sort Hassan, S. Sakira
collection PubMed
description In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simplistic machine learning pipelines. Earlier studies have shown that for certain cases, the accuracy of detection can easily reach 100% given enough training data. Here, however, we concentrate on simplifying the classification models and seek feature selection approaches that are reliable even with extremely small sample sizes. We show that as many as 50% of the features can be discarded without compromising the prediction accuracy. Moreover, we study the model selection problem along the ℓ₁ regularization path of logistic regression classifiers. To this end, we compare a more traditional cross-validation approach with a recently proposed Bayesian error estimator.
format Online
Article
Text
id pubmed-4827794
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4827794 2016-04-14 Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection Hassan, S. Sakira Ruusuvuori, Pekka Latonen, Leena Huttunen, Heikki Cancer Inform Original Research In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simplistic machine learning pipelines. Earlier studies have shown that for certain cases, the accuracy of detection can easily reach 100% given enough training data. Here, however, we concentrate on simplifying the classification models and seek feature selection approaches that are reliable even with extremely small sample sizes. We show that as many as 50% of the features can be discarded without compromising the prediction accuracy. Moreover, we study the model selection problem along the ℓ₁ regularization path of logistic regression classifiers. To this end, we compare a more traditional cross-validation approach with a recently proposed Bayesian error estimator. Libertas Academica 2016-04-10 /pmc/articles/PMC4827794/ /pubmed/27081305 http://dx.doi.org/10.4137/CIN.S30795 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
spellingShingle Original Research
Hassan, S. Sakira
Ruusuvuori, Pekka
Latonen, Leena
Huttunen, Heikki
Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection
title Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection
title_full Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection
title_fullStr Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection
title_full_unstemmed Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection
title_short Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection
title_sort flow cytometry-based classification in cancer research: a view on feature selection
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4827794/
https://www.ncbi.nlm.nih.gov/pubmed/27081305
http://dx.doi.org/10.4137/CIN.S30795