Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection
In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simplistic machine learning pipelines. Earlier studies have shown that for certain cases, the accurac...
Main Authors: | Hassan, S. Sakira, Ruusuvuori, Pekka, Latonen, Leena, Huttunen, Heikki |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica 2016 |
Subjects: | Original Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4827794/ https://www.ncbi.nlm.nih.gov/pubmed/27081305 http://dx.doi.org/10.4137/CIN.S30795 |
_version_ | 1782426510204862464 |
---|---|
author | Hassan, S. Sakira Ruusuvuori, Pekka Latonen, Leena Huttunen, Heikki |
author_sort | Hassan, S. Sakira |
collection | PubMed |
description | In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simplistic machine learning pipelines. Earlier studies have shown that for certain cases, the accuracy of detection can easily reach 100% given enough training data. Here, however, we concentrate on simplifying the classification models and seek feature selection approaches that are reliable even with extremely small sample sizes. We show that as many as 50% of the features can be discarded without compromising the prediction accuracy. Moreover, we study the model selection problem along the ℓ1 regularization path of logistic regression classifiers. To this end, we compare a more traditional cross-validation approach with a recently proposed Bayesian error estimator. |
format | Online Article Text |
id | pubmed-4827794 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-4827794 2016-04-14 Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection Hassan, S. Sakira Ruusuvuori, Pekka Latonen, Leena Huttunen, Heikki Cancer Inform Original Research In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simplistic machine learning pipelines. Earlier studies have shown that for certain cases, the accuracy of detection can easily reach 100% given enough training data. Here, however, we concentrate on simplifying the classification models and seek feature selection approaches that are reliable even with extremely small sample sizes. We show that as many as 50% of the features can be discarded without compromising the prediction accuracy. Moreover, we study the model selection problem along the ℓ1 regularization path of logistic regression classifiers. To this end, we compare a more traditional cross-validation approach with a recently proposed Bayesian error estimator. Libertas Academica 2016-04-10 /pmc/articles/PMC4827794/ /pubmed/27081305 http://dx.doi.org/10.4137/CIN.S30795 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License. |
spellingShingle | Original Research Hassan, S. Sakira Ruusuvuori, Pekka Latonen, Leena Huttunen, Heikki Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection |
title | Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4827794/ https://www.ncbi.nlm.nih.gov/pubmed/27081305 http://dx.doi.org/10.4137/CIN.S30795 |