
Classification of high dimensional biomedical data based on feature selection using redundant removal

High-dimensional biomedical data contain tens of thousands of features, and accurate and effective identification of the core features in these data can help diagnose related diseases. However, biomedical data often contain a large number of irrelevant or redundant features, which seri...


Bibliographic Details
Main Authors:	Zhang, Bingtao; Cao, Peng
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2019
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456288/ https://www.ncbi.nlm.nih.gov/pubmed/30964868 http://dx.doi.org/10.1371/journal.pone.0214406

author	Zhang, Bingtao Cao, Peng
collection	PubMed
description	High-dimensional biomedical data contain tens of thousands of features, and accurate and effective identification of the core features in these data can help diagnose related diseases. However, biomedical data often contain a large number of irrelevant or redundant features, which seriously degrade subsequent classification accuracy and machine learning efficiency. To solve this problem, this paper proposes a novel filter feature selection algorithm based on redundant removal (FSBRR) for classifying high-dimensional biomedical data. First, two redundancy criteria are determined by vertical relevance (the relationship between a feature and the class attribute) and horizontal relevance (the relationship between one feature and another). Second, to quantify the redundancy criteria, an approximate redundancy feature framework based on mutual information (MI) is defined to remove redundant and irrelevant features. To evaluate the effectiveness of the proposed algorithm, controlled trials against typical feature selection algorithms are conducted using three different classifiers, and the experimental results indicate that the FSBRR algorithm can effectively reduce the feature dimension and improve classification accuracy. In addition, an experiment on a small-sample dataset is designed and conducted in the discussion and analysis section to illustrate the specific implementation process of the FSBRR algorithm.
format	Online Article Text
id	pubmed-6456288
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-6456288 2019-05-03 PLoS One Public Library of Science 2019-04-09 /pmc/articles/PMC6456288/ /pubmed/30964868 http://dx.doi.org/10.1371/journal.pone.0214406 Text en © 2019 Zhang, Cao http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Classification of high dimensional biomedical data based on feature selection using redundant removal
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456288/ https://www.ncbi.nlm.nih.gov/pubmed/30964868 http://dx.doi.org/10.1371/journal.pone.0214406
