A Partitioning Based Adaptive Method for Robust Removal of Irrelevant Features from High-dimensional Biomedical Datasets

Bibliographic Details
Main authors: Liu, Guodong; Kong, Lan; Gopalakrishnan, Vanathi
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 2012
Subjects: Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392052/
https://www.ncbi.nlm.nih.gov/pubmed/22779051
Collection: PubMed
Description: We propose a novel method called Partitioning based Adaptive Irrelevant Feature Eliminator (PAIFE) for dimensionality reduction in high-dimensional biomedical datasets. PAIFE evaluates feature-target relationships not only over the whole dataset but also over partitioned subsets, and is extremely effective in identifying features whose relevance to the target is conditional on certain other features. PAIFE adaptively employs the most appropriate feature evaluation strategy, statistical test, and parameter instantiation. We envision PAIFE being used as a third-party data pre-processing tool for dimensionality reduction of high-dimensional clinical datasets. Experiments on synthetic datasets showed that PAIFE consistently outperformed state-of-the-art feature selection methods in removing irrelevant features while retaining relevant features. Experiments on genomic and proteomic datasets demonstrated that PAIFE was able to remove significant numbers of irrelevant features in real-world biomedical datasets. Classification models constructed from the retained features either matched or improved on the classification performance of models constructed using all features.
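The idea of conditional relevance in the description can be illustrated with a toy example. The sketch below is not PAIFE: this record does not specify the statistical tests, partitioning strategy, or parameters PAIFE uses, so the association score, the 0.5 threshold, and the function names `association` and `max_association` are all assumptions made for illustration. It only shows why scoring a feature on partitioned subsets can reveal relevance that a whole-dataset score misses.

```python
from itertools import product


def association(feature, target):
    """Absolute difference in P(target = 1) between rows where the
    binary feature is 1 and rows where it is 0 (illustrative score)."""
    ones = [t for f, t in zip(feature, target) if f == 1]
    zeros = [t for f, t in zip(feature, target) if f == 0]
    if not ones or not zeros:
        return 0.0
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


def max_association(rows, target, idx):
    """Best score for feature `idx` over the whole dataset and over every
    subset obtained by fixing one other feature to 0 or to 1."""
    best = association([r[idx] for r in rows], target)
    for other in range(len(rows[0])):
        if other == idx:
            continue
        for value in (0, 1):
            keep = [i for i, r in enumerate(rows) if r[other] == value]
            sub_feature = [rows[i][idx] for i in keep]
            sub_target = [target[i] for i in keep]
            best = max(best, association(sub_feature, sub_target))
    return best


# Toy data: every combination of three binary features.
# The target is f0 XOR f1; f2 is irrelevant by construction.
rows = list(product((0, 1), repeat=3))
target = [f0 ^ f1 for f0, f1, _ in rows]

whole = [association([r[i] for r in rows], target) for i in range(3)]
scores = [max_association(rows, target, i) for i in range(3)]
retained = [i for i, s in enumerate(scores) if s >= 0.5]  # assumed threshold

print(whole, scores, retained)
```

On the whole dataset all three features score zero, so a purely global filter would discard f0 and f1 along with f2. Once the data are split on the other feature, f0 and f1 score highly while f2 stays at zero in every subset.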
ID: pubmed-3392052
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Record: pubmed-3392052 (2012-07-09)
Journal: AMIA Jt Summits Transl Sci Proc
Published: American Medical Informatics Association, 2012-03-19
License: ©2012 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose