
A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction

Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based...


Bibliographic Details
Main Authors:	Pudjihartono, Nicholas; Fadason, Tayaza; Kempa-Liehr, Andreas W.; O'Sullivan, Justin M.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580915/ https://www.ncbi.nlm.nih.gov/pubmed/36304293 http://dx.doi.org/10.3389/fbinf.2022.927312

author	Pudjihartono, Nicholas; Fadason, Tayaza; Kempa-Liehr, Andreas W.; O'Sullivan, Justin M.
author_sort	Pudjihartono, Nicholas
collection	PubMed
description	Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One of the promising applications of machine learning is in precision medicine, where disease risk is predicted using patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging due to the so-called “curse of dimensionality” (i.e., extensively larger number of features compared to the number of samples). Therefore, the generalizability of machine learning models benefits from feature selection, which aims to extract only the most “informative” features and remove noisy “non-informative,” irrelevant and redundant features. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
format	Online Article Text
id	pubmed-9580915
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9580915 2022-10-26. Front Bioinform. Bioinformatics. Frontiers Media S.A. 2022-06-27. /pmc/articles/PMC9580915/ /pubmed/36304293 http://dx.doi.org/10.3389/fbinf.2022.927312. Text en. Copyright © 2022 Pudjihartono, Fadason, Kempa-Liehr and O'Sullivan. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580915/ https://www.ncbi.nlm.nih.gov/pubmed/36304293 http://dx.doi.org/10.3389/fbinf.2022.927312
