
Omics Data Preprocessing for Machine Learning: A Case Study in Childhood Obesity

The use of machine learning techniques for the construction of predictive models of disease outcomes (based on omics and other types of molecular data) has gained enormous relevance in the last few years in the biomedical field. Nonetheless, the virtuosity of omics studies and machine learning tools...


Bibliographic Details
Main authors:	Torres-Martos, Álvaro; Bustos-Aibar, Mireia; Ramírez-Mena, Alberto; Cámara-Sánchez, Sofía; Anguita-Ruiz, Augusto; Alcalá, Rafael; Aguilera, Concepción M.; Alcalá-Fdez, Jesús
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9956296/ https://www.ncbi.nlm.nih.gov/pubmed/36833178 http://dx.doi.org/10.3390/genes14020248

author	Torres-Martos, Álvaro; Bustos-Aibar, Mireia; Ramírez-Mena, Alberto; Cámara-Sánchez, Sofía; Anguita-Ruiz, Augusto; Alcalá, Rafael; Aguilera, Concepción M.; Alcalá-Fdez, Jesús
author_sort	Torres-Martos, Álvaro
collection	PubMed
description	The use of machine learning techniques for the construction of predictive models of disease outcomes (based on omics and other types of molecular data) has gained enormous relevance in the last few years in the biomedical field. Nonetheless, the virtuosity of omics studies and machine learning tools are subject to the proper application of algorithms as well as the appropriate pre-processing and management of input omics and molecular data. Currently, many of the available approaches that use machine learning on omics data for predictive purposes make mistakes in several of the following key steps: experimental design, feature selection, data pre-processing, and algorithm selection. For this reason, we propose the current work as a guideline on how to confront the main challenges inherent to multi-omics human data. As such, a series of best practices and recommendations are also presented for each of the steps defined. In particular, the main particularities of each omics data layer, the most suitable preprocessing approaches for each source, and a compilation of best practices and tips for the study of disease development prediction using machine learning are described. Using examples of real data, we show how to address the key problems mentioned in multi-omics research (e.g., biological heterogeneity, technical noise, high dimensionality, presence of missing values, and class imbalance). Finally, we define the proposals for model improvement based on the results found, which serve as the bases for future work.
format	Online Article Text
id	pubmed-9956296
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9956296 2023-02-25 Genes (Basel) Article MDPI 2023-01-18 /pmc/articles/PMC9956296/ /pubmed/36833178 http://dx.doi.org/10.3390/genes14020248 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Omics Data Preprocessing for Machine Learning: A Case Study in Childhood Obesity
topic	Article
