
Overview of data preprocessing for machine learning applications in human microbiome research

Bibliographic Details
Main authors: Ibrahimi, Eliana; Lopes, Marta B.; Dhamo, Xhilda; Simeon, Andrea; Shigdel, Rajesh; Hron, Karel; Stres, Blaž; D’Elia, Domenica; Berland, Magali; Marcos-Zambrano, Laura Judith
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10588656/
https://www.ncbi.nlm.nih.gov/pubmed/37869650
http://dx.doi.org/10.3389/fmicb.2023.1250909
author Ibrahimi, Eliana
Lopes, Marta B.
Dhamo, Xhilda
Simeon, Andrea
Shigdel, Rajesh
Hron, Karel
Stres, Blaž
D’Elia, Domenica
Berland, Magali
Marcos-Zambrano, Laura Judith
collection PubMed
description Although metagenomic sequencing is now the preferred technique to study microbiome-host interactions, analyzing and interpreting microbiome sequencing data presents challenges primarily attributed to the statistical specificities of the data (e.g., sparsity, over-dispersion, compositionality, inter-variable dependency). This mini review explores preprocessing and transformation methods applied in recent human microbiome studies to address these challenges. Our results indicate limited adoption of transformation methods that target the statistical characteristics of microbiome sequencing data. Instead, relative and normalization-based transformations that do not account for the specific attributes of microbiome data are prevalent. Information on the preprocessing and transformations applied to the data before analysis was incomplete or missing in many publications, leading to reproducibility concerns, comparability issues, and questionable results. We hope this mini review will provide researchers and newcomers to the field of human microbiome research with an up-to-date point of reference for various data transformation tools and assist them in choosing the most suitable transformation method based on their research questions, objectives, and data characteristics.
format Online
Article
Text
id pubmed-10588656
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10588656 2023-10-21 Overview of data preprocessing for machine learning applications in human microbiome research Ibrahimi, Eliana; Lopes, Marta B.; Dhamo, Xhilda; Simeon, Andrea; Shigdel, Rajesh; Hron, Karel; Stres, Blaž; D’Elia, Domenica; Berland, Magali; Marcos-Zambrano, Laura Judith Front Microbiol Microbiology Frontiers Media S.A. 2023-10-05 /pmc/articles/PMC10588656/ /pubmed/37869650 http://dx.doi.org/10.3389/fmicb.2023.1250909 Text en Copyright © 2023 Ibrahimi, Lopes, Dhamo, Simeon, Shigdel, Hron, Stres, D’Elia, Berland and Marcos-Zambrano.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Overview of data preprocessing for machine learning applications in human microbiome research
topic Microbiology