
Opening the Random Forest Black Box of (1)H NMR Metabolomics Data by the Exploitation of Surrogate Variables


Bibliographic Details
Main authors: Wenck, Soeren; Mix, Thorsten; Fischer, Markus; Hackl, Thomas; Seifert, Stephan
Format: Online Article Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10608983/
https://www.ncbi.nlm.nih.gov/pubmed/37887402
http://dx.doi.org/10.3390/metabo13101075
author Wenck, Soeren
Mix, Thorsten
Fischer, Markus
Hackl, Thomas
Seifert, Stephan
author_facet Wenck, Soeren
Mix, Thorsten
Fischer, Markus
Hackl, Thomas
Seifert, Stephan
author_sort Wenck, Soeren
collection PubMed
description The untargeted metabolomics analysis of biological samples with nuclear magnetic resonance (NMR) provides highly complex data containing various signals from different molecules. To use these data for classification, e.g., in the context of food authentication, machine learning methods are used. These methods are usually applied as a black box, which means that no information about the complex relationships between the variables and the outcome is obtained. In this study, we show that the random forest-based approach surrogate minimal depth (SMD) can be applied for a comprehensive analysis of class-specific differences by selecting relevant variables and analyzing their mutual impact on the classification model of different truffle species. SMD allows the assignment of variables from the same metabolites as well as the detection of interactions between different metabolites that can be attributed to known biological relationships.
format Online
Article
Text
id pubmed-10608983
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10608983 2023-10-28 Opening the Random Forest Black Box of (1)H NMR Metabolomics Data by the Exploitation of Surrogate Variables Wenck, Soeren Mix, Thorsten Fischer, Markus Hackl, Thomas Seifert, Stephan Metabolites Article The untargeted metabolomics analysis of biological samples with nuclear magnetic resonance (NMR) provides highly complex data containing various signals from different molecules. To use these data for classification, e.g., in the context of food authentication, machine learning methods are used. These methods are usually applied as a black box, which means that no information about the complex relationships between the variables and the outcome is obtained. In this study, we show that the random forest-based approach surrogate minimal depth (SMD) can be applied for a comprehensive analysis of class-specific differences by selecting relevant variables and analyzing their mutual impact on the classification model of different truffle species. SMD allows the assignment of variables from the same metabolites as well as the detection of interactions between different metabolites that can be attributed to known biological relationships. MDPI 2023-10-13 /pmc/articles/PMC10608983/ /pubmed/37887402 http://dx.doi.org/10.3390/metabo13101075 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Wenck, Soeren
Mix, Thorsten
Fischer, Markus
Hackl, Thomas
Seifert, Stephan
Opening the Random Forest Black Box of (1)H NMR Metabolomics Data by the Exploitation of Surrogate Variables
title Opening the Random Forest Black Box of (1)H NMR Metabolomics Data by the Exploitation of Surrogate Variables
title_full Opening the Random Forest Black Box of (1)H NMR Metabolomics Data by the Exploitation of Surrogate Variables
title_fullStr Opening the Random Forest Black Box of (1)H NMR Metabolomics Data by the Exploitation of Surrogate Variables
title_full_unstemmed Opening the Random Forest Black Box of (1)H NMR Metabolomics Data by the Exploitation of Surrogate Variables
title_short Opening the Random Forest Black Box of (1)H NMR Metabolomics Data by the Exploitation of Surrogate Variables
title_sort opening the random forest black box of (1)h nmr metabolomics data by the exploitation of surrogate variables
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10608983/
https://www.ncbi.nlm.nih.gov/pubmed/37887402
http://dx.doi.org/10.3390/metabo13101075