Automated machine learning and explainable AI (AutoML-XAI) for metabolomics: improving cancer diagnostics
MOTIVATION: Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for non-experts, remain. Automated machin...
Main authors: | Bifarin, Olatomiwa O.; Fernández, Facundo M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634896/ https://www.ncbi.nlm.nih.gov/pubmed/37961534 http://dx.doi.org/10.1101/2023.10.26.564244 |
_version_ | 1785146257389387776 |
---|---|
author | Bifarin, Olatomiwa O.; Fernández, Facundo M. |
collection | PubMed |
description | MOTIVATION: Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for non-experts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. RESULTS: We tested our approach on two datasets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using auto-sklearn, surpassed standalone ML algorithms such as SVM and random forest in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers (Non-OC). Auto-sklearn employed a mix of algorithms and ensemble techniques, yielding a superior performance (AUC of 0.97 for RCC and 0.85 for OC). Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM3(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science. 
AVAILABILITY: https://github.com/obifarin/automl-xai-metabolomics |
format | Online Article Text |
id | pubmed-10634896 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-106348962023-11-13 Automated machine learning and explainable AI (AutoML-XAI) for metabolomics: improving cancer diagnostics Bifarin, Olatomiwa O. Fernández, Facundo M. bioRxiv Article MOTIVATION: Metabolomics generates complex data necessitating advanced computational methods for generating biological insight. While machine learning (ML) is promising, the challenges of selecting the best algorithms and tuning hyperparameters, particularly for non-experts, remain. Automated machine learning (AutoML) can streamline this process; however, the issue of interpretability could persist. This research introduces a unified pipeline that combines AutoML with explainable AI (XAI) techniques to optimize metabolomics analysis. RESULTS: We tested our approach on two datasets: renal cell carcinoma (RCC) urine metabolomics and ovarian cancer (OC) serum metabolomics. AutoML, using auto-sklearn, surpassed standalone ML algorithms such as SVM and random forest in differentiating between RCC and healthy controls, as well as OC patients and those with other gynecological cancers (Non-OC). Auto-sklearn employed a mix of algorithms and ensemble techniques, yielding a superior performance (AUC of 0.97 for RCC and 0.85 for OC). Shapley Additive Explanations (SHAP) provided a global ranking of feature importance, identifying dibutylamine and ganglioside GM(d34:1) as the top discriminative metabolites for RCC and OC, respectively. Waterfall plots offered local explanations by illustrating the influence of each metabolite on individual predictions. Dependence plots spotlighted metabolite interactions, such as the connection between hippuric acid and one of its derivatives in RCC, and between GM3(d34:1) and GM3(18:1_16:0) in OC, hinting at potential mechanistic relationships. Through decision plots, a detailed error analysis was conducted, contrasting feature importance for correctly versus incorrectly classified samples. 
In essence, our pipeline emphasizes the importance of harmonizing AutoML and XAI, facilitating both simplified ML application and improved interpretability in metabolomics data science. AVAILABILITY: https://github.com/obifarin/automl-xai-metabolomics Cold Spring Harbor Laboratory 2023-10-31 /pmc/articles/PMC10634896/ /pubmed/37961534 http://dx.doi.org/10.1101/2023.10.26.564244 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | Automated machine learning and explainable AI (AutoML-XAI) for metabolomics: improving cancer diagnostics |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634896/ https://www.ncbi.nlm.nih.gov/pubmed/37961534 http://dx.doi.org/10.1101/2023.10.26.564244 |
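
The description in this record mentions Shapley Additive Explanations and waterfall plots that show how each metabolite influences an individual prediction. The idea behind those plots can be sketched with an exact Shapley value computation on a toy model. This is an illustration only, not the authors' pipeline and not the SHAP library: the scoring function `toy_score`, its coefficients, the three unnamed features, and the all-zero baseline are hypothetical, and "absent" features are simply set to their baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    A feature that is "absent" from a coalition takes its baseline value;
    a feature that is "present" takes the sample's value.
    """
    n = len(x)

    def value(present):
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(present | {i}) - value(present))
    return phi


def toy_score(features):
    """Hypothetical score: two linear terms plus one interaction term."""
    a, b, c = features
    return 2.0 * a - 1.0 * b + 0.5 * a * c


sample = [1.0, 2.0, 4.0]    # hypothetical feature values for one sample
baseline = [0.0, 0.0, 0.0]  # hypothetical reference values

phi = shapley_values(toy_score, sample, baseline)

# Additivity: baseline prediction plus all contributions gives the prediction.
reconstructed = toy_score(baseline) + sum(phi)

print("contributions:", phi)
print("prediction:", toy_score(sample), "reconstructed:", reconstructed)
```

In this toy case the interaction term is split evenly between the first and third features, so the contributions come out at approximately 3, -2 and 1 and sum to the difference between the sample's score and the baseline score. That additive decomposition is what a waterfall plot draws. Exhaustive enumeration grows exponentially with the number of features, so it is only practical for very small examples like this one.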