
TIGER: technical variation elimination for metabolomics data using ensemble learning architecture

Large metabolomics datasets inevitably contain unwanted technical variations which can obscure meaningful biological signals and affect how this information is applied to personalized healthcare. Many methods have been developed to handle unwanted variations. However, the underlying assumptions of m...

Bibliographic Details
Main Authors:	Han, Siyu, Huang, Jialing, Foppiano, Francesco, Prehn, Cornelia, Adamski, Jerzy, Suhre, Karsten, Li, Ying, Matullo, Giuseppe, Schliess, Freimut, Gieger, Christian, Peters, Annette, Wang-Sattler, Rui
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Problem Solving Protocol
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921617/ https://www.ncbi.nlm.nih.gov/pubmed/34981111 http://dx.doi.org/10.1093/bib/bbab535

_version_	1784669359578284032
author	Han, Siyu Huang, Jialing Foppiano, Francesco Prehn, Cornelia Adamski, Jerzy Suhre, Karsten Li, Ying Matullo, Giuseppe Schliess, Freimut Gieger, Christian Peters, Annette Wang-Sattler, Rui
author_facet	Han, Siyu Huang, Jialing Foppiano, Francesco Prehn, Cornelia Adamski, Jerzy Suhre, Karsten Li, Ying Matullo, Giuseppe Schliess, Freimut Gieger, Christian Peters, Annette Wang-Sattler, Rui
author_sort	Han, Siyu
collection	PubMed
description	Large metabolomics datasets inevitably contain unwanted technical variations which can obscure meaningful biological signals and affect how this information is applied to personalized healthcare. Many methods have been developed to handle unwanted variations. However, the underlying assumptions of many existing methods only hold for a few specific scenarios. Some tools remove technical variations with models trained on quality control (QC) samples which may not generalize well on subject samples. Additionally, almost none of the existing methods supports datasets with multiple types of QC samples, which greatly limits their performance and flexibility. To address these issues, a non-parametric method TIGER (Technical variation elImination with ensemble learninG architEctuRe) is developed in this study and released as an R package (https://CRAN.R-project.org/package=TIGERr). TIGER integrates the random forest algorithm into an adaptable ensemble learning architecture. Evaluation results show that TIGER outperforms four popular methods with respect to robustness and reliability on three human cohort datasets constructed with targeted or untargeted metabolomics data. Additionally, a case study aiming to identify age-associated metabolites is performed to illustrate how TIGER can be used for cross-kit adjustment in a longitudinal analysis with experimental data of three time-points generated by different analytical kits. A dynamic website is developed to help evaluate the performance of TIGER and examine the patterns revealed in our longitudinal analysis (https://han-siyu.github.io/TIGER_web/). Overall, TIGER is expected to be a powerful tool for metabolomics data analysis.
format	Online Article Text
id	pubmed-8921617
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8921617 2022-03-15 TIGER: technical variation elimination for metabolomics data using ensemble learning architecture Han, Siyu Huang, Jialing Foppiano, Francesco Prehn, Cornelia Adamski, Jerzy Suhre, Karsten Li, Ying Matullo, Giuseppe Schliess, Freimut Gieger, Christian Peters, Annette Wang-Sattler, Rui Brief Bioinform Problem Solving Protocol Large metabolomics datasets inevitably contain unwanted technical variations which can obscure meaningful biological signals and affect how this information is applied to personalized healthcare. Many methods have been developed to handle unwanted variations. However, the underlying assumptions of many existing methods only hold for a few specific scenarios. Some tools remove technical variations with models trained on quality control (QC) samples which may not generalize well on subject samples. Additionally, almost none of the existing methods supports datasets with multiple types of QC samples, which greatly limits their performance and flexibility. To address these issues, a non-parametric method TIGER (Technical variation elImination with ensemble learninG architEctuRe) is developed in this study and released as an R package (https://CRAN.R-project.org/package=TIGERr). TIGER integrates the random forest algorithm into an adaptable ensemble learning architecture. Evaluation results show that TIGER outperforms four popular methods with respect to robustness and reliability on three human cohort datasets constructed with targeted or untargeted metabolomics data. Additionally, a case study aiming to identify age-associated metabolites is performed to illustrate how TIGER can be used for cross-kit adjustment in a longitudinal analysis with experimental data of three time-points generated by different analytical kits. A dynamic website is developed to help evaluate the performance of TIGER and examine the patterns revealed in our longitudinal analysis (https://han-siyu.github.io/TIGER_web/). Overall, TIGER is expected to be a powerful tool for metabolomics data analysis. Oxford University Press 2022-01-03 /pmc/articles/PMC8921617/ /pubmed/34981111 http://dx.doi.org/10.1093/bib/bbab535 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Problem Solving Protocol Han, Siyu Huang, Jialing Foppiano, Francesco Prehn, Cornelia Adamski, Jerzy Suhre, Karsten Li, Ying Matullo, Giuseppe Schliess, Freimut Gieger, Christian Peters, Annette Wang-Sattler, Rui TIGER: technical variation elimination for metabolomics data using ensemble learning architecture
title	TIGER: technical variation elimination for metabolomics data using ensemble learning architecture
topic	Problem Solving Protocol
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921617/ https://www.ncbi.nlm.nih.gov/pubmed/34981111 http://dx.doi.org/10.1093/bib/bbab535
