UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis

Bibliographic Details
Main authors: Kontou, Eftychia E., Walter, Axel, Alka, Oliver, Pfeuffer, Julianus, Sachsenberg, Timo, Mohite, Omkar S., Nuhamunada, Matin, Kohlbacher, Oliver, Weber, Tilmann
Format: Online Article Text
Language: English
Published: Springer International Publishing 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10176759/
https://www.ncbi.nlm.nih.gov/pubmed/37173725
http://dx.doi.org/10.1186/s13321-023-00724-w
author Kontou, Eftychia E.
Walter, Axel
Alka, Oliver
Pfeuffer, Julianus
Sachsenberg, Timo
Mohite, Omkar S.
Nuhamunada, Matin
Kohlbacher, Oliver
Weber, Tilmann
collection PubMed
description Metabolomics experiments generate highly complex datasets, which are time and work-intensive, sometimes even error-prone if inspected manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, molecular formula and structural predictions, and an integration to the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, as well as development, the workflow is also implemented in Jupyter notebooks using the Python programming language and a set of Python bindings to the OpenMS algorithms (pyOpenMS). Finally, UmetaFlow is also offered as a web-based Graphical User Interface for parameter optimization and processing of smaller-sized datasets. UmetaFlow was validated with in-house LC–MS/MS datasets of actinomycetes producing known secondary metabolites, as well as commercial standards, and it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking, and UmetaFlow detected more than 90% of all ground truth features and performed exceptionally well in quantification and discriminating marker selection. We anticipate that UmetaFlow will provide a useful platform for the interpretation of large metabolomics datasets. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-023-00724-w.
format Online
Article
Text
id pubmed-10176759
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-10176759 2023-05-13 UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis Kontou, Eftychia E. Walter, Axel Alka, Oliver Pfeuffer, Julianus Sachsenberg, Timo Mohite, Omkar S. Nuhamunada, Matin Kohlbacher, Oliver Weber, Tilmann J Cheminform Methodology Springer International Publishing 2023-05-12 /pmc/articles/PMC10176759/ /pubmed/37173725 http://dx.doi.org/10.1186/s13321-023-00724-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10176759/
https://www.ncbi.nlm.nih.gov/pubmed/37173725
http://dx.doi.org/10.1186/s13321-023-00724-w