UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis
Metabolomics experiments generate highly complex datasets, which are time and work-intensive, sometimes even error-prone if inspected manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computa...
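Main authors: | Kontou, Eftychia E.; Walter, Axel; Alka, Oliver; Pfeuffer, Julianus; Sachsenberg, Timo; Mohite, Omkar S.; Nuhamunada, Matin; Kohlbacher, Oliver; Weber, Tilmann |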
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2023 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10176759/ https://www.ncbi.nlm.nih.gov/pubmed/37173725 http://dx.doi.org/10.1186/s13321-023-00724-w |
_version_ | 1785040494276902912 |
---|---|
author | Kontou, Eftychia E.; Walter, Axel; Alka, Oliver; Pfeuffer, Julianus; Sachsenberg, Timo; Mohite, Omkar S.; Nuhamunada, Matin; Kohlbacher, Oliver; Weber, Tilmann |
author_facet | Kontou, Eftychia E.; Walter, Axel; Alka, Oliver; Pfeuffer, Julianus; Sachsenberg, Timo; Mohite, Omkar S.; Nuhamunada, Matin; Kohlbacher, Oliver; Weber, Tilmann |
author_sort | Kontou, Eftychia E. |
collection | PubMed |
description | Metabolomics experiments generate highly complex datasets, which are time and work-intensive, sometimes even error-prone if inspected manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, molecular formula and structural predictions, and an integration to the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, as well as development, the workflow is also implemented in Jupyter notebooks using the Python programming language and a set of Python bindings to the OpenMS algorithms (pyOpenMS). Finally, UmetaFlow is also offered as a web-based Graphical User Interface for parameter optimization and processing of smaller-sized datasets. UmetaFlow was validated with in-house LC–MS/MS datasets of actinomycetes producing known secondary metabolites, as well as commercial standards, and it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking, and UmetaFlow detected more than 90% of all ground truth features and performed exceptionally well in quantification and discriminating marker selection. We anticipate that UmetaFlow will provide a useful platform for the interpretation of large metabolomics datasets. GRAPHICAL ABSTRACT: [Image: see text] SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-023-00724-w. |
format | Online Article Text |
id | pubmed-10176759 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-10176759 2023-05-13 UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis Kontou, Eftychia E. Walter, Axel Alka, Oliver Pfeuffer, Julianus Sachsenberg, Timo Mohite, Omkar S. Nuhamunada, Matin Kohlbacher, Oliver Weber, Tilmann J Cheminform Methodology Metabolomics experiments generate highly complex datasets, which are time and work-intensive, sometimes even error-prone if inspected manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, molecular formula and structural predictions, and an integration to the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, as well as development, the workflow is also implemented in Jupyter notebooks using the Python programming language and a set of Python bindings to the OpenMS algorithms (pyOpenMS). Finally, UmetaFlow is also offered as a web-based Graphical User Interface for parameter optimization and processing of smaller-sized datasets. UmetaFlow was validated with in-house LC–MS/MS datasets of actinomycetes producing known secondary metabolites, as well as commercial standards, and it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking, and UmetaFlow detected more than 90% of all ground truth features and performed exceptionally well in quantification and discriminating marker selection. We anticipate that UmetaFlow will provide a useful platform for the interpretation of large metabolomics datasets. GRAPHICAL ABSTRACT: [Image: see text] SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-023-00724-w. Springer International Publishing 2023-05-12 /pmc/articles/PMC10176759/ /pubmed/37173725 http://dx.doi.org/10.1186/s13321-023-00724-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Kontou, Eftychia E. Walter, Axel Alka, Oliver Pfeuffer, Julianus Sachsenberg, Timo Mohite, Omkar S. Nuhamunada, Matin Kohlbacher, Oliver Weber, Tilmann UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis |
title | UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis |
title_full | UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis |
title_fullStr | UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis |
title_full_unstemmed | UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis |
title_short | UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis |
title_sort | umetaflow: an untargeted metabolomics workflow for high-throughput data processing and analysis |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10176759/ https://www.ncbi.nlm.nih.gov/pubmed/37173725 http://dx.doi.org/10.1186/s13321-023-00724-w |
work_keys_str_mv | AT kontoueftychiae umetaflowanuntargetedmetabolomicsworkflowforhighthroughputdataprocessingandanalysis AT walteraxel umetaflowanuntargetedmetabolomicsworkflowforhighthroughputdataprocessingandanalysis AT alkaoliver umetaflowanuntargetedmetabolomicsworkflowforhighthroughputdataprocessingandanalysis AT pfeufferjulianus umetaflowanuntargetedmetabolomicsworkflowforhighthroughputdataprocessingandanalysis AT sachsenbergtimo umetaflowanuntargetedmetabolomicsworkflowforhighthroughputdataprocessingandanalysis AT mohiteomkars umetaflowanuntargetedmetabolomicsworkflowforhighthroughputdataprocessingandanalysis AT nuhamunadamatin umetaflowanuntargetedmetabolomicsworkflowforhighthroughputdataprocessingandanalysis AT kohlbacheroliver umetaflowanuntargetedmetabolomicsworkflowforhighthroughputdataprocessingandanalysis AT webertilmann umetaflowanuntargetedmetabolomicsworkflowforhighthroughputdataprocessingandanalysis |