Probabilistic metabolite annotation using retention time prediction and meta-learned projections

Bibliographic Details
Main Authors: García, Constantino A., Gil-de-la-Fuente, Alberto, Barbas, Coral, Otero, Abraham
Format: Online Article Text
Language: English
Published: Springer International Publishing 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9172150/
https://www.ncbi.nlm.nih.gov/pubmed/35672784
http://dx.doi.org/10.1186/s13321-022-00613-8
collection PubMed
description Retention time information is used for metabolite annotation in metabolomic experiments, but its usefulness is hindered by the limited availability of experimental retention time data in metabolomic databases and by the lack of reproducibility between different chromatographic methods. Accurate prediction of retention time for a given chromatographic method would be a valuable support for metabolite annotation. We have trained state-of-the-art machine learning regressors using the 80,038 experimental retention times from the METLIN Small Molecule Retention Time (SMRT) dataset. The models included deep neural networks, deep kernel learning, several gradient boosting models, and a blending approach. A total of 5,666 molecular descriptors and 2,214 fingerprints (MACCS166, Extended Connectivity, and Path Fingerprints) were generated with the alvaDesc software. The models were trained using only the descriptors, only the fingerprints, and both types of features simultaneously. Bayesian hyperparameter search was used for parameter tuning. To avoid data leakage when reporting the performance metrics, nested cross-validation was employed. The best results were obtained by a heavily regularized deep neural network trained with cosine annealing warm restarts and stochastic weight averaging, achieving mean and median absolute errors of [Formula: see text] and [Formula: see text], respectively. To the best of our knowledge, these are the most accurate predictions published to date on the SMRT dataset. To project retention times between chromatographic methods, we propose a novel Bayesian meta-learning approach that can learn from just a few molecules. By applying this projection between the deep neural network retention time predictions and a given chromatographic method, our approach can be integrated into a metabolite annotation workflow to obtain z-scores for the candidate annotations.
For this, it suffices that as few as 10 molecules of a given experiment have been identified (typically by using pure metabolite standards). Using z-scores makes it possible to account for the uncertainty of the projection, and not only its accuracy, when ranking candidates. In this scenario, our results show that in 68% of the cases the correct molecule was among the top three candidates filtered by mass and ranked according to z-scores. This shows the usefulness of this information for supporting metabolite annotation. Python code is available on GitHub at https://github.com/constantino-garcia/cmmrt. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-022-00613-8.
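The annotation step summarized in the description (filter candidates by mass, then rank them by z-score, then check whether the correct molecule is in the top three) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the cmmrt implementation: the `Candidate` structure, the 10 ppm mass tolerance, the comparison of an observed neutral mass against candidate monoisotopic masses (adducts ignored), and ranking by absolute z-score are all hypothetical choices. The projected retention time of each candidate is taken as a given predictive mean and standard deviation rather than computed by the meta-learned projection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical candidate annotation for one detected feature."""
    name: str
    mass: float     # candidate monoisotopic mass (Da)
    rt_mean: float  # projected retention time, predictive mean (s)
    rt_std: float   # projected retention time, predictive std. dev. (s)


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6


def z_score(rt_observed: float, cand: Candidate) -> float:
    """Observed RT minus projected mean, in units of projected std. dev."""
    return (rt_observed - cand.rt_mean) / cand.rt_std


def rank_candidates(observed_mass: float, rt_observed: float,
                    candidates: List[Candidate],
                    ppm_tol: float = 10.0) -> List[Candidate]:
    """Keep candidates inside the mass window, then sort by |z| (ascending)."""
    in_window = [c for c in candidates
                 if ppm_error(observed_mass, c.mass) <= ppm_tol]
    return sorted(in_window, key=lambda c: abs(z_score(rt_observed, c)))


def in_top_k(ranked: List[Candidate], true_name: str, k: int = 3) -> bool:
    """True if the correct molecule is among the first k ranked candidates."""
    return any(c.name == true_name for c in ranked[:k])


if __name__ == "__main__":
    # Made-up candidates for one feature observed at 180.0634 Da and 310 s.
    candidates = [
        Candidate("A", 180.0634, 300.0, 20.0),
        Candidate("B", 180.0634, 420.0, 60.0),
        Candidate("C", 180.0640, 330.0, 10.0),
        Candidate("D", 195.0000, 305.0, 5.0),  # outside the mass window
    ]
    ranked = rank_candidates(180.0634, 310.0, candidates)
    print([c.name for c in ranked])    # ['A', 'B', 'C']
    print(in_top_k(ranked, "C", k=3))  # True
```

With these illustrative numbers, candidate C is closer to the observed retention time than B (20 s versus 110 s), yet B ranks ahead because its wider predicted spread gives a smaller |z|. This is the sense in which z-scores weigh the uncertainty of the projection and not only its accuracy.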
id pubmed-9172150
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9172150 2022-06-08 Probabilistic metabolite annotation using retention time prediction and meta-learned projections García, Constantino A. Gil-de-la-Fuente, Alberto Barbas, Coral Otero, Abraham J Cheminform Research Article Springer International Publishing 2022-06-07 /pmc/articles/PMC9172150/ /pubmed/35672784 http://dx.doi.org/10.1186/s13321-022-00613-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research Article