Probabilistic metabolite annotation using retention time prediction and meta-learned projections

Retention time information is used for metabolite annotation in metabolomic experiments. However, its usefulness is hindered by the limited availability of experimental retention time data in metabolomic databases, and by the lack of reproducibility between different chromatographic methods. Accurate prediction...

Bibliographic Details
Main Authors:	García, Constantino A.; Gil-de-la-Fuente, Alberto; Barbas, Coral; Otero, Abraham
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9172150/ https://www.ncbi.nlm.nih.gov/pubmed/35672784 http://dx.doi.org/10.1186/s13321-022-00613-8

_version_	1784721825991753728
author	García, Constantino A. Gil-de-la-Fuente, Alberto Barbas, Coral Otero, Abraham
author_facet	García, Constantino A. Gil-de-la-Fuente, Alberto Barbas, Coral Otero, Abraham
author_sort	García, Constantino A.
collection	PubMed
description	Retention time information is used for metabolite annotation in metabolomic experiments. However, its usefulness is hindered by the limited availability of experimental retention time data in metabolomic databases, and by the lack of reproducibility between different chromatographic methods. Accurate prediction of retention time for a given chromatographic method would be a valuable support for metabolite annotation. We have trained state-of-the-art machine learning regressors using the 80,038 experimental retention times from the METLIN Small Molecule Retention Time (SMRT) dataset. The models included deep neural networks, deep kernel learning, several gradient boosting models, and a blending approach. 5,666 molecular descriptors and 2,214 fingerprints (MACCS166, Extended Connectivity, and Path fingerprints) were generated with the alvaDesc software. The models were trained using only the descriptors, only the fingerprints, and both types of features simultaneously. Bayesian hyperparameter search was used for parameter tuning. To avoid data leakage when reporting the performance metrics, nested cross-validation was employed. The best results were obtained by a heavily regularized deep neural network trained with cosine annealing warm restarts and stochastic weight averaging, achieving mean and median absolute errors of [Formula: see text] and [Formula: see text], respectively. To the best of our knowledge, these are the most accurate predictions published to date over the SMRT dataset. To project retention times between chromatographic methods, a novel Bayesian meta-learning approach that can learn from just a few molecules is proposed. By applying this projection between the deep neural network retention time predictions and a given chromatographic method, our approach can be integrated into a metabolite annotation workflow to obtain z-scores for the candidate annotations.
For this, it suffices that as few as 10 molecules of a given experiment have been identified (probably by using pure metabolite standards). Using z-scores allows the ranking of candidates to account for the uncertainty in the projection, and not only for its accuracy. In this scenario, our results show that in 68% of the cases the correct molecule was among the top three candidates filtered by mass and ranked according to z-scores. This shows the usefulness of this information to support metabolite annotation. Python code is available on GitHub at https://github.com/constantino-garcia/cmmrt. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-022-00613-8.
format	Online Article Text
id	pubmed-9172150
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-9172150 2022-06-08 Probabilistic metabolite annotation using retention time prediction and meta-learned projections García, Constantino A. Gil-de-la-Fuente, Alberto Barbas, Coral Otero, Abraham J Cheminform Research Article Retention time information is used for metabolite annotation in metabolomic experiments. However, its usefulness is hindered by the limited availability of experimental retention time data in metabolomic databases, and by the lack of reproducibility between different chromatographic methods. Accurate prediction of retention time for a given chromatographic method would be a valuable support for metabolite annotation. We have trained state-of-the-art machine learning regressors using the 80,038 experimental retention times from the METLIN Small Molecule Retention Time (SMRT) dataset. The models included deep neural networks, deep kernel learning, several gradient boosting models, and a blending approach. 5,666 molecular descriptors and 2,214 fingerprints (MACCS166, Extended Connectivity, and Path fingerprints) were generated with the alvaDesc software. The models were trained using only the descriptors, only the fingerprints, and both types of features simultaneously. Bayesian hyperparameter search was used for parameter tuning. To avoid data leakage when reporting the performance metrics, nested cross-validation was employed. The best results were obtained by a heavily regularized deep neural network trained with cosine annealing warm restarts and stochastic weight averaging, achieving mean and median absolute errors of [Formula: see text] and [Formula: see text], respectively. To the best of our knowledge, these are the most accurate predictions published to date over the SMRT dataset. To project retention times between chromatographic methods, a novel Bayesian meta-learning approach that can learn from just a few molecules is proposed.
By applying this projection between the deep neural network retention time predictions and a given chromatographic method, our approach can be integrated into a metabolite annotation workflow to obtain z-scores for the candidate annotations. For this, it suffices that as few as 10 molecules of a given experiment have been identified (probably by using pure metabolite standards). Using z-scores allows the ranking of candidates to account for the uncertainty in the projection, and not only for its accuracy. In this scenario, our results show that in 68% of the cases the correct molecule was among the top three candidates filtered by mass and ranked according to z-scores. This shows the usefulness of this information to support metabolite annotation. Python code is available on GitHub at https://github.com/constantino-garcia/cmmrt. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-022-00613-8. Springer International Publishing 2022-06-07 /pmc/articles/PMC9172150/ /pubmed/35672784 http://dx.doi.org/10.1186/s13321-022-00613-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Article García, Constantino A. Gil-de-la-Fuente, Alberto Barbas, Coral Otero, Abraham Probabilistic metabolite annotation using retention time prediction and meta-learned projections
title	Probabilistic metabolite annotation using retention time prediction and meta-learned projections
title_full	Probabilistic metabolite annotation using retention time prediction and meta-learned projections
title_fullStr	Probabilistic metabolite annotation using retention time prediction and meta-learned projections
title_full_unstemmed	Probabilistic metabolite annotation using retention time prediction and meta-learned projections
title_short	Probabilistic metabolite annotation using retention time prediction and meta-learned projections
title_sort	probabilistic metabolite annotation using retention time prediction and meta-learned projections
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9172150/ https://www.ncbi.nlm.nih.gov/pubmed/35672784 http://dx.doi.org/10.1186/s13321-022-00613-8
work_keys_str_mv	AT garciaconstantinoa probabilisticmetaboliteannotationusingretentiontimepredictionandmetalearnedprojections AT gildelafuentealberto probabilisticmetaboliteannotationusingretentiontimepredictionandmetalearnedprojections AT barbascoral probabilisticmetaboliteannotationusingretentiontimepredictionandmetalearnedprojections AT oteroabraham probabilisticmetaboliteannotationusingretentiontimepredictionandmetalearnedprojections
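
The abstract in this record describes ranking candidate annotations that were first filtered by mass and then ordered by z-scores, so that the uncertainty of the projected retention time counts as well as its point accuracy. The following is a minimal sketch of that idea only, not the authors' cmmrt implementation: the `Candidate` class, the `rank_by_z_score` function, the 10 ppm mass tolerance, and all numeric values are hypothetical, and the projected mean and standard deviation are assumed to be supplied by some upstream projection model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate annotation with a projected retention time."""
    name: str
    mass: float     # monoisotopic mass in Da
    rt_mean: float  # predictive mean of the projected retention time (s)
    rt_std: float   # predictive standard deviation (s), assumed > 0


def rank_by_z_score(candidates, observed_mass, observed_rt, ppm_tol=10.0):
    """Keep candidates within ppm_tol of the observed mass, then sort by |z|."""
    scored = []
    for c in candidates:
        ppm_error = abs(c.mass - observed_mass) / observed_mass * 1e6
        if ppm_error > ppm_tol:
            continue  # discarded by the mass filter
        z = (observed_rt - c.rt_mean) / c.rt_std
        scored.append((abs(z), c.name))
    scored.sort()  # smallest |z| first, i.e. most plausible first
    return [(name, abs_z) for abs_z, name in scored]


# Illustrative values only (assumed, not taken from the article).
candidates = [
    Candidate("A", mass=180.0634, rt_mean=315.0, rt_std=5.0),
    Candidate("B", mass=180.0634, rt_mean=350.0, rt_std=40.0),
    Candidate("C", mass=250.0000, rt_mean=330.0, rt_std=10.0),
]
ranking = rank_by_z_score(candidates, observed_mass=180.0634, observed_rt=330.0)
print(ranking)
```

In this toy example candidate B ranks ahead of A even though A's point prediction is closer to the observed retention time, because A's projection is much more confident and the observation falls three standard deviations away from it. Candidate C never enters the ranking because it fails the mass filter.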
