Probabilistic metabolite annotation using retention time prediction and meta-learned projections
Retention time information is used for metabolite annotation in metabolomic experiments. However, its usefulness is hindered by the limited availability of experimental retention time data in metabolomic databases, and by the lack of reproducibility between different chromatographic methods. Accurate prediction...
Main authors: | García, Constantino A.; Gil-de-la-Fuente, Alberto; Barbas, Coral; Otero, Abraham |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2022 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9172150/ https://www.ncbi.nlm.nih.gov/pubmed/35672784 http://dx.doi.org/10.1186/s13321-022-00613-8 |
_version_ | 1784721825991753728 |
---|---|
author | García, Constantino A. Gil-de-la-Fuente, Alberto Barbas, Coral Otero, Abraham |
author_facet | García, Constantino A. Gil-de-la-Fuente, Alberto Barbas, Coral Otero, Abraham |
author_sort | García, Constantino A. |
collection | PubMed |
description | Retention time information is used for metabolite annotation in metabolomic experiments. However, its usefulness is hindered by the limited availability of experimental retention time data in metabolomic databases, and by the lack of reproducibility between different chromatographic methods. Accurate prediction of retention time for a given chromatographic method would be a valuable support for metabolite annotation. We have trained state-of-the-art machine learning regressors using the 80,038 experimental retention times from the METLIN Small Molecule Retention Time (SMRT) dataset. The models included deep neural networks, deep kernel learning, several gradient boosting models, and a blending approach. 5,666 molecular descriptors and 2,214 fingerprints (MACCS166, Extended Connectivity, and Path fingerprints) were generated with the alvaDesc software. The models were trained using only the descriptors, only the fingerprints, and both types of features simultaneously. Bayesian hyperparameter search was used for parameter tuning. To avoid data leakage when reporting the performance metrics, nested cross-validation was employed. The best results were obtained by a heavily regularized deep neural network trained with cosine annealing warm restarts and stochastic weight averaging, achieving mean and median absolute errors of [Formula: see text] and [Formula: see text], respectively. To the best of our knowledge, these are the most accurate predictions published to date on the SMRT dataset. To project retention times between chromatographic methods, a novel Bayesian meta-learning approach that can learn from just a few molecules is proposed. By applying this projection between the deep neural network retention time predictions and a given chromatographic method, our approach can be integrated into a metabolite annotation workflow to obtain z-scores for the candidate annotations. 
To this end, it is enough that as few as 10 molecules of a given experiment have been identified (probably by using pure metabolite standards). Using z-scores makes it possible to consider the uncertainty of the projection, and not only its accuracy, when ranking candidates. In this scenario, our results show that in 68% of the cases the correct molecule was among the top three candidates filtered by mass and ranked according to z-scores. This shows the usefulness of this information to support metabolite annotation. Python code is available on GitHub at https://github.com/constantino-garcia/cmmrt. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-022-00613-8. |
format | Online Article Text |
id | pubmed-9172150 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-9172150 2022-06-08 Probabilistic metabolite annotation using retention time prediction and meta-learned projections García, Constantino A. Gil-de-la-Fuente, Alberto Barbas, Coral Otero, Abraham J Cheminform Research Article Retention time information is used for metabolite annotation in metabolomic experiments. However, its usefulness is hindered by the limited availability of experimental retention time data in metabolomic databases, and by the lack of reproducibility between different chromatographic methods. Accurate prediction of retention time for a given chromatographic method would be a valuable support for metabolite annotation. We have trained state-of-the-art machine learning regressors using the 80,038 experimental retention times from the METLIN Small Molecule Retention Time (SMRT) dataset. The models included deep neural networks, deep kernel learning, several gradient boosting models, and a blending approach. 5,666 molecular descriptors and 2,214 fingerprints (MACCS166, Extended Connectivity, and Path fingerprints) were generated with the alvaDesc software. The models were trained using only the descriptors, only the fingerprints, and both types of features simultaneously. Bayesian hyperparameter search was used for parameter tuning. To avoid data leakage when reporting the performance metrics, nested cross-validation was employed. The best results were obtained by a heavily regularized deep neural network trained with cosine annealing warm restarts and stochastic weight averaging, achieving mean and median absolute errors of [Formula: see text] and [Formula: see text], respectively. To the best of our knowledge, these are the most accurate predictions published to date on the SMRT dataset. To project retention times between chromatographic methods, a novel Bayesian meta-learning approach that can learn from just a few molecules is proposed. 
By applying this projection between the deep neural network retention time predictions and a given chromatographic method, our approach can be integrated into a metabolite annotation workflow to obtain z-scores for the candidate annotations. To this end, it is enough that as few as 10 molecules of a given experiment have been identified (probably by using pure metabolite standards). Using z-scores makes it possible to consider the uncertainty of the projection, and not only its accuracy, when ranking candidates. In this scenario, our results show that in 68% of the cases the correct molecule was among the top three candidates filtered by mass and ranked according to z-scores. This shows the usefulness of this information to support metabolite annotation. Python code is available on GitHub at https://github.com/constantino-garcia/cmmrt. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-022-00613-8. Springer International Publishing 2022-06-07 /pmc/articles/PMC9172150/ /pubmed/35672784 http://dx.doi.org/10.1186/s13321-022-00613-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. 
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article García, Constantino A. Gil-de-la-Fuente, Alberto Barbas, Coral Otero, Abraham Probabilistic metabolite annotation using retention time prediction and meta-learned projections |
title | Probabilistic metabolite annotation using retention time prediction and meta-learned projections |
title_full | Probabilistic metabolite annotation using retention time prediction and meta-learned projections |
title_fullStr | Probabilistic metabolite annotation using retention time prediction and meta-learned projections |
title_full_unstemmed | Probabilistic metabolite annotation using retention time prediction and meta-learned projections |
title_short | Probabilistic metabolite annotation using retention time prediction and meta-learned projections |
title_sort | probabilistic metabolite annotation using retention time prediction and meta-learned projections |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9172150/ https://www.ncbi.nlm.nih.gov/pubmed/35672784 http://dx.doi.org/10.1186/s13321-022-00613-8 |
work_keys_str_mv | AT garciaconstantinoa probabilisticmetaboliteannotationusingretentiontimepredictionandmetalearnedprojections AT gildelafuentealberto probabilisticmetaboliteannotationusingretentiontimepredictionandmetalearnedprojections AT barbascoral probabilisticmetaboliteannotationusingretentiontimepredictionandmetalearnedprojections AT oteroabraham probabilisticmetaboliteannotationusingretentiontimepredictionandmetalearnedprojections |
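
The description above says that candidate annotations are ranked by z-scores so that the uncertainty of the projection is considered alongside its accuracy. A minimal sketch of that idea follows. It is not the authors' implementation (see the cmmrt repository for that): the `Candidate` class, the field names, and all numbers are hypothetical, and the z-score is assumed to be the absolute difference between observed and projected retention time divided by the projection's standard deviation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All fields are hypothetical illustration values, not data from the paper.
    name: str
    projected_rt: float  # projected retention time for this candidate (seconds)
    projected_sd: float  # standard deviation of that projection (seconds)


def z_score(observed_rt: float, candidate: Candidate) -> float:
    """Distance between observed and projected RT, in units of projection uncertainty."""
    return abs(observed_rt - candidate.projected_rt) / candidate.projected_sd


def rank_candidates(observed_rt: float, candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates (assumed already filtered by mass) from most to least plausible."""
    return sorted(candidates, key=lambda c: z_score(observed_rt, c))


# Made-up candidates sharing the same mass window.
candidates = [
    Candidate("A", projected_rt=300.0, projected_sd=10.0),
    Candidate("B", projected_rt=320.0, projected_sd=40.0),
    Candidate("C", projected_rt=350.0, projected_sd=5.0),
]

ranked = rank_candidates(310.0, candidates)
for c in ranked:
    print(c.name, z_score(310.0, c))
```

In this toy example candidates A and B are both 10 s away from the observed retention time, yet they receive different z-scores because their projections carry different uncertainty, which is the distinction a ranking based on absolute error alone would miss.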