The METLIN small molecule dataset for machine learning-based retention time prediction

Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due t...

Bibliographic Details
Main Authors: Domingo-Almenara, Xavier; Guijas, Carlos; Billings, Elizabeth; Montenegro-Burke, J. Rafael; Uritboonthai, Winnie; Aisporna, Aries E.; Chen, Emily; Benton, H. Paul; Siuzdak, Gary
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6925099/
https://www.ncbi.nlm.nih.gov/pubmed/31862874
http://dx.doi.org/10.1038/s41467-019-13680-7
_version_ 1783481844234190848
author Domingo-Almenara, Xavier
Guijas, Carlos
Billings, Elizabeth
Montenegro-Burke, J. Rafael
Uritboonthai, Winnie
Aisporna, Aries E.
Chen, Emily
Benton, H. Paul
Siuzdak, Gary
author_sort Domingo-Almenara, Xavier
collection PubMed
description Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction.
format Online
Article
Text
id pubmed-6925099
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6925099 2019-12-22 The METLIN small molecule dataset for machine learning-based retention time prediction Domingo-Almenara, Xavier Guijas, Carlos Billings, Elizabeth Montenegro-Burke, J. Rafael Uritboonthai, Winnie Aisporna, Aries E. Chen, Emily Benton, H. Paul Siuzdak, Gary Nat Commun Article Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction. Nature Publishing Group UK 2019-12-20 /pmc/articles/PMC6925099/ /pubmed/31862874 http://dx.doi.org/10.1038/s41467-019-13680-7 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title The METLIN small molecule dataset for machine learning-based retention time prediction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6925099/
https://www.ncbi.nlm.nih.gov/pubmed/31862874
http://dx.doi.org/10.1038/s41467-019-13680-7