Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data

The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke data sets,...

Bibliographic Details
Main Authors: Mercado, Rocío, Kearnes, Steven M., Coley, Connor W.
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369484/
https://www.ncbi.nlm.nih.gov/pubmed/37405398
http://dx.doi.org/10.1021/acs.jcim.3c00607
_version_ 1785077770334765056
author Mercado, Rocío
Kearnes, Steven M.
Coley, Connor W.
author_facet Mercado, Rocío
Kearnes, Steven M.
Coley, Connor W.
author_sort Mercado, Rocío
collection PubMed
description The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke data sets, in order to advance the role of AI in the field at scale, there must be significant improvements in the reporting of reaction data. Currently, the majority of publicly available data is reported in an unstructured format and heavily imbalanced toward high-yielding reactions, which influences the types of models that can be successfully trained. In this Perspective, we analyze several data curation and sharing initiatives that have seen success in chemistry and molecular biology. We discuss several factors that have contributed to their success and how we can take lessons from these case studies and apply them to reaction data. Finally, we spotlight the Open Reaction Database and summarize key actions the community can take toward making reaction data more findable, accessible, interoperable, and reusable (FAIR), including the use of mandates from funding agencies and publishers.
format Online
Article
Text
id pubmed-10369484
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10369484 2023-07-27 Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data Mercado, Rocío Kearnes, Steven M. Coley, Connor W. J Chem Inf Model The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke data sets, in order to advance the role of AI in the field at scale, there must be significant improvements in the reporting of reaction data. Currently, the majority of publicly available data is reported in an unstructured format and heavily imbalanced toward high-yielding reactions, which influences the types of models that can be successfully trained. In this Perspective, we analyze several data curation and sharing initiatives that have seen success in chemistry and molecular biology. We discuss several factors that have contributed to their success and how we can take lessons from these case studies and apply them to reaction data. Finally, we spotlight the Open Reaction Database and summarize key actions the community can take toward making reaction data more findable, accessible, interoperable, and reusable (FAIR), including the use of mandates from funding agencies and publishers. American Chemical Society 2023-07-05 /pmc/articles/PMC10369484/ /pubmed/37405398 http://dx.doi.org/10.1021/acs.jcim.3c00607 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Mercado, Rocío
Kearnes, Steven M.
Coley, Connor W.
Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data
title Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data
title_full Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data
title_fullStr Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data
title_full_unstemmed Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data
title_short Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data
title_sort data sharing in chemistry: lessons learned and a case for mandating structured reaction data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369484/
https://www.ncbi.nlm.nih.gov/pubmed/37405398
http://dx.doi.org/10.1021/acs.jcim.3c00607
work_keys_str_mv AT mercadorocio datasharinginchemistrylessonslearnedandacaseformandatingstructuredreactiondata
AT kearnesstevenm datasharinginchemistrylessonslearnedandacaseformandatingstructuredreactiondata
AT coleyconnorw datasharinginchemistrylessonslearnedandacaseformandatingstructuredreactiondata