Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data
Main authors: | Mercado, Rocío; Kearnes, Steven M.; Coley, Connor W. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369484/ ; https://www.ncbi.nlm.nih.gov/pubmed/37405398 ; http://dx.doi.org/10.1021/acs.jcim.3c00607 |
author | Mercado, Rocío; Kearnes, Steven M.; Coley, Connor W. |
---|---|
collection | PubMed |
description | The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke data sets, in order to advance the role of AI in the field at scale, there must be significant improvements in the reporting of reaction data. Currently, the majority of publicly available data is reported in an unstructured format and heavily imbalanced toward high-yielding reactions, which influences the types of models that can be successfully trained. In this Perspective, we analyze several data curation and sharing initiatives that have seen success in chemistry and molecular biology. We discuss several factors that have contributed to their success and how we can take lessons from these case studies and apply them to reaction data. Finally, we spotlight the Open Reaction Database and summarize key actions the community can take toward making reaction data more findable, accessible, interoperable, and reusable (FAIR), including the use of mandates from funding agencies and publishers. |
format | Online Article Text |
id | pubmed-10369484 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
record date | 2023-07-27 |
journal | J Chem Inf Model |
published online | 2023-07-05 |
copyright | © 2023 The Authors. Published by American Chemical Society |
license | CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/): permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit the creation of adaptations or other derivative works |
title | Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data |