ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes
Knowledge in the chemical domain is often disseminated graphically via chemical reaction schemes. The task of describing chemical transformations is greatly simplified by introducing reaction schemes that are composed of chemical diagrams and symbols. While intuitively understood b...
Main Authors: | Wilary, Damian M., Cole, Jacqueline M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10565829/ https://www.ncbi.nlm.nih.gov/pubmed/37729111 http://dx.doi.org/10.1021/acs.jcim.3c00422 |
_version_ | 1785118780811116544 |
---|---|
author | Wilary, Damian M. Cole, Jacqueline M. |
author_facet | Wilary, Damian M. Cole, Jacqueline M. |
author_sort | Wilary, Damian M. |
collection | PubMed |
description | Knowledge in the chemical domain is often disseminated graphically via chemical reaction schemes. The task of describing chemical transformations is greatly simplified by introducing reaction schemes that are composed of chemical diagrams and symbols. While intuitively understood by any chemist, like most graphical representations, such drawings are not easily understood by machines; this poses a challenge in the context of data extraction. Currently available tools are limited in their scope of extraction and require manual preprocessing, thus slowing down the speed of data extraction. We present a new tool, ReactionDataExtractor v2.0, which uses a combination of neural networks and symbolic artificial intelligence to effectively remove this barrier. We have evaluated our tool on a test set composed of reaction schemes that were taken from open-source journal articles and realized F1 score metrics between 75 and 96%. These evaluation metrics can be further improved by tuning our object-detection models to a specific chemical subdomain thanks to a data-driven approach that we have adopted with synthetically generated data. The system architecture of our tool is modular, which allows it to balance speed and accuracy to afford an autonomous, high-throughput solution for image-based chemical data extraction. |
format | Online Article Text |
id | pubmed-10565829 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10565829 2023-10-12 ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes Wilary, Damian M. Cole, Jacqueline M. J Chem Inf Model Knowledge in the chemical domain is often disseminated graphically via chemical reaction schemes. The task of describing chemical transformations is greatly simplified by introducing reaction schemes that are composed of chemical diagrams and symbols. While intuitively understood by any chemist, like most graphical representations, such drawings are not easily understood by machines; this poses a challenge in the context of data extraction. Currently available tools are limited in their scope of extraction and require manual preprocessing, thus slowing down the speed of data extraction. We present a new tool, ReactionDataExtractor v2.0, which uses a combination of neural networks and symbolic artificial intelligence to effectively remove this barrier. We have evaluated our tool on a test set composed of reaction schemes that were taken from open-source journal articles and realized F1 score metrics between 75 and 96%. These evaluation metrics can be further improved by tuning our object-detection models to a specific chemical subdomain thanks to a data-driven approach that we have adopted with synthetically generated data. The system architecture of our tool is modular, which allows it to balance speed and accuracy to afford an autonomous, high-throughput solution for image-based chemical data extraction. American Chemical Society 2023-09-20 /pmc/articles/PMC10565829/ /pubmed/37729111 http://dx.doi.org/10.1021/acs.jcim.3c00422 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Wilary, Damian M. Cole, Jacqueline M. ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes |
title | ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes |
title_full | ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes |
title_fullStr | ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes |
title_full_unstemmed | ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes |
title_short | ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes |
title_sort | reactiondataextractor 2.0: a deep learning approach for data extraction from chemical reaction schemes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10565829/ https://www.ncbi.nlm.nih.gov/pubmed/37729111 http://dx.doi.org/10.1021/acs.jcim.3c00422 |
work_keys_str_mv | AT wilarydamianm reactiondataextractor20adeeplearningapproachfordataextractionfromchemicalreactionschemes AT colejacquelinem reactiondataextractor20adeeplearningapproachfordataextractionfromchemicalreactionschemes |