ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes

Knowledge in the chemical domain is often disseminated graphically via chemical reaction schemes. The task of describing chemical transformations is greatly simplified by introducing reaction schemes that are composed of chemical diagrams and symbols. While intuitively understood b...


Bibliographic Details
Main Authors: Wilary, Damian M., Cole, Jacqueline M.
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10565829/
https://www.ncbi.nlm.nih.gov/pubmed/37729111
http://dx.doi.org/10.1021/acs.jcim.3c00422
author Wilary, Damian M.
Cole, Jacqueline M.
collection PubMed
description Knowledge in the chemical domain is often disseminated graphically via chemical reaction schemes. The task of describing chemical transformations is greatly simplified by introducing reaction schemes that are composed of chemical diagrams and symbols. While intuitively understood by any chemist, like most graphical representations, such drawings are not easily understood by machines; this poses a challenge in the context of data extraction. Currently available tools are limited in their scope of extraction and require manual preprocessing, thus slowing down the speed of data extraction. We present a new tool, ReactionDataExtractor v2.0, which uses a combination of neural networks and symbolic artificial intelligence to effectively remove this barrier. We have evaluated our tool on a test set composed of reaction schemes that were taken from open-source journal articles and realized F1 score metrics between 75 and 96%. These evaluation metrics can be further improved by tuning our object-detection models to a specific chemical subdomain thanks to a data-driven approach that we have adopted with synthetically generated data. The system architecture of our tool is modular, which allows it to balance speed and accuracy to afford an autonomous, high-throughput solution for image-based chemical data extraction.
format Online
Article
Text
id pubmed-10565829
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10565829 2023-10-12 ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes Wilary, Damian M. Cole, Jacqueline M. J Chem Inf Model American Chemical Society 2023-09-20 /pmc/articles/PMC10565829/ /pubmed/37729111 http://dx.doi.org/10.1021/acs.jcim.3c00422 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).
title ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes