
Review of techniques and models used in optical chemical structure recognition in images and scanned documents


Bibliographic Details
Main Authors: Musazade, Fidan; Jamalova, Narmin; Hasanov, Jamaladdin
Format: Online Article Text
Language: English
Published: Springer International Publishing 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9461257/
https://www.ncbi.nlm.nih.gov/pubmed/36076301
http://dx.doi.org/10.1186/s13321-022-00642-3
author Musazade, Fidan
Jamalova, Narmin
Hasanov, Jamaladdin
collection PubMed
description Extraction of chemical formulas from images was not a top priority among Computer Vision tasks for a while. Complexity on both the input and prediction sides has made the task challenging for conventional Artificial Intelligence and Machine Learning approaches. A binary input image, which might seem trivial for convolutional analysis, was not easy to classify, since the provided sample was not representative of the given molecule: the same formula can be drawn in a variety of graphical representations that do not resemble each other. Given the variety of molecules, the problem shifted from classification to formula generation, which makes Natural Language Processing (NLP) a good candidate for an effective solution. This paper describes the evolution of approaches from rule-based structure analyses to complex statistical models, and compares the efficiency of models and methodologies used in recent years. Although the latest achievements deliver ideal results on particular datasets, the authors mention possible problems for various scenarios and provide suggestions for further development.
format Online
Article
Text
id pubmed-9461257
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-9461257 2022-09-10
Review of techniques and models used in optical chemical structure recognition in images and scanned documents
Musazade, Fidan; Jamalova, Narmin; Hasanov, Jamaladdin
J Cheminform, Review
Springer International Publishing 2022-09-09
/pmc/articles/PMC9461257/ /pubmed/36076301 http://dx.doi.org/10.1186/s13321-022-00642-3
Text en © The Author(s) 2022
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Review of techniques and models used in optical chemical structure recognition in images and scanned documents
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9461257/
https://www.ncbi.nlm.nih.gov/pubmed/36076301
http://dx.doi.org/10.1186/s13321-022-00642-3