Review of techniques and models used in optical chemical structure recognition in images and scanned documents
Extraction of chemical formulas from images was not in the top priority of Computer Vision tasks for a while. The complexity both on the input and prediction sides has made this task challenging for the conventional Artificial Intelligence and Machine Learning problems. A binary input image which mi...
Main Authors: | Musazade, Fidan, Jamalova, Narmin, Hasanov, Jamaladdin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9461257/ https://www.ncbi.nlm.nih.gov/pubmed/36076301 http://dx.doi.org/10.1186/s13321-022-00642-3 |
_version_ | 1784786936957763584 |
---|---|
author | Musazade, Fidan Jamalova, Narmin Hasanov, Jamaladdin |
collection | PubMed |
description | Extraction of chemical formulas from images was not in the top priority of Computer Vision tasks for a while. The complexity both on the input and prediction sides has made this task challenging for the conventional Artificial Intelligence and Machine Learning problems. A binary input image which might seem trivial for convolutional analysis was not easy to classify, since the provided sample was not representative of the given molecule: to describe the same formula, a variety of graphical representations which do not resemble each other can be used. Considering the variety of molecules, the problem shifted from classification to that of formula generation, which makes Natural Language Processing (NLP) a good candidate for an effective solution. This paper describes the evolution of approaches from rule-based structure analyses to complex statistical models, and compares the efficiency of models and methodologies used in the recent years. Although the latest achievements deliver ideal results on particular datasets, the authors mention possible problems for various scenarios and provide suggestions for further development. |
format | Online Article Text |
id | pubmed-9461257 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-9461257 2022-09-10 Review of techniques and models used in optical chemical structure recognition in images and scanned documents Musazade, Fidan Jamalova, Narmin Hasanov, Jamaladdin J Cheminform Review Extraction of chemical formulas from images was not in the top priority of Computer Vision tasks for a while. The complexity both on the input and prediction sides has made this task challenging for the conventional Artificial Intelligence and Machine Learning problems. A binary input image which might seem trivial for convolutional analysis was not easy to classify, since the provided sample was not representative of the given molecule: to describe the same formula, a variety of graphical representations which do not resemble each other can be used. Considering the variety of molecules, the problem shifted from classification to that of formula generation, which makes Natural Language Processing (NLP) a good candidate for an effective solution. This paper describes the evolution of approaches from rule-based structure analyses to complex statistical models, and compares the efficiency of models and methodologies used in the recent years. Although the latest achievements deliver ideal results on particular datasets, the authors mention possible problems for various scenarios and provide suggestions for further development. Springer International Publishing 2022-09-09 /pmc/articles/PMC9461257/ /pubmed/36076301 http://dx.doi.org/10.1186/s13321-022-00642-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Review of techniques and models used in optical chemical structure recognition in images and scanned documents |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9461257/ https://www.ncbi.nlm.nih.gov/pubmed/36076301 http://dx.doi.org/10.1186/s13321-022-00642-3 |