YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications
In chemistry-related disciplines, a vast repository of molecular structural data has been documented in scientific publications but remains inaccessible to computational analyses owing to its non-machine-readable format. Optical chemical structure recognition (OCSR) addresses this gap by converting...
Main authors: | Zhou, Chong; Liu, Wei; Song, Xiyue; Yang, Mengling; Peng, Xiaowang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2023 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662772/ https://www.ncbi.nlm.nih.gov/pubmed/37986007 http://dx.doi.org/10.1186/s13321-023-00783-z |
_version_ | 1785148601520881664 |
---|---|
author | Zhou, Chong Liu, Wei Song, Xiyue Yang, Mengling Peng, Xiaowang |
author_facet | Zhou, Chong Liu, Wei Song, Xiyue Yang, Mengling Peng, Xiaowang |
author_sort | Zhou, Chong |
collection | PubMed |
description | In chemistry-related disciplines, a vast repository of molecular structural data has been documented in scientific publications but remains inaccessible to computational analyses owing to its non-machine-readable format. Optical chemical structure recognition (OCSR) addresses this gap by converting images of chemical molecular structures into a format accessible to computers and convenient for storage, paving the way for further analyses and studies on chemical information. A pivotal initial step in OCSR is automating the noise-free extraction of molecular descriptions from literature. Despite efforts utilising rule-based and deep learning approaches for the extraction process, the accuracy achieved to date is unsatisfactory. To address this issue, we introduce a deep learning model named YoDe-Segmentation in this study, engineered for the automated retrieval of molecular structures from scientific documents. This model operates via a three-stage process encompassing detection, mask generation, and calculation. Initially, it identifies and isolates molecular structures during the detection phase. Subsequently, mask maps are created based on these isolated structures in the mask generation stage. In the final calculation stage, refined and separated mask maps are combined with the isolated molecular structure images, resulting in the acquisition of pure molecular structures. Our model underwent rigorous testing using texts from multiple chemistry-centric journals, with the outcomes subjected to manual validation. The results revealed the superior performance of YoDe-Segmentation compared to alternative algorithms, documenting an average extraction efficiency of 97.62%. This outcome not only highlights the robustness and reliability of the model but also suggests its applicability on a broad scale. |
format | Online Article Text |
id | pubmed-10662772 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-10662772 2023-11-20 YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications Zhou, Chong Liu, Wei Song, Xiyue Yang, Mengling Peng, Xiaowang J Cheminform Research Springer International Publishing 2023-11-20 /pmc/articles/PMC10662772/ /pubmed/37986007 http://dx.doi.org/10.1186/s13321-023-00783-z Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Zhou, Chong Liu, Wei Song, Xiyue Yang, Mengling Peng, Xiaowang YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications |
title | YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications |
title_full | YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications |
title_fullStr | YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications |
title_full_unstemmed | YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications |
title_short | YoDe-Segmentation: automated noise-free retrieval of molecular structures from scientific publications |
title_sort | yode-segmentation: automated noise-free retrieval of molecular structures from scientific publications |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662772/ https://www.ncbi.nlm.nih.gov/pubmed/37986007 http://dx.doi.org/10.1186/s13321-023-00783-z |
work_keys_str_mv | AT zhouchong yodesegmentationautomatednoisefreeretrievalofmolecularstructuresfromscientificpublications AT liuwei yodesegmentationautomatednoisefreeretrievalofmolecularstructuresfromscientificpublications AT songxiyue yodesegmentationautomatednoisefreeretrievalofmolecularstructuresfromscientificpublications AT yangmengling yodesegmentationautomatednoisefreeretrievalofmolecularstructuresfromscientificpublications AT pengxiaowang yodesegmentationautomatednoisefreeretrievalofmolecularstructuresfromscientificpublications |
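
The description field above says that, in the final calculation stage, mask maps are combined with the isolated molecular structure images to yield clean structures. The record does not say how that combination is computed, so the following is only a minimal sketch of one plausible reading: keep the pixels a binary mask marks as molecule and paint everything else as white background. The function name `apply_mask`, the constant `WHITE`, and the toy `crop` and `mask` arrays are all hypothetical and are not taken from the article.

```python
# Illustrative sketch only: this is NOT the YoDe-Segmentation implementation.
# apply_mask, WHITE, crop and mask are hypothetical names and data.

WHITE = 255  # assumed background value for an 8-bit grayscale image


def apply_mask(crop, mask, background=WHITE):
    """Keep pixels where the mask is truthy; set all others to background."""
    same_height = len(crop) == len(mask)
    if not same_height or any(len(r) != len(m) for r, m in zip(crop, mask)):
        raise ValueError("crop and mask must have the same shape")
    return [
        [pixel if keep else background for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(crop, mask)
    ]


# A 3x4 crop standing in for the output of a detection stage:
# 0 = ink, 255 = paper. The right-hand column holds stray ink,
# for example part of a caption that fell inside the bounding box.
crop = [
    [255, 0, 255, 0],
    [0, 0, 255, 0],
    [255, 0, 255, 255],
]

# A mask standing in for the output of a mask-generation stage:
# 1 marks pixels that belong to the molecule.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

clean = apply_mask(crop, mask)
for row in clean:
    print(row)
```

In this toy case the stray ink in the right-hand column is replaced by background while the masked region is left untouched. A real pipeline would operate on image arrays rather than nested lists, and the model's actual refinement and separation of masks is not represented here.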