MolMiner: You Only Look Once for Chemical Structure Recognition

Molecular structures are commonly depicted in 2D printed forms in scientific documents such as journal papers and patents. However, these 2D depictions are not machine readable. Due to a backlog of decades and an increasing amount of printed literature, there is a high demand for...

Bibliographic Details
Main Authors: Xu, Youjun, Xiao, Jinchuan, Chou, Chia-Han, Zhang, Jianhang, Zhu, Jintao, Hu, Qiwan, Li, Hemin, Han, Ningsheng, Liu, Bingyu, Zhang, Shuaipeng, Han, Jinyu, Zhang, Zhen, Zhang, Shuhao, Zhang, Weilin, Lai, Luhua, Pei, Jianfeng
Format: Online Article Text
Language: English
Published: American Chemical Society 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710516/
https://www.ncbi.nlm.nih.gov/pubmed/36108142
http://dx.doi.org/10.1021/acs.jcim.2c00733
author Xu, Youjun
Xiao, Jinchuan
Chou, Chia-Han
Zhang, Jianhang
Zhu, Jintao
Hu, Qiwan
Li, Hemin
Han, Ningsheng
Liu, Bingyu
Zhang, Shuaipeng
Han, Jinyu
Zhang, Zhen
Zhang, Shuhao
Zhang, Weilin
Lai, Luhua
Pei, Jianfeng
author_sort Xu, Youjun
collection PubMed
description Molecular structures are commonly depicted in 2D printed forms in scientific documents such as journal papers and patents. However, these 2D depictions are not machine readable. Due to a backlog of decades and an increasing amount of printed literature, there is a high demand for translating printed depictions into machine-readable formats, a task known as Optical Chemical Structure Recognition (OCSR). Most OCSR systems developed over the last three decades use a rule-based approach, which vectorizes the depiction and interprets the resulting vectors and nodes as bonds and atoms. Here, we present a practical software tool called MolMiner, which is primarily built using deep neural networks originally developed for semantic segmentation and object detection to recognize atom and bond elements from documents. These recognized elements can be easily connected as a molecular graph with a distance-based construction algorithm. MolMiner gave state-of-the-art performance on four benchmark data sets and a self-collected external data set from scientific papers. As MolMiner performed similarly well in real-world OCSR tasks and has a user-friendly interface, it is a useful and valuable tool for daily applications. Free download links for the Mac and Windows versions are available at https://github.com/iipharma/pharmamind-molminer.
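The description mentions connecting recognized atom and bond elements into a molecular graph by distance. The record does not give the algorithm itself, so the following is only a minimal sketch of one plausible distance-based scheme, not MolMiner's implementation. The function name `build_graph`, the tuple layouts for detections, and the `max_dist` pixel cutoff are all hypothetical.

```python
from math import hypot


def build_graph(atoms, bonds, max_dist=15.0):
    """Attach each detected bond to its two nearest detected atoms.

    atoms: list of (label, x, y) atom-center detections (assumed layout).
    bonds: list of (order, x, y) bond-center detections (assumed layout).
    max_dist: hypothetical cutoff, in pixels, beyond which an atom is
        considered too far from a bond center to be one of its ends.
    Returns a sorted list of (atom_i, atom_j, order) edges, atom_i < atom_j.
    """
    edges = []
    for order, bx, by in bonds:
        # Rank all atoms by Euclidean distance to this bond's center.
        ranked = sorted(
            (hypot(ax - bx, ay - by), i) for i, (_, ax, ay) in enumerate(atoms)
        )
        # Skip the bond if fewer than two atoms lie within the cutoff.
        if len(ranked) < 2 or ranked[1][0] > max_dist:
            continue
        i, j = sorted((ranked[0][1], ranked[1][1]))
        edges.append((i, j, order))
    return sorted(edges)


# Toy detections for a three-atom chain: C, C, O with a single and a double bond.
atoms = [("C", 0.0, 0.0), ("C", 20.0, 0.0), ("O", 40.0, 0.0)]
bonds = [(1, 10.0, 0.0), (2, 30.0, 0.0)]
edges = build_graph(atoms, bonds)
print(edges)
```

Picking the two atoms nearest to each bond center is the simplest possible rule; a practical system would likely also need to handle crowded drawings, crossing bonds, and missed detections.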
format Online
Article
Text
id pubmed-9710516
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-9710516 2023-09-15 MolMiner: You Only Look Once for Chemical Structure Recognition J Chem Inf Model American Chemical Society 2022-09-15 2022-11-28 /pmc/articles/PMC9710516/ /pubmed/36108142 http://dx.doi.org/10.1021/acs.jcim.2c00733 Text en © 2022 The Authors. Published by American Chemical Society
https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works.
title MolMiner: You Only Look Once for Chemical Structure Recognition