MolMiner: You Only Look Once for Chemical Structure Recognition
Molecular structures are commonly depicted in 2D printed forms in scientific documents such as journal papers and patents. However, these 2D depictions are not machine readable. Due to a backlog of decades and an increasing amount of printed literatures, there is a high demand for...
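Main authors: | Xu, Youjun; Xiao, Jinchuan; Chou, Chia-Han; Zhang, Jianhang; Zhu, Jintao; Hu, Qiwan; Li, Hemin; Han, Ningsheng; Liu, Bingyu; Zhang, Shuaipeng; Han, Jinyu; Zhang, Zhen; Zhang, Shuhao; Zhang, Weilin; Lai, Luhua; Pei, Jianfeng |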
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710516/ https://www.ncbi.nlm.nih.gov/pubmed/36108142 http://dx.doi.org/10.1021/acs.jcim.2c00733 |
_version_ | 1784841383242104832 |
---|---|
author | Xu, Youjun Xiao, Jinchuan Chou, Chia-Han Zhang, Jianhang Zhu, Jintao Hu, Qiwan Li, Hemin Han, Ningsheng Liu, Bingyu Zhang, Shuaipeng Han, Jinyu Zhang, Zhen Zhang, Shuhao Zhang, Weilin Lai, Luhua Pei, Jianfeng |
author_facet | Xu, Youjun Xiao, Jinchuan Chou, Chia-Han Zhang, Jianhang Zhu, Jintao Hu, Qiwan Li, Hemin Han, Ningsheng Liu, Bingyu Zhang, Shuaipeng Han, Jinyu Zhang, Zhen Zhang, Shuhao Zhang, Weilin Lai, Luhua Pei, Jianfeng |
author_sort | Xu, Youjun |
collection | PubMed |
description | Molecular structures are commonly depicted in 2D printed forms in scientific documents such as journal papers and patents. However, these 2D depictions are not machine readable. Due to a backlog of decades and an increasing amount of printed literatures, there is a high demand for translating printed depictions into machine-readable formats, which is known as Optical Chemical Structure Recognition (OCSR). Most OCSR systems developed over the last three decades use a rule-based approach, which vectorizes the depiction based on the interpretation of vectors and nodes as bonds and atoms. Here, we present a practical software called MolMiner, which is primarily built using deep neural networks originally developed for semantic segmentation and object detection to recognize atom and bond elements from documents. These recognized elements can be easily connected as a molecular graph with a distance-based construction algorithm. MolMiner gave state-of-the-art performance on four benchmark data sets and a self-collected external data set from scientific papers. As MolMiner performed similarly well in real-world OCSR tasks with a user-friendly interface, it is a useful and valuable tool for daily applications. The free download links of Mac and Windows versions are available at https://github.com/iipharma/pharmamind-molminer. |
format | Online Article Text |
id | pubmed-9710516 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-9710516 2023-09-15 MolMiner: You Only Look Once for Chemical Structure Recognition Xu, Youjun Xiao, Jinchuan Chou, Chia-Han Zhang, Jianhang Zhu, Jintao Hu, Qiwan Li, Hemin Han, Ningsheng Liu, Bingyu Zhang, Shuaipeng Han, Jinyu Zhang, Zhen Zhang, Shuhao Zhang, Weilin Lai, Luhua Pei, Jianfeng J Chem Inf Model Molecular structures are commonly depicted in 2D printed forms in scientific documents such as journal papers and patents. However, these 2D depictions are not machine readable. Due to a backlog of decades and an increasing amount of printed literatures, there is a high demand for translating printed depictions into machine-readable formats, which is known as Optical Chemical Structure Recognition (OCSR). Most OCSR systems developed over the last three decades use a rule-based approach, which vectorizes the depiction based on the interpretation of vectors and nodes as bonds and atoms. Here, we present a practical software called MolMiner, which is primarily built using deep neural networks originally developed for semantic segmentation and object detection to recognize atom and bond elements from documents. These recognized elements can be easily connected as a molecular graph with a distance-based construction algorithm. MolMiner gave state-of-the-art performance on four benchmark data sets and a self-collected external data set from scientific papers. As MolMiner performed similarly well in real-world OCSR tasks with a user-friendly interface, it is a useful and valuable tool for daily applications. The free download links of Mac and Windows versions are available at https://github.com/iipharma/pharmamind-molminer. American Chemical Society 2022-09-15 2022-11-28 /pmc/articles/PMC9710516/ /pubmed/36108142 http://dx.doi.org/10.1021/acs.jcim.2c00733 Text en © 2022 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Xu, Youjun Xiao, Jinchuan Chou, Chia-Han Zhang, Jianhang Zhu, Jintao Hu, Qiwan Li, Hemin Han, Ningsheng Liu, Bingyu Zhang, Shuaipeng Han, Jinyu Zhang, Zhen Zhang, Shuhao Zhang, Weilin Lai, Luhua Pei, Jianfeng MolMiner: You Only Look Once for Chemical Structure Recognition |
title | MolMiner: You Only Look Once for Chemical Structure Recognition |
title_full | MolMiner: You Only Look Once for Chemical Structure Recognition |
title_fullStr | MolMiner: You Only Look Once for Chemical Structure Recognition |
title_full_unstemmed | MolMiner: You Only Look Once for Chemical Structure Recognition |
title_short | MolMiner: You Only Look Once for Chemical Structure Recognition |
title_sort | molminer: you only look once for chemical structure recognition |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710516/ https://www.ncbi.nlm.nih.gov/pubmed/36108142 http://dx.doi.org/10.1021/acs.jcim.2c00733 |
work_keys_str_mv | AT xuyoujun molmineryouonlylookonceforchemicalstructurerecognition AT xiaojinchuan molmineryouonlylookonceforchemicalstructurerecognition AT chouchiahan molmineryouonlylookonceforchemicalstructurerecognition AT zhangjianhang molmineryouonlylookonceforchemicalstructurerecognition AT zhujintao molmineryouonlylookonceforchemicalstructurerecognition AT huqiwan molmineryouonlylookonceforchemicalstructurerecognition AT lihemin molmineryouonlylookonceforchemicalstructurerecognition AT hanningsheng molmineryouonlylookonceforchemicalstructurerecognition AT liubingyu molmineryouonlylookonceforchemicalstructurerecognition AT zhangshuaipeng molmineryouonlylookonceforchemicalstructurerecognition AT hanjinyu molmineryouonlylookonceforchemicalstructurerecognition AT zhangzhen molmineryouonlylookonceforchemicalstructurerecognition AT zhangshuhao molmineryouonlylookonceforchemicalstructurerecognition AT zhangweilin molmineryouonlylookonceforchemicalstructurerecognition AT lailuhua molmineryouonlylookonceforchemicalstructurerecognition AT peijianfeng molmineryouonlylookonceforchemicalstructurerecognition |