ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning
Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition too...
Main Authors: | Weir, Hayley; Thompson, Keiran; Woodward, Amelia; Choi, Benjamin; Braun, Augustin; Martínez, Todd J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society of Chemistry 2021 |
Subjects: | Chemistry |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8365825/ https://www.ncbi.nlm.nih.gov/pubmed/34447555 http://dx.doi.org/10.1039/d1sc02957f |
_version_ | 1783738786268577792 |
---|---|
author | Weir, Hayley; Thompson, Keiran; Woodward, Amelia; Choi, Benjamin; Braun, Augustin; Martínez, Todd J. |
author_facet | Weir, Hayley; Thompson, Keiran; Woodward, Amelia; Choi, Benjamin; Braun, Augustin; Martínez, Todd J. |
author_sort | Weir, Hayley |
collection | PubMed |
description | Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of ∼600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks where each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement of the molecule recognition accuracy and were able to assign a confidence value for the prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered. |
format | Online Article Text |
id | pubmed-8365825 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | The Royal Society of Chemistry |
record_format | MEDLINE/PubMed |
spelling | pubmed-8365825 2021-08-25 ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning Weir, Hayley; Thompson, Keiran; Woodward, Amelia; Choi, Benjamin; Braun, Augustin; Martínez, Todd J. Chem Sci Chemistry Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of ∼600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks where each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement of the molecule recognition accuracy and were able to assign a confidence value for the prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered. The Royal Society of Chemistry 2021-07-03 /pmc/articles/PMC8365825/ /pubmed/34447555 http://dx.doi.org/10.1039/d1sc02957f Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by-nc/3.0/ |
spellingShingle | Chemistry Weir, Hayley Thompson, Keiran Woodward, Amelia Choi, Benjamin Braun, Augustin Martínez, Todd J. ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning |
title | ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning |
title_full | ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning |
title_fullStr | ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning |
title_full_unstemmed | ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning |
title_short | ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning |
title_sort | chempix: automated recognition of hand-drawn hydrocarbon structures using deep learning |
topic | Chemistry |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8365825/ https://www.ncbi.nlm.nih.gov/pubmed/34447555 http://dx.doi.org/10.1039/d1sc02957f |
work_keys_str_mv | AT weirhayley chempixautomatedrecognitionofhanddrawnhydrocarbonstructuresusingdeeplearning AT thompsonkeiran chempixautomatedrecognitionofhanddrawnhydrocarbonstructuresusingdeeplearning AT woodwardamelia chempixautomatedrecognitionofhanddrawnhydrocarbonstructuresusingdeeplearning AT choibenjamin chempixautomatedrecognitionofhanddrawnhydrocarbonstructuresusingdeeplearning AT braunaugustin chempixautomatedrecognitionofhanddrawnhydrocarbonstructuresusingdeeplearning AT martineztoddj chempixautomatedrecognitionofhanddrawnhydrocarbonstructuresusingdeeplearning |
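
The description field above mentions a committee in which each trained network casts one vote for the predicted molecule, with a confidence value derived from the number of agreeing votes. The following is a minimal sketch of that idea, not the authors' implementation. The function name `committee_vote`, the use of the vote fraction as the confidence value, and the alphabetical tie-break are all assumptions made for illustration. The sketch also assumes every network's prediction is already a canonical SMILES string, so identical molecules compare equal as text.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def committee_vote(
    predictions: Sequence[Optional[str]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank candidate SMILES strings by the share of committee votes.

    `predictions` holds one entry per network (hypothetical interface):
    a SMILES string, or None if that network produced no usable output.
    Returns up to `top_k` (smiles, confidence) pairs, where confidence
    is assumed to be agreeing votes divided by committee size.
    """
    if not predictions:
        raise ValueError("the committee must contain at least one prediction")

    # Each network casts exactly one vote; failed predictions are not counted
    # towards any candidate but still count towards the committee size.
    votes = Counter(p for p in predictions if p is not None)
    committee_size = len(predictions)

    # Most votes first; ties are broken alphabetically so the result is
    # deterministic (an arbitrary choice made for this sketch).
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [(smiles, count / committee_size) for smiles, count in ranked[:top_k]]


if __name__ == "__main__":
    # Five hypothetical networks voting on one hand-drawn image.
    example = ["CCC", "CCC", "C=CC", "CCC", "CC"]
    for smiles, confidence in committee_vote(example):
        print(f"{smiles}: {confidence:.0%} of votes")
```

Returning the top few candidates rather than only the winner mirrors the top-3 evaluation quoted in the description; a caller could reject a prediction whose confidence falls below a chosen threshold.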