Automated extraction of chemical structure information from digital raster images

BACKGROUND: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as...

Bibliographic Details
Main authors: Park, Jungkap; Rosania, Gus R; Shedden, Kerby A; Nguyen, Mandee; Lyu, Naesung; Saitou, Kazuhiro
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648963/
https://www.ncbi.nlm.nih.gov/pubmed/19196483
http://dx.doi.org/10.1186/1752-153X-3-4
_version_ 1782165002116923392
author Park, Jungkap
Rosania, Gus R
Shedden, Kerby A
Nguyen, Mandee
Lyu, Naesung
Saitou, Kazuhiro
author_facet Park, Jungkap
Rosania, Gus R
Shedden, Kerby A
Nguyen, Mandee
Lyu, Naesung
Saitou, Kazuhiro
author_sort Park, Jungkap
collection PubMed
description BACKGROUND: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed, but their algorithmic performance and utility in cheminformatic research have not been investigated. RESULTS: This paper critically reviews these systems and reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. The basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently, in sequence, from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development tailored specifically to a chemical database annotation scheme. Our results indicate that ChemReader outperforms existing software programs such as OSRA, Kekule, and CLiDE on several sets of sample images from diverse sources, in terms of both the rate of correct outputs and the accuracy of extracted molecular substructure patterns. CONCLUSION: The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Its stable performance and high accuracy suggest that ChemReader may be sufficiently accurate for annotating chemical databases with links to scientific research articles.
format Text
id pubmed-2648963
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-26489632009-03-03 Automated extraction of chemical structure information from digital raster images Park, Jungkap Rosania, Gus R Shedden, Kerby A Nguyen, Mandee Lyu, Naesung Saitou, Kazuhiro Chem Cent J Methodology BACKGROUND: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. Nevertheless, chemical information contained in research articles is often referenced as analog diagrams of chemical structures embedded in digital raster images. To automate analog-to-digital conversion of chemical structure diagrams in scientific research articles, several software systems have been developed, but their algorithmic performance and utility in cheminformatic research have not been investigated. RESULTS: This paper critically reviews these systems and reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. The basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently, in sequence, from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development tailored specifically to a chemical database annotation scheme. Our results indicate that ChemReader outperforms existing software programs such as OSRA, Kekule, and CLiDE on several sets of sample images from diverse sources, in terms of both the rate of correct outputs and the accuracy of extracted molecular substructure patterns. CONCLUSION: The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Its stable performance and high accuracy suggest that ChemReader may be sufficiently accurate for annotating chemical databases with links to scientific research articles. BioMed Central 2009-02-05 /pmc/articles/PMC2648963/ /pubmed/19196483 http://dx.doi.org/10.1186/1752-153X-3-4 Text en Copyright © 2009 Park et al
spellingShingle Methodology
Park, Jungkap
Rosania, Gus R
Shedden, Kerby A
Nguyen, Mandee
Lyu, Naesung
Saitou, Kazuhiro
Automated extraction of chemical structure information from digital raster images
title Automated extraction of chemical structure information from digital raster images
title_full Automated extraction of chemical structure information from digital raster images
title_fullStr Automated extraction of chemical structure information from digital raster images
title_full_unstemmed Automated extraction of chemical structure information from digital raster images
title_short Automated extraction of chemical structure information from digital raster images
title_sort automated extraction of chemical structure information from digital raster images
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648963/
https://www.ncbi.nlm.nih.gov/pubmed/19196483
http://dx.doi.org/10.1186/1752-153X-3-4
work_keys_str_mv AT parkjungkap automatedextractionofchemicalstructureinformationfromdigitalrasterimages
AT rosaniagusr automatedextractionofchemicalstructureinformationfromdigitalrasterimages
AT sheddenkerbya automatedextractionofchemicalstructureinformationfromdigitalrasterimages
AT nguyenmandee automatedextractionofchemicalstructureinformationfromdigitalrasterimages
AT lyunaesung automatedextractionofchemicalstructureinformationfromdigitalrasterimages
AT saitoukazuhiro automatedextractionofchemicalstructureinformationfromdigitalrasterimages