
ChemEx: information extraction system for chemical data curation



Bibliographic Details
Main authors: Tharatipyakul, Atima; Numnark, Somrak; Wichadakul, Duangdao; Ingsriswang, Supawadee
Format: Online, Article, Text
Language: English
Published: BioMed Central 2012
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521388/
https://www.ncbi.nlm.nih.gov/pubmed/23282330
http://dx.doi.org/10.1186/1471-2105-13-S17-S9

Abstract

BACKGROUND: Manual curation of chemical data from publications is error-prone and time-consuming, and it makes data sets hard to keep up to date. Automatic information extraction can help reduce these problems. Because chemical structures are usually described in images, information extraction needs to combine structure image recognition with text mining.

RESULTS: We have developed ChemEx, a chemical information extraction system that processes both the text and the images in publications. Its text annotator extracts compound, organism, and assay entities from the text content, while its structure image recognition translates chemical raster images into a machine-readable format. A user can view the annotated text along with summarized information about the compounds, the organisms that produce those compounds, and the assay tests.

CONCLUSIONS: ChemEx facilitates and speeds up chemical data curation by extracting compounds, organisms, and assays from a large collection of publications. The software and corpus can be downloaded from http://www.biotec.or.th/isl/ChemEx.
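
The abstract describes a text annotator that tags compound, organism, and assay entities, plus a view that summarizes them per publication. This record does not say how ChemEx's annotator actually works, so the following is only a minimal sketch under stated assumptions: it uses plain dictionary (lexicon) lookup with greedy longest-match selection, and the `LEXICON` contents and the `annotate` and `summarize` helpers are hypothetical illustrations, not part of ChemEx.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy lexicon, for illustration only. The entity types match the
# ones named in the abstract; the terms themselves are made-up examples.
# 'penicillin' and 'penicillin G' overlap on purpose to exercise longest-match.
LEXICON = {
    "compound": ["penicillin", "penicillin G", "staurosporine"],
    "organism": ["Penicillium chrysogenum", "Streptomyces staurosporeus"],
    "assay": ["MIC assay", "cytotoxicity assay"],
}


@dataclass(frozen=True)
class Entity:
    label: str  # entity type: compound, organism, or assay
    text: str   # surface string as it appears in the document
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


def annotate(text: str) -> list[Entity]:
    """Return non-overlapping lexicon matches, preferring longer terms."""
    candidates = []
    for label, terms in LEXICON.items():
        for term in terms:
            # \b stops a term from matching inside a longer word.
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                candidates.append(Entity(label, m.group(0), m.start(), m.end()))
    # Scan left to right; at equal start offsets the longest match comes first.
    candidates.sort(key=lambda e: (e.start, -(e.end - e.start)))
    chosen: list[Entity] = []
    last_end = -1
    for e in candidates:
        if e.start >= last_end:  # skip any candidate overlapping a kept one
            chosen.append(e)
            last_end = e.end
    return chosen


def summarize(entities: list[Entity]) -> dict[str, list[str]]:
    """Group distinct entity strings by type, as a curator-facing summary."""
    summary: dict[str, set[str]] = defaultdict(set)
    for e in entities:
        summary[e.label].add(e.text)
    return {label: sorted(values) for label, values in summary.items()}
```

For example, annotating "Staurosporine from Streptomyces staurosporeus was tested in a cytotoxicity assay." yields one compound, one organism, and one assay span, and `summarize` groups them by type. Pure lexicon lookup is easy to inspect, but it misses names absent from the dictionary and spelling variants, so a production annotator would likely need more than this.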

Source: National Center for Biotechnology Information (PubMed record pubmed-3521388, MEDLINE/PubMed format)
Journal: BMC Bioinformatics (Proceedings)
Publication date: 2012-12-07
Copyright ©2012 Tharatipyakul et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.