
ChemicalTagger: A tool for semantic text-mining in chemistry

BACKGROUND: The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured l...


Bibliographic Details
Main authors: Hawizy, Lezan; Jessop, David M; Adams, Nico; Murray-Rust, Peter
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117806/
https://www.ncbi.nlm.nih.gov/pubmed/21575201
http://dx.doi.org/10.1186/1758-2946-3-17
_version_ 1782206378465558528
author Hawizy, Lezan
Jessop, David M
Adams, Nico
Murray-Rust, Peter
collection PubMed
description BACKGROUND: The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. RESULTS: We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). CONCLUSIONS: It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.
format Online
Article
Text
id pubmed-3117806
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3117806 2011-06-18 ChemicalTagger: A tool for semantic text-mining in chemistry Hawizy, Lezan Jessop, David M Adams, Nico Murray-Rust, Peter J Cheminform Research Article BACKGROUND: The primary method for scientific communication is in the form of published scientific articles and theses which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. RESULTS: We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. The ANTLR grammar is used to structure this into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). CONCLUSIONS: It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision. BioMed Central 2011-05-16 /pmc/articles/PMC3117806/ /pubmed/21575201 http://dx.doi.org/10.1186/1758-2946-3-17 Text en Copyright ©2011 Hawizy et al; licensee Chemistry Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title ChemicalTagger: A tool for semantic text-mining in chemistry
topic Research Article