
A scalable machine-learning approach to recognize chemical names within large text databases

MOTIVATION: The use or study of chemical compounds permeates almost every scientific field and in each of them, the amount of textual information is growing rapidly. There is a need to accurately identify chemical names within text for a number of informatics efforts such as database curation, repor...


Bibliographic Details
Main Author:	Wren, Jonathan D
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1683569/ https://www.ncbi.nlm.nih.gov/pubmed/17118146 http://dx.doi.org/10.1186/1471-2105-7-S2-S3

_version_	1782131171595911168
author	Wren, Jonathan D
author_facet	Wren, Jonathan D
author_sort	Wren, Jonathan D
collection	PubMed
description	MOTIVATION: The use or study of chemical compounds permeates almost every scientific field and in each of them, the amount of textual information is growing rapidly. There is a need to accurately identify chemical names within text for a number of informatics efforts such as database curation, report summarization, tagging of named entities and keywords, or the development/curation of reference databases. RESULTS: A first-order Markov Model (MM) was evaluated for its ability to distinguish chemical names from words, yielding ~93% recall in recognizing chemical terms and ~99% precision in rejecting non-chemical terms on smaller test sets. However, because total false-positive events increase with the number of words analyzed, the scalability of name recognition was measured by processing 13.1 million MEDLINE records. The method yielded precision ranges from 54.7% to 100%, depending upon the cutoff score used, averaging 82.7% for approximately 1.05 million putative chemical terms extracted. Extracted chemical terms were analyzed to estimate the number of spelling variants per term, which correlated with the total number of times the chemical name appeared in MEDLINE. This variability in term construction was found to affect both information retrieval and term mapping when using PubMed and Ovid.
format	Text
id	pubmed-1683569
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1683569 2006-12-05 A scalable machine-learning approach to recognize chemical names within large text databases Wren, Jonathan D BMC Bioinformatics Proceedings MOTIVATION: The use or study of chemical compounds permeates almost every scientific field and in each of them, the amount of textual information is growing rapidly. There is a need to accurately identify chemical names within text for a number of informatics efforts such as database curation, report summarization, tagging of named entities and keywords, or the development/curation of reference databases. RESULTS: A first-order Markov Model (MM) was evaluated for its ability to distinguish chemical names from words, yielding ~93% recall in recognizing chemical terms and ~99% precision in rejecting non-chemical terms on smaller test sets. However, because total false-positive events increase with the number of words analyzed, the scalability of name recognition was measured by processing 13.1 million MEDLINE records. The method yielded precision ranges from 54.7% to 100%, depending upon the cutoff score used, averaging 82.7% for approximately 1.05 million putative chemical terms extracted. Extracted chemical terms were analyzed to estimate the number of spelling variants per term, which correlated with the total number of times the chemical name appeared in MEDLINE. This variability in term construction was found to affect both information retrieval and term mapping when using PubMed and Ovid. BioMed Central 2006-09-26 /pmc/articles/PMC1683569/ /pubmed/17118146 http://dx.doi.org/10.1186/1471-2105-7-S2-S3 Text en Copyright © 2006 Wren; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Proceedings Wren, Jonathan D A scalable machine-learning approach to recognize chemical names within large text databases
title	A scalable machine-learning approach to recognize chemical names within large text databases
title_full	A scalable machine-learning approach to recognize chemical names within large text databases
title_fullStr	A scalable machine-learning approach to recognize chemical names within large text databases
title_full_unstemmed	A scalable machine-learning approach to recognize chemical names within large text databases
title_short	A scalable machine-learning approach to recognize chemical names within large text databases
title_sort	scalable machine-learning approach to recognize chemical names within large text databases
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1683569/ https://www.ncbi.nlm.nih.gov/pubmed/17118146 http://dx.doi.org/10.1186/1471-2105-7-S2-S3
work_keys_str_mv	AT wrenjonathand ascalablemachinelearningapproachtorecognizechemicalnameswithinlargetextdatabases AT wrenjonathand scalablemachinelearningapproachtorecognizechemicalnameswithinlargetextdatabases
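
The description field above mentions a first-order Markov Model used to distinguish chemical names from ordinary words, with a cutoff score controlling precision. As a rough illustration only, and not the implementation from the paper, the sketch below shows one way such a model could work at the character level: two bigram models are trained, one on chemical names and one on general words, and a term is scored by its per-character log-likelihood ratio. The class and function names, the toy training lists, the add-one smoothing, and the zero cutoff are all assumptions made for this example.

```python
import math
import string
from collections import Counter

START, END = "^", "$"  # hypothetical word-boundary markers


class CharMarkovModel:
    """First-order (bigram) character model with add-one smoothing."""

    def __init__(self, words):
        self.pair_counts = Counter()  # (previous char, current char) -> count
        self.prev_counts = Counter()  # previous char -> count
        # Possible next symbols: lowercase letters plus the end marker.
        self.vocab_size = len(string.ascii_lowercase) + 1
        for word in words:
            chars = [START] + list(word.lower()) + [END]
            for prev, cur in zip(chars, chars[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.prev_counts[prev] += 1

    def log_prob(self, word):
        """Log-probability of the word's character transitions."""
        chars = [START] + list(word.lower()) + [END]
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            num = self.pair_counts[(prev, cur)] + 1
            den = self.prev_counts[prev] + self.vocab_size
            total += math.log(num / den)
        return total


# Toy training data (assumed for illustration; far too small for real use).
CHEMICALS = [
    "acetaminophen", "methylphenidate", "cyclohexanone", "dimethylsulfoxide",
    "chloroform", "benzaldehyde", "ethanol", "propanol", "acetylcholine",
    "trimethylamine",
]
WORDS = [
    "patient", "study", "result", "analysis", "between", "treatment",
    "however", "significant", "observed", "clinical",
]

chem_model = CharMarkovModel(CHEMICALS)
text_model = CharMarkovModel(WORDS)


def chemical_score(word):
    """Per-transition log-likelihood ratio; positive favors 'chemical'."""
    transitions = len(word) + 1  # includes the transition to the end marker
    return (chem_model.log_prob(word) - text_model.log_prob(word)) / transitions


def looks_chemical(word, cutoff=0.0):
    """Classify a term by comparing its score with an assumed cutoff."""
    return chemical_score(word) > cutoff


if __name__ == "__main__":
    for term in ["ethanol", "methanol", "patient"]:
        print(term, round(chemical_score(term), 3), looks_chemical(term))
```

Raising the cutoff in a scheme like this accepts fewer terms, which trades recall for precision, consistent with the abstract's remark that precision depended on the cutoff score used.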
