Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time?

BACKGROUND: Identifying phrases that refer to particular concept types is a critical step in extracting information from documents. Provided with annotated documents as training data, supervised machine learning can automate this process. When building a machine learning model for this task, the mod...

Bibliographic Details
Main Authors:	Torii, Manabu; Wagholikar, Kavishwar; Liu, Hongfang
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3908466/ https://www.ncbi.nlm.nih.gov/pubmed/24438362 http://dx.doi.org/10.1186/2041-1480-5-3

_version_	1782301708516327424
author	Torii, Manabu; Wagholikar, Kavishwar; Liu, Hongfang
author_facet	Torii, Manabu; Wagholikar, Kavishwar; Liu, Hongfang
author_sort	Torii, Manabu
collection	PubMed
description	BACKGROUND: Identifying phrases that refer to particular concept types is a critical step in extracting information from documents. Provided with annotated documents as training data, supervised machine learning can automate this process. When building a machine learning model for this task, the model may be built to detect all types simultaneously (all-types-at-once) or it may be built for one or a few selected types at a time (one-type- or a-few-types-at-a-time). It is of interest to investigate which strategy yields better detection performance. RESULTS: Hidden Markov models using the different strategies were evaluated on a clinical corpus annotated with three concept types (i2b2/VA corpus) and a biology literature corpus annotated with five concept types (JNLPBA corpus). Ten-fold cross-validation tests were conducted and the experimental results showed that models trained for multiple concept types consistently yielded better performance than those trained for a single concept type. F-scores observed for the former strategies were higher than those observed for the latter by 0.9 to 2.6% on the i2b2/VA corpus and 1.4 to 10.1% on the JNLPBA corpus, depending on the target concept types. Improved boundary detection and reduced type confusion were observed for the all-types-at-once strategy. CONCLUSIONS: The current results suggest that detection of concept phrases could be improved by simultaneously tackling multiple concept types. This also suggests that we should annotate multiple concept types in developing a new corpus for machine learning models. Further investigation is expected to gain insights in the underlying mechanism to achieve good performance when multiple concept types are considered.
format	Online Article Text
id	pubmed-3908466
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3908466 2014-02-01 Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time? Torii, Manabu Wagholikar, Kavishwar Liu, Hongfang J Biomed Semantics Research BACKGROUND: Identifying phrases that refer to particular concept types is a critical step in extracting information from documents. Provided with annotated documents as training data, supervised machine learning can automate this process. When building a machine learning model for this task, the model may be built to detect all types simultaneously (all-types-at-once) or it may be built for one or a few selected types at a time (one-type- or a-few-types-at-a-time). It is of interest to investigate which strategy yields better detection performance. RESULTS: Hidden Markov models using the different strategies were evaluated on a clinical corpus annotated with three concept types (i2b2/VA corpus) and a biology literature corpus annotated with five concept types (JNLPBA corpus). Ten-fold cross-validation tests were conducted and the experimental results showed that models trained for multiple concept types consistently yielded better performance than those trained for a single concept type. F-scores observed for the former strategies were higher than those observed for the latter by 0.9 to 2.6% on the i2b2/VA corpus and 1.4 to 10.1% on the JNLPBA corpus, depending on the target concept types. Improved boundary detection and reduced type confusion were observed for the all-types-at-once strategy. CONCLUSIONS: The current results suggest that detection of concept phrases could be improved by simultaneously tackling multiple concept types. This also suggests that we should annotate multiple concept types in developing a new corpus for machine learning models. Further investigation is expected to gain insights in the underlying mechanism to achieve good performance when multiple concept types are considered.
BioMed Central 2014-01-17 /pmc/articles/PMC3908466/ /pubmed/24438362 http://dx.doi.org/10.1186/2041-1480-5-3 Text en Copyright © 2014 Torii et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time?
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3908466/ https://www.ncbi.nlm.nih.gov/pubmed/24438362 http://dx.doi.org/10.1186/2041-1480-5-3
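
The two training strategies named in the abstract (all-types-at-once versus one-type-at-a-time) differ mainly in which annotations are turned into labels before a sequence model such as a hidden Markov model is trained. The following is a minimal sketch of that difference, not the authors' implementation. It assumes a BIO tagging scheme, which the record does not specify, and uses a made-up sentence with hypothetical concept type names (`test`, `problem`, `treatment`); the function `bio_labels` and its parameters are likewise illustrative.

```python
from typing import List, Optional, Set, Tuple

# (start token index, end token index exclusive, concept type); hypothetical format
Span = Tuple[int, int, str]


def bio_labels(n_tokens: int, spans: List[Span],
               keep_types: Optional[Set[str]] = None) -> List[str]:
    """Convert annotated spans into one BIO label per token.

    keep_types=None keeps every annotated type (all-types-at-once).
    A set of type names keeps only those types and labels everything
    else as "O" (one-type- or a-few-types-at-a-time).
    """
    labels = ["O"] * n_tokens
    for start, end, ctype in spans:
        if keep_types is not None and ctype not in keep_types:
            continue  # this annotation is invisible to the model being trained
        labels[start] = f"B-{ctype}"
        for i in range(start + 1, end):
            labels[i] = f"I-{ctype}"
    return labels


# Made-up example sentence and annotations, for illustration only.
tokens = ["CT", "scan", "showed", "a", "small", "mass",
          "treated", "with", "radiation"]
spans = [(0, 2, "test"), (3, 6, "problem"), (8, 9, "treatment")]

all_at_once = bio_labels(len(tokens), spans)
problem_only = bio_labels(len(tokens), spans, keep_types={"problem"})

for token, full, single in zip(tokens, all_at_once, problem_only):
    print(f"{token:10s} {full:12s} {single}")
```

In the single-type labeling, tokens belonging to other concept types become indistinguishable from ordinary text, whereas the all-types labeling keeps them as separate states. This only illustrates the setup being compared; it does not by itself explain the performance differences reported in the abstract.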
