Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time?
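The abstract contrasts training one model for all annotated concept types with training a model for one (or a few) types at a time. One common way to prepare training labels for either strategy is BIO tagging, which is an assumption here rather than something the record states: mentions of the target types get B-/I- tags, and every other token, including mentions of types outside the target set, gets "O". The function name `bio_labels`, the example sentence, and the use of the i2b2/VA 2010 category names (problem, treatment) are illustrative choices for this sketch.

```python
from typing import List, Optional, Set, Tuple

# A mention is (first token index, end index exclusive, concept type).
Mention = Tuple[int, int, str]


def bio_labels(n_tokens: int, mentions: List[Mention],
               keep_types: Optional[Set[str]] = None) -> List[str]:
    """Encode annotated mentions as one BIO tag per token (hypothetical helper).

    keep_types=None   -> all-types-at-once: every annotated type keeps its tags.
    keep_types={...}  -> one-type- (or a-few-types-) at-a-time: mentions whose
                         type is not in keep_types are left as plain "O" tokens.
    """
    labels = ["O"] * n_tokens
    for start, end, ctype in mentions:
        if keep_types is not None and ctype not in keep_types:
            continue
        labels[start] = "B-" + ctype
        for i in range(start + 1, end):
            labels[i] = "I-" + ctype
    return labels


if __name__ == "__main__":
    # Made-up sentence; type names follow the i2b2/VA 2010 concept categories.
    tokens = ["The", "patient", "received", "aspirin", "for", "chest", "pain"]
    mentions = [(3, 4, "treatment"), (5, 7, "problem")]
    print(bio_labels(len(tokens), mentions))
    # ['O', 'O', 'O', 'B-treatment', 'O', 'B-problem', 'I-problem']
    print(bio_labels(len(tokens), mentions, keep_types={"problem"}))
    # ['O', 'O', 'O', 'O', 'O', 'B-problem', 'I-problem']
```

In the single-type encoding, the treatment mention "aspirin" is labelled exactly like ordinary text.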
Main authors: | Torii, Manabu; Wagholikar, Kavishwar; Liu, Hongfang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3908466/ https://www.ncbi.nlm.nih.gov/pubmed/24438362 http://dx.doi.org/10.1186/2041-1480-5-3 |
Field | Value |
---|---|
author | Torii, Manabu; Wagholikar, Kavishwar; Liu, Hongfang |
collection | PubMed |
description | BACKGROUND: Identifying phrases that refer to particular concept types is a critical step in extracting information from documents. Provided with annotated documents as training data, supervised machine learning can automate this process. When building a machine learning model for this task, the model may be built to detect all types simultaneously (all-types-at-once) or it may be built for one or a few selected types at a time (one-type- or a-few-types-at-a-time). It is of interest to investigate which strategy yields better detection performance. RESULTS: Hidden Markov models using the different strategies were evaluated on a clinical corpus annotated with three concept types (i2b2/VA corpus) and a biology literature corpus annotated with five concept types (JNLPBA corpus). Ten-fold cross-validation tests were conducted and the experimental results showed that models trained for multiple concept types consistently yielded better performance than those trained for a single concept type. F-scores observed for the former strategies were higher than those observed for the latter by 0.9 to 2.6% on the i2b2/VA corpus and 1.4 to 10.1% on the JNLPBA corpus, depending on the target concept types. Improved boundary detection and reduced type confusion were observed for the all-types-at-once strategy. CONCLUSIONS: The current results suggest that detection of concept phrases could be improved by simultaneously tackling multiple concept types. This also suggests that we should annotate multiple concept types in developing a new corpus for machine learning models. Further investigation is expected to provide insights into the underlying mechanism for achieving good performance when multiple concept types are considered. |
format | Online Article Text |
id | pubmed-3908466 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3908466, 2014-02-01, J Biomed Semantics, Research. BioMed Central, 2014-01-17. Text, en. Copyright © 2014 Torii et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Detecting concept mentions in biomedical text using hidden Markov model: multiple concept types at once or one at a time? |
topic | Research |
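
The description evaluates HMM taggers by phrase-level F-score. The following is a minimal sketch of that kind of pipeline: a generic first-order HMM estimated by counting with add-one smoothing, Viterbi decoding, conversion of BIO tags back to mentions, and exact-match F-score. It is not the authors' implementation; the names `CountHMM`, `spans`, and `f_score`, the smoothing choice, the BIO tag scheme, and the toy training data are all assumptions for illustration. Under the all-types-at-once strategy one such model would be trained on tags for every type; under one-type-at-a-time a separate model would be trained per type, with other types collapsed to "O". In a ten-fold cross-validation, the scores would be computed on each held-out fold.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Sentence = Tuple[List[str], List[str]]  # (tokens, BIO tags)
Mention = Tuple[int, int, str]          # (start, end exclusive, type)


class CountHMM:
    """Hypothetical first-order HMM tagger estimated by counting, add-one smoothed."""

    def __init__(self, sentences: Sequence[Sentence]):
        self.tags = sorted({t for _, tags in sentences for t in tags})
        self.vocab = {w for words, _ in sentences for w in words}
        self.trans: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.emit: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        for words, tags in sentences:
            prev = "<s>"  # sentence-start pseudo-state
            for w, t in zip(words, tags):
                self.trans[prev][t] += 1
                self.emit[t][w] += 1
                prev = t

    @staticmethod
    def _logp(table, key: str, value: str, n_outcomes: int) -> float:
        # Add-one (Laplace) smoothed log P(value | key).
        row = table[key]
        return math.log((row.get(value, 0) + 1) / (sum(row.values()) + n_outcomes))

    def viterbi(self, words: List[str]) -> List[str]:
        """Return the most probable tag sequence for a non-empty token list."""
        n_tags, n_words = len(self.tags), len(self.vocab) + 1  # +1 slot for unseen words
        # Each column maps tag -> (best log score of a path ending in tag, previous tag).
        cols = [{t: (self._logp(self.trans, "<s>", t, n_tags)
                     + self._logp(self.emit, t, words[0], n_words), "")
                 for t in self.tags}]
        for w in words[1:]:
            prev = cols[-1]
            col = {}
            for t in self.tags:
                score, back = max((prev[p][0] + self._logp(self.trans, p, t, n_tags), p)
                                  for p in self.tags)
                col[t] = (score + self._logp(self.emit, t, w, n_words), back)
            cols.append(col)
        tag = max(self.tags, key=lambda t: cols[-1][t][0])
        path = [tag]
        for col in reversed(cols[1:]):  # follow back-pointers down to position 0
            tag = col[tag][1]
            path.append(tag)
        return path[::-1]


def spans(tags: List[str]) -> Set[Mention]:
    """Convert BIO tags to (start, end exclusive, type) mentions."""
    out: Set[Mention] = set()
    start, ctype = None, ""
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an open mention
        if start is not None and not (tag.startswith("I-") and tag[2:] == ctype):
            out.add((start, i, ctype))
            start = None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, ctype = i, tag[2:]
    return out


def f_score(gold: Set[Mention], pred: Set[Mention]) -> float:
    """Exact-match F1: a prediction counts only if boundaries and type both match."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)


TRAIN = [  # toy data, not from either corpus
    (["aspirin", "for", "pain"], ["B-treatment", "O", "B-problem"]),
    (["aspirin", "for", "fever"], ["B-treatment", "O", "B-problem"]),
]

if __name__ == "__main__":
    hmm = CountHMM(TRAIN)
    tags = hmm.viterbi(["aspirin", "for", "rash"])  # "rash" never occurs in TRAIN
    print(tags)  # ['B-treatment', 'O', 'B-problem']
    gold = {(0, 1, "treatment"), (2, 4, "problem")}
    print(f_score(gold, {(0, 1, "treatment"), (2, 3, "problem")}))  # 0.5
```

In the last line the predicted problem mention is one token shorter than the gold mention, so exact matching counts it as both a false positive and a false negative; boundary errors of this kind are one of the error types the description reports as reduced under the all-types-at-once strategy.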