Learning Semantic Tags from Big Data for Clinical Text Representation
One of the biggest challenges in clinical text mining is representing medical terminology and n-gram terms in sparse medical reports using either supervised or unsupervised methods. To address this issue, we propose a novel method for word and n-gram representation at the semantic level. We first...
Main authors: | Li, Yanpeng; Liu, Hongfang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Medical Informatics Association, 2015 |
Subjects: | Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4525271/ https://www.ncbi.nlm.nih.gov/pubmed/26306286 |
_version_ | 1782384305870209024 |
---|---|
author | Li, Yanpeng Liu, Hongfang |
author_facet | Li, Yanpeng Liu, Hongfang |
author_sort | Li, Yanpeng |
collection | PubMed |
description | One of the biggest challenges in clinical text mining is representing medical terminology and n-gram terms in sparse medical reports using either supervised or unsupervised methods. To address this issue, we propose a novel method for word and n-gram representation at the semantic level. We first represent each word by its distance to a set of reference features, calculated by a reference distance estimator (RDE) learned from labeled and unlabeled data, and then generate new features using simple techniques of discretization, random sampling, and merging. The new features are a set of binary rules that can be interpreted as semantic tags derived from words and n-grams. We show that the new features significantly outperform classical bag-of-words and n-gram features on the heart disease risk factor extraction task of the i2b2 2014 challenge. Promisingly, semantic tags can replace the original text entirely while yielding even better prediction performance, and can derive new rules beyond the lexical level. |
format | Online Article Text |
id | pubmed-4525271 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | American Medical Informatics Association |
record_format | MEDLINE/PubMed |
spelling | pubmed-4525271 2015-08-24 Learning Semantic Tags from Big Data for Clinical Text Representation Li, Yanpeng Liu, Hongfang AMIA Jt Summits Transl Sci Proc Articles American Medical Informatics Association 2015-03-25 /pmc/articles/PMC4525271/ /pubmed/26306286 Text en ©2015 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose |
title | Learning Semantic Tags from Big Data for Clinical Text Representation |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4525271/ https://www.ncbi.nlm.nih.gov/pubmed/26306286 |
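
The abstract in this record describes turning per-word distance scores into binary rules through discretization, random sampling, and merging. The record gives no further detail, so the following is only a minimal sketch under stated assumptions, not the authors' implementation: the reference names, the scores, the thresholds, and the choice to merge sampled rules with a logical OR are all hypothetical, and the reference distance estimator (RDE) itself is not reproduced here.

```python
import random


def discretize(scores, thresholds):
    """Turn real-valued scores into binary rules of the form 'ref>t'.

    `scores` maps a (hypothetical) reference feature name to a word's
    distance score for that reference.
    """
    rules = {}
    for ref, value in scores.items():
        for t in thresholds:
            rules[f"{ref}>{t}"] = value > t
    return rules


def sample_and_merge(rules, group_size, n_tags, seed=0):
    """Randomly sample groups of rules and merge each group into one tag.

    Assumption: a merged tag fires when any rule in its group fires (OR).
    """
    rng = random.Random(seed)
    names = sorted(rules)  # sort for reproducible sampling
    tags = {}
    for k in range(n_tags):
        group = rng.sample(names, group_size)
        tags[f"tag_{k}"] = any(rules[name] for name in group)
    return tags


# Hypothetical scores for one word against two reference features.
word_scores = {"ref_a": 0.8, "ref_b": 0.1}
word_rules = discretize(word_scores, thresholds=[0.25, 0.5])
word_tags = sample_and_merge(word_rules, group_size=2, n_tags=3)
```

Each resulting tag is a boolean feature that could stand in for the original token in a downstream classifier; a fixed seed keeps the sampled groups identical across words so the tags stay comparable.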