
Learning Semantic Tags from Big Data for Clinical Text Representation


Bibliographic Details
Main authors: Li, Yanpeng; Liu, Hongfang
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4525271/
https://www.ncbi.nlm.nih.gov/pubmed/26306286
Description: In clinical text mining, one of the biggest challenges is representing medical terminologies and n-gram terms in sparse medical reports, whether with supervised or unsupervised methods. To address this issue, we propose a novel method for representing words and n-grams at the semantic level. We first represent each word by its distances to a set of reference features, calculated by a reference distance estimator (RDE) learned from labeled and unlabeled data, and then generate new features using simple techniques of discretization, random sampling, and merging. The new features are a set of binary rules that can be interpreted as semantic tags derived from words and n-grams. We show that the new features significantly outperform classical bag-of-words and n-gram features on the heart disease risk factor extraction task of the i2b2 2014 challenge. Promisingly, the semantic tags can replace the original text entirely with even better prediction performance, and can also be used to derive new rules beyond the lexical level.
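The abstract describes the pipeline only at a high level: RDE distances first, then discretization, random sampling, and merging into binary rules. The Python sketch below is a hypothetical illustration of that post-RDE stage. It is not the authors' implementation. It assumes the word-to-reference-feature distances are already available (the values here are invented, and RDE itself is not implemented). It also assumes that discretization uses fixed shared bin edges, that each tag is a randomly sampled conjunction of (feature, bin) conditions, and that word tags are merged into a document representation by union. The helper names `discretize`, `sample_rules`, `word_tags`, and `document_tags` are illustrative only.

```python
import random
from typing import Dict, List, Set, Tuple

# A condition is (reference-feature index, bin index); a rule is a conjunction
# of conditions and acts as one binary "semantic tag" (assumed representation).
Condition = Tuple[int, int]
Rule = Tuple[Condition, ...]


def discretize(distances: List[float], edges: List[float]) -> List[int]:
    """Bin each distance: bin index = number of edges strictly below it."""
    return [sum(d > e for e in edges) for d in distances]


def sample_rules(binned: Dict[str, List[int]], n_rules: int,
                 rule_size: int, seed: int = 0) -> List[Rule]:
    """Randomly sample conjunctive rules from the bin patterns of observed words.

    Each rule copies `rule_size` randomly chosen (feature, bin) pairs from a
    randomly chosen word, so every rule fires for at least that word.
    """
    rng = random.Random(seed)
    words = sorted(binned)
    rules: List[Rule] = []
    for _ in range(n_rules):
        w = rng.choice(words)
        feats = sorted(rng.sample(range(len(binned[w])), rule_size))
        rules.append(tuple((f, binned[w][f]) for f in feats))
    return rules


def word_tags(bins: List[int], rules: List[Rule]) -> Set[int]:
    """Indices of the rules (tags) whose conditions all hold for this word."""
    return {i for i, rule in enumerate(rules)
            if all(bins[f] == b for f, b in rule)}


def document_tags(tokens: List[str], binned: Dict[str, List[int]],
                  rules: List[Rule]) -> Set[int]:
    """Merge word-level tags into a document-level tag set by union (OR)."""
    tags: Set[int] = set()
    for tok in tokens:
        if tok in binned:  # words without distances contribute no tags
            tags |= word_tags(binned[tok], rules)
    return tags


# Hypothetical distances of each word to 3 reference features (made-up values
# standing in for RDE output, which this sketch does not compute).
distances = {
    "hypertension": [0.1, 0.8, 0.7],
    "htn": [0.2, 0.9, 0.65],
    "diabetes": [0.9, 0.1, 0.7],
    "the": [0.5, 0.5, 0.5],
}
edges = [0.3, 0.6]  # shared bin edges -> 3 bins per reference feature
binned = {w: discretize(d, edges) for w, d in distances.items()}
rules = sample_rules(binned, n_rules=8, rule_size=2, seed=42)

print(binned["hypertension"], binned["htn"])  # [0, 2, 2] [0, 2, 2]
print(document_tags(["patient", "has", "htn"], binned, rules))
```

In this toy data the lexically different terms `hypertension` and `htn` fall into the same bins, so they receive identical tags. This illustrates how rule-based tags can group terms beyond the lexical level. How well that holds on real data would depend on the actual RDE distances and rule-sampling choices, which this sketch only assumes.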
Journal: AMIA Jt Summits Transl Sci Proc. Publication date: 2015-03-25. ©2015 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose.