
Concept Discovery for Pathology Reports using an N-gram Model

A large amount of valuable information is available in plain text clinical reports. New techniques and technologies are applied to extract information from these reports. One of the leading systems in the cancer community is the Cancer Text Information Extraction System (caTIES), which was developed...


Bibliographic Details
Main Authors: Yip, Vincent; Mete, Mutlu; Topaloglu, Umit; Kockara, Sinan
Format: Text
Language: English
Published: American Medical Informatics Association 2010
Subjects: Articles
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041542/
https://www.ncbi.nlm.nih.gov/pubmed/21347147
author Yip, Vincent
Mete, Mutlu
Topaloglu, Umit
Kockara, Sinan
collection PubMed
description A large amount of valuable information is available in plain text clinical reports. New techniques and technologies are applied to extract information from these reports. One of the leading systems in the cancer community is the Cancer Text Information Extraction System (caTIES), which was developed with caBIG-compliant data structures. caTIES embeds two key components for data extraction: MMTx and GATE. In this paper, an n-gram-based framework is shown to be capable of discovering concepts from text reports. MetaMap is used to map medical terms to the National Cancer Institute (NCI) Metathesaurus and the Unified Medical Language System (UMLS) Metathesaurus to verify legitimate medical data. The final concepts from our framework and caTIES are weighted based on our scoring model. The scores show that, on average, our framework scores higher than caTIES on 848 (36.9%) of the reports. Furthermore, both systems perform similarly on 1388 (60.5%) of the reports.
format Text
id pubmed-3041542
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher American Medical Informatics Association
record_format MEDLINE/PubMed
spelling pubmed-3041542 2011-02-23 Summit on Translat Bioinforma Articles American Medical Informatics Association 2010-03-01 /pmc/articles/PMC3041542/ /pubmed/21347147 Text en ©2010 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose
title Concept Discovery for Pathology Reports using an N-gram Model
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041542/
https://www.ncbi.nlm.nih.gov/pubmed/21347147