Cargando…
Development of a Natural Language Processing Tool to Extract Radiation Treatment Sites
Currently, radiation oncology-specific electronic medical records (EMRs) allow providers to input the radiation treatment site using free text. The purpose of this study is to develop a natural language processing (NLP) tool to extract encoded data from radiation treatment sites in an EMR. Treatment...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Cureus
2019
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6881088/ https://www.ncbi.nlm.nih.gov/pubmed/31815074 http://dx.doi.org/10.7759/cureus.6010 |
Sumario: | Currently, radiation oncology-specific electronic medical records (EMRs) allow providers to input the radiation treatment site using free text. The purpose of this study is to develop a natural language processing (NLP) tool to extract encoded data from radiation treatment sites in an EMR. Treatment sites were extracted from all patients who completed treatment in our department from April 1, 2011, to April 30, 2013. A system was designed to extract the Unified Medical Language System (UMLS) concept codes using a sample of 11,018 unique site names from 31118 radiation therapy (RT) sites. Among those, 5500 unique site name strings that constitute approximately half of the sample were spared as a test set to evaluate the final system. A dictionary and calculated n-gram statistics using UMLS concepts from related semantic types were combined with manually encoded data. There was an average of 2.2 sites per patient. Prior to extraction, the 20 most common unique treatment sites were used 4215 times (38.3%). The most common treatment site was whole brain RT, which was entered using 27 distinct terms for a total of 1063 times. The customized NLP solution displayed great gains as compared to other systems, with a recall of 0.99 and a precision of 0.99. A customized NLP tool was extracting encoded data from radiation treatment sites in an EMR with great accuracy. This can be integrated into a repository of demographic, genomic, treatment, and outcome data to advance personalized oncologic care. |
---|