
Application of Text Information Extraction System for Real-Time Cancer Case Identification in an Integrated Healthcare Organization

BACKGROUND: Surgical pathology reports (SPR) contain rich clinical diagnosis information. The text information extraction system (TIES) is an end-to-end application leveraging natural language processing technologies and focused on the processing of pathology and/or radiology reports. METHODS: We de...


Bibliographic Details
Main authors: Xie, Fagen, Lee, Janet, Munoz-Plaza, Corrine E., Hahn, Erin E., Chen, Wansu
Format: Online Article Text
Language: English
Published: Medknow Publications & Media Pvt Ltd 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5760847/
https://www.ncbi.nlm.nih.gov/pubmed/29416911
http://dx.doi.org/10.4103/jpi.jpi_55_17
_version_ 1783291448089640960
author Xie, Fagen
Lee, Janet
Munoz-Plaza, Corrine E.
Hahn, Erin E.
Chen, Wansu
author_sort Xie, Fagen
collection PubMed
description BACKGROUND: Surgical pathology reports (SPR) contain rich clinical diagnosis information. The text information extraction system (TIES) is an end-to-end application that leverages natural language processing technologies and focuses on processing pathology and/or radiology reports. METHODS: We deployed the TIES system and integrated SPRs into it on a daily basis at Kaiser Permanente Southern California. Breast cancer cases diagnosed in December 2013 from the Cancer Registry (CANREG) were used to validate the performance of the TIES system. The National Cancer Institute Metathesaurus (NCIM) concept terms and codes describing breast cancer were identified through the Unified Medical Language System Terminology Service (UTS) application. The identified NCIM codes were used to search for the coded SPRs directly in the back-end datastore. The identified cases were then compared with the breast cancer patients pulled from CANREG. RESULTS: A total of 437 breast cancer concept terms and 14 combinations of “breast” and “cancer” terms were identified from the UTS application. A total of 249 breast cancer cases diagnosed in December 2013 were pulled from CANREG. Of these 249 cases, 241 were successfully identified by the TIES system from a total of 457 reports. The TIES system also identified an additional 277 cases that were not part of the validation sample. Of the 277 cases, 11% were determined to be highly likely cases after manual examination, and 86% were in CANREG but were diagnosed in months other than December 2013. CONCLUSIONS: The study demonstrated that the TIES system can effectively identify potential breast cancer cases in our care setting. Identified potential cases can be easily confirmed by reviewing the corresponding annotated reports through the front-end visualization interface. The TIES system is a useful tool for identifying potential cases of various cancers in a timely manner and on a regular basis in support of clinical research studies.
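The validation figures in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are assumptions, the sensitivity figure is derived here and not quoted from the record, and the counts computed from the 11% and 86% shares are approximate because the abstract reports rounded percentages.

```python
# Figures quoted in the abstract (December 2013 validation sample).
registry_cases = 249      # breast cancer cases pulled from CANREG
found_by_ties = 241       # of those, cases identified by the TIES system
additional_cases = 277    # TIES-identified cases outside the validation sample

# Derived values (not stated in the record itself).
missed = registry_cases - found_by_ties
sensitivity = found_by_ties / registry_cases

# Approximate counts back-calculated from rounded percentages.
approx_likely_cases = round(additional_cases * 0.11)
approx_other_months = round(additional_cases * 0.86)

print(f"Missed by TIES: {missed}")
print(f"Sensitivity against CANREG: {sensitivity:.1%}")
print(f"Approx. highly likely cases: {approx_likely_cases}")
print(f"Approx. in CANREG, other months: {approx_other_months}")
```

Read this way, TIES recovered all but 8 of the registry cases, and most of the additional 277 hits correspond to registry cases diagnosed outside the validation month.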
format Online
Article
Text
id pubmed-5760847
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Medknow Publications & Media Pvt Ltd
record_format MEDLINE/PubMed
spelling pubmed-5760847 2018-02-07 Application of Text Information Extraction System for Real-Time Cancer Case Identification in an Integrated Healthcare Organization Xie, Fagen Lee, Janet Munoz-Plaza, Corrine E. Hahn, Erin E. Chen, Wansu J Pathol Inform Original Article BACKGROUND: Surgical pathology reports (SPR) contain rich clinical diagnosis information. The text information extraction system (TIES) is an end-to-end application that leverages natural language processing technologies and focuses on processing pathology and/or radiology reports. METHODS: We deployed the TIES system and integrated SPRs into it on a daily basis at Kaiser Permanente Southern California. Breast cancer cases diagnosed in December 2013 from the Cancer Registry (CANREG) were used to validate the performance of the TIES system. The National Cancer Institute Metathesaurus (NCIM) concept terms and codes describing breast cancer were identified through the Unified Medical Language System Terminology Service (UTS) application. The identified NCIM codes were used to search for the coded SPRs directly in the back-end datastore. The identified cases were then compared with the breast cancer patients pulled from CANREG. RESULTS: A total of 437 breast cancer concept terms and 14 combinations of “breast” and “cancer” terms were identified from the UTS application. A total of 249 breast cancer cases diagnosed in December 2013 were pulled from CANREG. Of these 249 cases, 241 were successfully identified by the TIES system from a total of 457 reports. The TIES system also identified an additional 277 cases that were not part of the validation sample. Of the 277 cases, 11% were determined to be highly likely cases after manual examination, and 86% were in CANREG but were diagnosed in months other than December 2013. CONCLUSIONS: The study demonstrated that the TIES system can effectively identify potential breast cancer cases in our care setting. Identified potential cases can be easily confirmed by reviewing the corresponding annotated reports through the front-end visualization interface. The TIES system is a useful tool for identifying potential cases of various cancers in a timely manner and on a regular basis in support of clinical research studies. Medknow Publications & Media Pvt Ltd 2017-12-14 /pmc/articles/PMC5760847/ /pubmed/29416911 http://dx.doi.org/10.4103/jpi.jpi_55_17 Text en Copyright: © 2017 Journal of Pathology Informatics http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.
title Application of Text Information Extraction System for Real-Time Cancer Case Identification in an Integrated Healthcare Organization
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5760847/
https://www.ncbi.nlm.nih.gov/pubmed/29416911
http://dx.doi.org/10.4103/jpi.jpi_55_17