Cargando…

Identifying Metastases-related Information from Pathology Reports of Lung Cancer Patients

Metastatic patterns of spread at the time of cancer recurrence are one of the most important prognostic factors in estimation of clinical course and survival of the patient. This information is not easily accessible since it’s rarely recorded in a structured format. This paper describes a system for...

Descripción completa

Detalles Bibliográficos
Autores principales: Soysal, Ergin, Warner, Jeremy L, Denny, Joshua C, Xu, Hua
Formato: Online Artículo Texto
Lenguaje:English
Publicado: American Medical Informatics Association 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5543353/
https://www.ncbi.nlm.nih.gov/pubmed/28815141
Descripción
Sumario:Metastatic patterns of spread at the time of cancer recurrence are one of the most important prognostic factors in estimation of clinical course and survival of the patient. This information is not easily accessible since it’s rarely recorded in a structured format. This paper describes a system for categorization of pathology reports by specimen site and the detection of metastatic status within the report. A clinical NLP pipeline was developed using sentence boundary detection, tokenization, section identification, part-of-speech tagger, and chunker with some rule based methods to extract metastasis site and status in combination with five types of information related to tumor metastases: histological type, grade, specimen site, metastatic status indicators and the procedure. The system achieved a recall of 0.84 and 0.88 precision for metastatic status detection, and 0.89 recall and 0.93 precision for metastasis site detection. This study demonstrates the feasibility of applying NLP technologies to extract valuable metastases information from pathology reports and we believe that it will greatly benefit studies on cancer metastases that utilize EHRs.