
Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry

Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system...


Bibliographic Details
Main authors: AAlAbdulsalam, Abdulrahman K., Garvin, Jennifer H., Redd, Andrew, Carter, Marjorie E., Sweeny, Carol, Meystre, Stephane M.
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961766/
https://www.ncbi.nlm.nih.gov/pubmed/29888032
_version_ 1783324775524859904
author AAlAbdulsalam, Abdulrahman K.
Garvin, Jennifer H.
Redd, Andrew
Carter, Marjorie E.
Sweeny, Carol
Meystre, Stephane M.
author_facet AAlAbdulsalam, Abdulrahman K.
Garvin, Jennifer H.
Redd, Andrew
Carter, Marjorie E.
Sweeny, Carol
Meystre, Stephane M.
author_sort AAlAbdulsalam, Abdulrahman K.
collection PubMed
description Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart abstractors (known as certified tumor registrars) have to search through voluminous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%–98.4% and classification sensitivity: 83.5%–87%).
format Online
Article
Text
id pubmed-5961766
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher American Medical Informatics Association
record_format MEDLINE/PubMed
spelling pubmed-5961766 2018-06-08 Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry AAlAbdulsalam, Abdulrahman K. Garvin, Jennifer H. Redd, Andrew Carter, Marjorie E. Sweeny, Carol Meystre, Stephane M. AMIA Jt Summits Transl Sci Proc Articles Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Committee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M), known as the TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart abstractors (known as certified tumor registrars) have to search through voluminous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%–98.4% and classification sensitivity: 83.5%–87%). American Medical Informatics Association 2018-05-18 /pmc/articles/PMC5961766/ /pubmed/29888032 Text en ©2018 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose
spellingShingle Articles
AAlAbdulsalam, Abdulrahman K.
Garvin, Jennifer H.
Redd, Andrew
Carter, Marjorie E.
Sweeny, Carol
Meystre, Stephane M.
Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry
title Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry
title_full Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry
title_fullStr Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry
title_full_unstemmed Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry
title_short Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry
title_sort automated extraction and classification of cancer stage mentions from unstructured text fields in a central cancer registry
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5961766/
https://www.ncbi.nlm.nih.gov/pubmed/29888032
work_keys_str_mv AT aalabdulsalamabdulrahmank automatedextractionandclassificationofcancerstagementionsfromunstructuredtextfieldsinacentralcancerregistry
AT garvinjenniferh automatedextractionandclassificationofcancerstagementionsfromunstructuredtextfieldsinacentralcancerregistry
AT reddandrew automatedextractionandclassificationofcancerstagementionsfromunstructuredtextfieldsinacentralcancerregistry
AT cartermarjoriee automatedextractionandclassificationofcancerstagementionsfromunstructuredtextfieldsinacentralcancerregistry
AT sweenycarol automatedextractionandclassificationofcancerstagementionsfromunstructuredtextfieldsinacentralcancerregistry
AT meystrestephanem automatedextractionandclassificationofcancerstagementionsfromunstructuredtextfieldsinacentralcancerregistry