
Automatic Extraction of ICD-O-3 Primary Sites from Cancer Pathology Reports

Although registry specific requirements exist, cancer registries primarily identify reportable cases using a combination of particular ICD-O-3 topography and morphology codes assigned to cancer case abstracts of which free text pathology reports form a main component. The codes are generally extract...


Bibliographic Details
Main Authors: Kavuluru, Ramakanth; Hands, Isaac; Durbin, Eric B.; Witt, Lisa
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 201
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3845766/
https://www.ncbi.nlm.nih.gov/pubmed/24303247
author Kavuluru, Ramakanth
Hands, Isaac
Durbin, Eric B.
Witt, Lisa
collection PubMed
description Although registry-specific requirements exist, cancer registries primarily identify reportable cases using a combination of particular ICD-O-3 topography and morphology codes assigned to cancer case abstracts, of which free-text pathology reports form a main component. The codes are generally extracted from pathology reports by trained human coders, sometimes with the help of software programs. Here we present results that improve on the state of the art in automatic extraction of 57 generic sites from pathology reports using three representative machine learning algorithms in text classification. We use a dataset of 56,426 reports arising from 35 labs that report to the Kentucky Cancer Registry. Employing unigrams, bigrams, and named entities as features, our methods achieve a class-based micro F-score of 0.9 and macro F-score of 0.72. To our knowledge, this is the best result on extracting ICD-O-3 codes from pathology reports using a large number of possible codes. Given the large dataset we use (compared to other similar efforts) with reports from 35 different labs, we also expect our final models to generalize better when extracting primary sites from previously unseen reports.
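The description reports both a micro and a macro F-score. As a minimal sketch of how the two averages are commonly computed for single-label classification, the Python below uses toy labels; the function name `f_scores` and the example data are hypothetical, and the record does not state the exact "class-based" definition the authors used. Micro-averaging pools counts over all classes, so frequent sites dominate it, while macro-averaging weights every site equally, which is one reason a macro score can fall well below the micro score when rare sites are hard to predict.

```python
from collections import Counter


def f_scores(gold, pred):
    """Return (micro_f1, macro_f1) for single-label, multi-class predictions."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    # Micro: pool the counts across classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy data (not from the paper): one frequent site and one rare site.
gold = ["C50"] * 8 + ["C34"] * 2
pred = ["C50"] * 8 + ["C50", "C34"]
micro, macro = f_scores(gold, pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy case a single error on the rare class lowers the macro average more than the micro average.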
format Online
Article
Text
id pubmed-3845766
institution National Center for Biotechnology Information
language English
publishDate 201
publisher American Medical Informatics Association
record_format MEDLINE/PubMed
spelling pubmed-3845766 2013-12-03 Automatic Extraction of ICD-O-3 Primary Sites from Cancer Pathology Reports Kavuluru, Ramakanth Hands, Isaac Durbin, Eric B. Witt, Lisa AMIA Jt Summits Transl Sci Proc Articles American Medical Informatics Association 2013-03-18 /pmc/articles/PMC3845766/ /pubmed/24303247 Text en ©2013 AMIA - All rights reserved.
title Automatic Extraction of ICD-O-3 Primary Sites from Cancer Pathology Reports
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3845766/
https://www.ncbi.nlm.nih.gov/pubmed/24303247