
A text-mining approach to obtain detailed treatment information from free-text fields in population-based cancer registries: A study of non-small cell lung cancer in California

BACKGROUND: Population-based cancer registries have treatment information for all patients making them an excellent resource for population-level monitoring. However, specific treatment details, such as drug names, are contained in a free-text format that is difficult to process and summarize. We as...


Bibliographic Details
Main authors: Maguire, Frances B., Morris, Cyllene R., Parikh-Patel, Arti, Cress, Rosemary D., Keegan, Theresa H. M., Li, Chin-Shang, Lin, Patrick S., Kizer, Kenneth W.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6386345/
https://www.ncbi.nlm.nih.gov/pubmed/30794610
http://dx.doi.org/10.1371/journal.pone.0212454
_version_ 1783397365955166208
author Maguire, Frances B.
Morris, Cyllene R.
Parikh-Patel, Arti
Cress, Rosemary D.
Keegan, Theresa H. M.
Li, Chin-Shang
Lin, Patrick S.
Kizer, Kenneth W.
author_sort Maguire, Frances B.
collection PubMed
description BACKGROUND: Population-based cancer registries have treatment information for all patients, making them an excellent resource for population-level monitoring. However, specific treatment details, such as drug names, are contained in a free-text format that is difficult to process and summarize. We assessed the accuracy and efficiency of a text-mining algorithm to identify systemic treatments for lung cancer from free-text fields in the California Cancer Registry. METHODS: The algorithm used Perl regular expressions in SAS 9.4 to search for treatments in 24,845 free-text records associated with 17,310 patients in California diagnosed with stage IV non-small cell lung cancer between 2012 and 2014. Our algorithm categorized treatments into six groups that align with National Comprehensive Cancer Network guidelines. We compared results to a manual review (gold standard) of the same records. RESULTS: Percent agreement ranged from 91.1% to 99.4%. Ranges for other measures were 0.71–0.92 (Kappa), 74.3%–97.3% (sensitivity), 92.4%–99.8% (specificity), 60.4%–96.4% (positive predictive value), and 92.9%–99.9% (negative predictive value). The text-mining algorithm used one-sixth of the time required for manual review. CONCLUSION: SAS-based text mining of free-text data can accurately detect systemic treatments administered to patients and save considerable time compared to manual review, maximizing the utility of the extant information in population-based cancer registries for comparative effectiveness research.
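The pattern-matching step described in METHODS can be illustrated with a minimal sketch. The study used Perl regular expressions inside SAS 9.4; the version below swaps in Python's `re` module instead. The group names, drug lists, and the function `find_treatment_groups` are hypothetical and chosen only for illustration: this record does not list the study's six groups or its actual patterns.

```python
import re

# Hypothetical groups and drug names, for illustration only. The study's six
# NCCN-aligned groups and its real search patterns are not given in this record.
TREATMENT_PATTERNS = {
    "platinum": re.compile(r"\b(carboplatin|cisplatin)\b", re.IGNORECASE),
    "tyrosine_kinase_inhibitor": re.compile(
        r"\b(erlotinib|tarceva|crizotinib|xalkori)\b", re.IGNORECASE
    ),
    "bevacizumab": re.compile(r"\b(bevacizumab|avastin)\b", re.IGNORECASE),
}


def find_treatment_groups(text):
    """Return the set of group names whose pattern matches the free text."""
    return {
        group
        for group, pattern in TREATMENT_PATTERNS.items()
        if pattern.search(text)
    }


# Made-up free-text entries, not registry data.
examples = [
    "Pt started CARBOPLATIN + Avastin 3/2013",
    "Tarceva 150 mg daily",
    "no chemo given",
]
for entry in examples:
    print(entry, "->", sorted(find_treatment_groups(entry)))
```

A production pattern set would also need to handle misspellings, abbreviations, and negations such as "refused", which this sketch ignores.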
format Online
Article
Text
id pubmed-6386345
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6386345 2019-03-09 A text-mining approach to obtain detailed treatment information from free-text fields in population-based cancer registries: A study of non-small cell lung cancer in California Maguire, Frances B. Morris, Cyllene R. Parikh-Patel, Arti Cress, Rosemary D. Keegan, Theresa H. M. Li, Chin-Shang Lin, Patrick S. Kizer, Kenneth W. PLoS One Research Article BACKGROUND: Population-based cancer registries have treatment information for all patients, making them an excellent resource for population-level monitoring. However, specific treatment details, such as drug names, are contained in a free-text format that is difficult to process and summarize. We assessed the accuracy and efficiency of a text-mining algorithm to identify systemic treatments for lung cancer from free-text fields in the California Cancer Registry. METHODS: The algorithm used Perl regular expressions in SAS 9.4 to search for treatments in 24,845 free-text records associated with 17,310 patients in California diagnosed with stage IV non-small cell lung cancer between 2012 and 2014. Our algorithm categorized treatments into six groups that align with National Comprehensive Cancer Network guidelines. We compared results to a manual review (gold standard) of the same records. RESULTS: Percent agreement ranged from 91.1% to 99.4%. Ranges for other measures were 0.71–0.92 (Kappa), 74.3%–97.3% (sensitivity), 92.4%–99.8% (specificity), 60.4%–96.4% (positive predictive value), and 92.9%–99.9% (negative predictive value). The text-mining algorithm used one-sixth of the time required for manual review. CONCLUSION: SAS-based text mining of free-text data can accurately detect systemic treatments administered to patients and save considerable time compared to manual review, maximizing the utility of the extant information in population-based cancer registries for comparative effectiveness research.
Public Library of Science 2019-02-22 /pmc/articles/PMC6386345/ /pubmed/30794610 http://dx.doi.org/10.1371/journal.pone.0212454 Text en © 2019 Maguire et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
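The accuracy measures named in RESULTS (percent agreement, Kappa, sensitivity, specificity, positive and negative predictive value) can all be derived from a 2×2 comparison of algorithm output against manual review for one treatment group. The sketch below shows one way this might be computed; the function name `agreement_metrics` and the sample labels are assumptions for illustration, not the study's code or data.

```python
def agreement_metrics(algorithm, manual):
    """Compare binary algorithm flags against manual-review flags.

    Both arguments are equal-length sequences of 0/1 values, one per record,
    for a single treatment group. Manual review is treated as the gold standard.
    Assumes every margin of the 2x2 table is non-zero.
    """
    if len(algorithm) != len(manual):
        raise ValueError("inputs must have the same length")

    pairs = list(zip(algorithm, manual))
    tp = sum(1 for a, m in pairs if a and m)          # both flag the treatment
    fp = sum(1 for a, m in pairs if a and not m)      # algorithm only
    fn = sum(1 for a, m in pairs if not a and m)      # manual review only
    tn = sum(1 for a, m in pairs if not a and not m)  # neither
    n = tp + fp + fn + tn

    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals (Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2

    return {
        "percent_agreement": 100 * observed,
        "kappa": (observed - expected) / (1 - expected),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Made-up labels for ten records, not study data.
algorithm_flags = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
manual_flags = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
for name, value in agreement_metrics(algorithm_flags, manual_flags).items():
    print(f"{name}: {value:.3f}")
```

Running this once per treatment group would yield a range for each measure, which is how the abstract reports its results.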
spellingShingle Research Article
Maguire, Frances B.
Morris, Cyllene R.
Parikh-Patel, Arti
Cress, Rosemary D.
Keegan, Theresa H. M.
Li, Chin-Shang
Lin, Patrick S.
Kizer, Kenneth W.
A text-mining approach to obtain detailed treatment information from free-text fields in population-based cancer registries: A study of non-small cell lung cancer in California
title A text-mining approach to obtain detailed treatment information from free-text fields in population-based cancer registries: A study of non-small cell lung cancer in California
title_sort text-mining approach to obtain detailed treatment information from free-text fields in population-based cancer registries: a study of non-small cell lung cancer in california
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6386345/
https://www.ncbi.nlm.nih.gov/pubmed/30794610
http://dx.doi.org/10.1371/journal.pone.0212454