
Using natural language processing and machine learning to identify breast cancer local recurrence

BACKGROUND: Identifying local recurrences in breast cancer from patient data sets is important for clinical research and practice. Developing a model using natural language processing and machine learning to identify local recurrences in breast cancer patients can reduce the time-consuming work of a...

Bibliographic Details
Main Authors:	Zeng, Zexian, Espino, Sasa, Roy, Ankita, Li, Xiaoyu, Khan, Seema A., Clare, Susan E., Jiang, Xia, Neapolitan, Richard, Luo, Yuan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6309052/ https://www.ncbi.nlm.nih.gov/pubmed/30591037 http://dx.doi.org/10.1186/s12859-018-2466-x

collection	PubMed
description	BACKGROUND: Identifying local recurrences in breast cancer from patient data sets is important for clinical research and practice. Developing a model using natural language processing and machine learning to identify local recurrences in breast cancer patients can reduce the time-consuming work of a manual chart review. METHODS: We design a novel concept-based filter and a prediction model to detect local recurrences using EHRs. In the training dataset, we manually review a development corpus of 50 progress notes and extract partial sentences that indicate breast cancer local recurrence. We process these partial sentences with MetaMap to obtain a set of Unified Medical Language System (UMLS) concepts, which we call the positive concept set. We apply MetaMap to patients’ progress notes and retain only the concepts that fall within the positive concept set. These features, combined with the number of pathology reports recorded for each patient, are used to train a support vector machine to identify local recurrences. RESULTS: We compared our model with three baseline classifiers using either full MetaMap concepts, filtered MetaMap concepts, or bag of words. Our model achieved the best AUC (0.93 in cross-validation, 0.87 in held-out testing). CONCLUSIONS: Compared to a labor-intensive chart review, our model provides an automated way to identify breast cancer local recurrences. We expect that, with minimal adaptation of the positive concept set, this study can be replicated at other institutions with a moderately sized training dataset. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2466-x) contains supplementary material, which is available to authorized users.
id	pubmed-6309052
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-6309052 2019-01-03 BMC Bioinformatics BioMed Central 2018-12-28 /pmc/articles/PMC6309052/ /pubmed/30591037 http://dx.doi.org/10.1186/s12859-018-2466-x Text en © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
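
The filtering step described in the abstract (keep only the concepts that fall within the positive concept set, then add the pathology report count as one more feature) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the concept identifiers are made-up placeholders, MetaMap extraction is assumed to have already been run, and using per-concept counts as feature values is an assumption, since the abstract does not say how the retained concepts are encoded.

```python
from collections import Counter
from typing import Iterable, List, Sequence

# Hypothetical concept identifiers standing in for the positive concept set.
# Real UMLS concept IDs would come from running MetaMap on the development corpus.
POSITIVE_CONCEPTS = ("C0000001", "C0000002", "C0000003")


def filter_concepts(note_concepts: Iterable[str], positive: Sequence[str]) -> List[str]:
    """Keep only the concepts that belong to the positive concept set."""
    allowed = set(positive)
    return [concept for concept in note_concepts if concept in allowed]


def patient_features(
    notes: Iterable[Iterable[str]],
    pathology_report_count: int,
    positive: Sequence[str],
) -> List[int]:
    """Build one feature vector per patient.

    The vector holds one count per positive concept (an assumed encoding),
    followed by the number of pathology reports recorded for the patient.
    """
    counts: Counter = Counter()
    for note in notes:
        counts.update(filter_concepts(note, positive))
    return [counts[concept] for concept in positive] + [pathology_report_count]


# Two progress notes for one hypothetical patient, already mapped to concepts.
notes = [
    ["C0000001", "C0000009", "C0000002"],
    ["C0000001", "C0000008"],
]
features = patient_features(notes, pathology_report_count=3, positive=POSITIVE_CONCEPTS)
print(features)  # [2, 1, 0, 3]
```

Vectors of this shape, one per patient, are what a support vector machine implementation would be trained on. The classifier itself is left out here so the sketch stays free of third-party dependencies.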
