MLTI-05. IDENTIFYING BRAIN METASTATIC CASES FROM FREE TEXT CLINICAL NARRATIVES WITH REFINEMENT OF SEMANTIC HETEROGENEITY USING MACHINE LEARNING
INTRODUCTION: Brain metastatic disease (BM) is ripe for discovery using computational tools like machine learning (ML) due to disease complexity and multidimensional critical data (imaging, genomics, primary disease, drug exposures)(1). Leveraging real-world evidence (RWE) from routine health data...
Main authors: | Wells, Michael; Robin, Adam; Poisson, Laila; Noushmehr, Houtan; Snyder, James |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Abstracts |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7213474/ http://dx.doi.org/10.1093/noajnl/vdz014.064 |
_version_ | 1783531812286365696 |
---|---|
author | Wells, Michael; Robin, Adam; Poisson, Laila; Noushmehr, Houtan; Snyder, James |
author_sort | Wells, Michael |
collection | PubMed |
description | INTRODUCTION: Brain metastatic disease (BM) is ripe for discovery using computational tools such as machine learning (ML) because of its disease complexity and multidimensional critical data (imaging, genomics, primary disease, drug exposures)(1). Leveraging real-world evidence (RWE) from routine health data to inform clinical management is hindered by fragmented, unstructured data and semantic heterogeneity(2). Clinical data in electronic health records (EHR) and institutional registries are typically free-text narratives that lack common data elements (CDE). Curating existing data into CDE with ML may inform contemporary approaches (RWE, N-of-1 trials, and precision medicine) that depend on large, high-quality datasets. Harvesting existing institutional registries may expand demographic representation, confirm benchmarks of established treatments, and provide a test environment for prospective ML applications. METHOD: An R-based deep convolutional neural network (DNN), built with Keras and a Python TensorFlow API, was trained on physician narratives of 2000 BM cases and 8000 other CNS conditions, labeled by diagnosis and spanning 17 years(3,4). The ML model was tested on 405 non-labeled narratives to: A) identify BM from other CNS conditions (i.e., glioma, meningioma, non-tumor); B) evaluate word embedding using GloVe(5) to standardize abbreviations and misspellings by assigning terms to CDE, training the model to plot “mets”, “metastases” and “spine” with their 20 most similar contextual words. RESULTS: The DNN architecture achieved 97% accuracy in distinguishing BM (n=178) from other conditions (n=227). “Mets” and “metastasis” have a connected contextual network, suggesting shared meaning, whereas “spine” did not share a network. CONCLUSIONS: ML can identify BM cases in free-text registries, which can serve as a quality-control measure and aid data aggregation. Standardizing shorthand terminology to CDE with a DNN trained in word embedding can possibly address semantic heterogeneity and facilitate data automation. Solutions are needed to compile and automate quality BM data across institutions to achieve the volume and complexity required for contemporary analysis using ML. |
format | Online Article Text |
id | pubmed-7213474 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7213474 2020-07-07 MLTI-05. IDENTIFYING BRAIN METASTATIC CASES FROM FREE TEXT CLINICAL NARRATIVES WITH REFINEMENT OF SEMANTIC HETEROGENEITY USING MACHINE LEARNING Wells, Michael Robin, Adam Poisson, Laila Noushmehr, Houtan Snyder, James Neurooncol Adv Abstracts Oxford University Press 2019-08-12 /pmc/articles/PMC7213474/ http://dx.doi.org/10.1093/noajnl/vdz014.064 Text en © The Author(s) 2019. Published by Oxford University Press, the Society for Neuro-Oncology and the European Association of Neuro-Oncology. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | MLTI-05. IDENTIFYING BRAIN METASTATIC CASES FROM FREE TEXT CLINICAL NARRATIVES WITH REFINEMENT OF SEMANTIC HETEROGENEITY USING MACHINE LEARNING |
topic | Abstracts |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7213474/ http://dx.doi.org/10.1093/noajnl/vdz014.064 |
work_keys_str_mv | AT wellsmichael mlti05identifyingbrainmetastaticcasesfromfreetextclinicalnarrativeswithrefinementofsemanticheterogeneityusingmachinelearning AT robinadam mlti05identifyingbrainmetastaticcasesfromfreetextclinicalnarrativeswithrefinementofsemanticheterogeneityusingmachinelearning AT poissonlaila mlti05identifyingbrainmetastaticcasesfromfreetextclinicalnarrativeswithrefinementofsemanticheterogeneityusingmachinelearning AT noushmehrhoutan mlti05identifyingbrainmetastaticcasesfromfreetextclinicalnarrativeswithrefinementofsemanticheterogeneityusingmachinelearning AT snyderjames mlti05identifyingbrainmetastaticcasesfromfreetextclinicalnarrativeswithrefinementofsemanticheterogeneityusingmachinelearning |
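
The abstract describes plotting “mets”, “metastases” and “spine” with their most similar contextual words and checking whether their neighborhoods connect. The following is a minimal sketch of that idea, not the authors' implementation: it assumes cosine similarity as the similarity measure, and the vectors in `TOY_EMBEDDINGS` are invented for illustration only (real GloVe vectors are learned from a corpus). The function names `cosine_similarity`, `nearest_neighbors` and `shared_neighbors` are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# Real GloVe embeddings are learned from text and have many more dimensions.
TOY_EMBEDDINGS = {
    "mets": [0.90, 0.10, 0.05],
    "metastases": [0.85, 0.15, 0.10],
    "metastasis": [0.88, 0.12, 0.08],
    "lesion": [0.70, 0.30, 0.10],
    "spine": [0.10, 0.90, 0.20],
    "lumbar": [0.05, 0.95, 0.15],
    "headache": [0.20, 0.20, 0.90],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_u * norm_v)


def nearest_neighbors(term, embeddings, k=20):
    """Return up to k (word, similarity) pairs most similar to `term`."""
    query = embeddings[term]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def shared_neighbors(term_a, term_b, embeddings, k=20):
    """Words that appear in the top-k neighborhoods of both terms."""
    near_a = {word for word, _ in nearest_neighbors(term_a, embeddings, k)}
    near_b = {word for word, _ in nearest_neighbors(term_b, embeddings, k)}
    return near_a & near_b


if __name__ == "__main__":
    for term in ("mets", "metastases", "spine"):
        print(term, [word for word, _ in nearest_neighbors(term, TOY_EMBEDDINGS, k=2)])
    print(shared_neighbors("mets", "metastases", TOY_EMBEDDINGS, k=2))
    print(shared_neighbors("mets", "spine", TOY_EMBEDDINGS, k=2))
```

With these toy vectors and a neighborhood size of 2, “mets” and “metastases” share a neighbor while “mets” and “spine” do not, which mirrors the kind of connected versus disconnected contextual network reported in the RESULTS. The abstract uses the 20 most similar words; `k=20` is kept as the default for that reason.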