MLTI-05. IDENTIFYING BRAIN METASTATIC CASES FROM FREE TEXT CLINICAL NARRATIVES WITH REFINEMENT OF SEMANTIC HETEROGENEITY USING MACHINE LEARNING

Bibliographic Details
Main authors: Wells, Michael; Robin, Adam; Poisson, Laila; Noushmehr, Houtan; Snyder, James
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7213474/
http://dx.doi.org/10.1093/noajnl/vdz014.064
Description
Summary: INTRODUCTION: Brain metastatic disease (BM) is ripe for discovery using computational tools like machine learning (ML) due to disease complexity and multidimensional critical data (imaging, genomics, primary disease, drug exposures)(1). Leveraging real-world evidence (RWE) from routine health data to inform clinical management is hindered by fragmented unstructured data and semantic heterogeneity(2). Clinical data in EHR and institutional registries are typically free-text narratives lacking common data elements (CDE). Curating existing data into CDE with ML may inform contemporary approaches (RWE, N-of-1 trials, and precision medicine) that depend on large, high-quality datasets. Harvesting existing institutional registries may expand demographic representation, confirm benchmarks of established treatments, and provide a test environment for prospective ML applications.
METHOD: An R-based deep convolutional neural network (DNN) using keras and an API for TensorFlow (Python) was trained on physician narratives of 2000 BM cases and 8000 other CNS conditions, labeled by diagnosis and spanning 17 years(3,4). The ML model was tested with 405 unlabeled narratives to: A) Identify BM from other CNS conditions (i.e. glioma, meningioma, non-tumor). B) Evaluate word embedding using GloVe(5) to standardize abbreviations and misspellings by assigning terms to CDE; the model was trained to plot “mets”, “metastases” and “spine”, each with its 20 most similar contextual words.
RESULTS: The DNN architecture achieved 97% accuracy in distinguishing BM (n=178) from other conditions (n=227). “Mets” and “metastasis” have a connected contextual network, suggesting shared meaning, whereas “spine” did not share a network.
CONCLUSIONS: ML can identify BM cases in free-text registries, which can serve as a quality control measure and aid data aggregation. Standardizing shorthand terminology to CDE with a DNN trained on word embeddings may address semantic heterogeneity and facilitate data automation. Solutions are needed to compile and automate quality BM data across institutions to achieve the volume and complexity required for contemporary analysis using ML.
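Illustrative note: the abstract does not give implementation details for the word-embedding step, so the following is only a minimal sketch of how "most similar contextual words" are commonly retrieved from an embedding, here by ranking cosine similarity. The vectors in `TOY_EMBEDDINGS`, the extra word "lumbar", and the helpers `cosine_similarity` and `most_similar` are hypothetical, invented for illustration. They are not the authors' data, code, or GloVe output.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. Real GloVe
# vectors would be learned from the clinical narratives and have far
# more dimensions.
TOY_EMBEDDINGS = {
    "mets": [0.90, 0.10, 0.05],
    "metastases": [0.85, 0.15, 0.10],
    "metastasis": [0.88, 0.12, 0.08],
    "spine": [0.10, 0.90, 0.20],
    "lumbar": [0.05, 0.85, 0.30],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(term, embeddings, k=20):
    """Return up to k (word, similarity) pairs closest to `term`."""
    query = embeddings[term]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != term  # skip the query word itself
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for word, score in most_similar("mets", TOY_EMBEDDINGS, k=2):
        print(f"{word}\t{score:.3f}")
```

With these toy vectors, the two nearest neighbours of "mets" are "metastasis" and "metastases", while "spine" ranks far lower. This mirrors, on invented data, the kind of shared versus separate contextual networks the abstract reports.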