Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports

BACKGROUND: Silent brain infarction (SBI) is defined as the presence of 1 or more brain lesions, presumed to be caused by vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than strok...

Bibliographic Details
Main authors: Fu, Sunyang, Leung, Lester Y, Wang, Yanshan, Raulli, Anne-Olivia, Kallmes, David F, Kinsman, Kristin A, Nelson, Kristoff B, Clark, Michael S, Luetmer, Patrick H, Kingsbury, Paul R, Kent, David M, Liu, Hongfang
Format: Online Article Text
Language: English
Published: JMIR Publications 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6524454/
https://www.ncbi.nlm.nih.gov/pubmed/31066686
http://dx.doi.org/10.2196/12109
author Fu, Sunyang
Leung, Lester Y
Wang, Yanshan
Raulli, Anne-Olivia
Kallmes, David F
Kinsman, Kristin A
Nelson, Kristoff B
Clark, Michael S
Luetmer, Patrick H
Kingsbury, Paul R
Kent, David M
Liu, Hongfang
author_sort Fu, Sunyang
collection PubMed
description BACKGROUND: Silent brain infarction (SBI) is defined as the presence of 1 or more brain lesions, presumed to be caused by vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by offering preventative treatment plans. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases from electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings interpreted by radiologists from neuroimaging reports. OBJECTIVE: This study aimed to develop NLP systems to identify individuals with incidentally discovered SBIs from neuroimaging reports at 2 sites: Mayo Clinic and Tufts Medical Center. METHODS: Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using the open source NLP pipeline MedTagger, developed by Mayo Clinic. Features for rule-based systems, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models adopted were convolutional neural network (CNN), random forest, support vector machine, and logistic regression. The performance of the NLP algorithm was compared with a manually created gold standard. The gold standard dataset included 1000 radiology reports randomly retrieved from the 2 study sites (Mayo and Tufts) corresponding to patients with no prior or current diagnosis of stroke or dementia. Of the 1000 reports, 400 were randomly sampled and double read to determine interannotator agreement. The gold standard dataset was split equally into 3 subsets for training, development, and testing.
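The pointwise mutual information step named in the methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `pmi_scores`, the whitespace tokenization, the document-level counting, and the toy reports and labels are all assumptions made for illustration only.

```python
import math
from collections import Counter


def pmi_scores(reports, labels, positive=1):
    """Score each word by PMI between its presence in a report and the positive label.

    Hypothetical helper: PMI(w, pos) = log2( P(w, pos) / (P(w) * P(pos)) ),
    with probabilities estimated from report-level counts.
    """
    n = len(reports)
    n_pos = sum(1 for y in labels if y == positive)
    word_docs = Counter()      # number of reports containing the word
    word_pos_docs = Counter()  # number of positive reports containing the word
    for text, y in zip(reports, labels):
        for w in set(text.lower().split()):  # naive tokenization (assumption)
            word_docs[w] += 1
            if y == positive:
                word_pos_docs[w] += 1
    scores = {}
    for w, joint in word_pos_docs.items():
        p_joint = joint / n
        p_word = word_docs[w] / n
        p_pos = n_pos / n
        scores[w] = math.log2(p_joint / (p_word * p_pos))
    return scores


# Toy, made-up report snippets; 1 = SBI mentioned, 0 = not mentioned.
reports = [
    "chronic lacunar infarct in left thalamus",
    "no acute infarct or hemorrhage",
    "old lacunar infarct noted",
    "normal study",
]
labels = [1, 0, 1, 0]

scores = pmi_scores(reports, labels)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.3f}")
```

Words that occur only in positive reports (here "lacunar") score higher than words shared with negative reports (here "infarct"), which is the kind of signal that could inform hand-written rules. Words that never appear in a positive report are simply omitted in this sketch.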
RESULTS: Among the 400 reports selected to determine interannotator agreement, 5 reports were removed due to invalid scan types. The interannotator agreements across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system yielded the best performance in predicting SBI, with an accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. The CNN achieved the best score in predicting white matter disease (WMD), with an accuracy, sensitivity, specificity, PPV, and NPV of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively. CONCLUSIONS: We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggested a high feasibility of detecting SBIs and WMDs from EHRs using NLP.
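The five validation statistics reported above are all derived from a 2x2 confusion matrix. The sketch below shows the standard definitions in Python; the function name `diagnostic_metrics` is hypothetical, and the counts are made up for illustration and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives, true negatives, false negatives.
    Assumes every denominator is nonzero (no guard against empty classes).
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # recall on positive reports
        "specificity": tn / (tn + fp),  # recall on negative reports
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Illustrative counts only (not taken from the study).
m = diagnostic_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

A specificity and PPV of 1.000 alongside a lower sensitivity, as reported for the rule-based system, corresponds to a confusion matrix with no false positives and a small number of false negatives.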
format Online
Article
Text
id pubmed-6524454
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-6524454 2019-06-07 Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports JMIR Med Inform Original Paper JMIR Publications 2019-04-21 /pmc/articles/PMC6524454/ /pubmed/31066686 http://dx.doi.org/10.2196/12109 Text en ©Sunyang Fu, Lester Y Leung, Yanshan Wang, Anne-Olivia Raulli, David F Kallmes, Kristin A Kinsman, Kristoff B Nelson, Michael S Clark, Patrick H Luetmer, Paul R Kingsbury, David M Kent, Hongfang Liu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 21.04.2019.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6524454/
https://www.ncbi.nlm.nih.gov/pubmed/31066686
http://dx.doi.org/10.2196/12109