A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records

BACKGROUND: Manual coding of phenotypes in brain radiology reports is time consuming. We developed a natural language processing (NLP) algorithm to enable automatic identification of brain imaging in radiology reports performed in routine clinical practice in the UK National Health Service (NHS). ME...

Bibliographic Details
Main authors: Wheater, Emily; Mair, Grant; Sudlow, Cathie; Alex, Beatrice; Grover, Claire; Whiteley, William
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6734359/
https://www.ncbi.nlm.nih.gov/pubmed/31500613
http://dx.doi.org/10.1186/s12911-019-0908-7
_version_ 1783450139515420672
author Wheater, Emily
Mair, Grant
Sudlow, Cathie
Alex, Beatrice
Grover, Claire
Whiteley, William
author_sort Wheater, Emily
collection PubMed
description BACKGROUND: Manual coding of phenotypes in brain radiology reports is time-consuming. We developed a natural language processing (NLP) algorithm to enable automatic identification of brain imaging in radiology reports performed in routine clinical practice in the UK National Health Service (NHS). METHODS: We used anonymized text brain imaging reports from a cohort study of stroke/TIA patients and from a regional hospital to develop and test an NLP algorithm. Two experts marked up text in 1692 reports for 24 cerebrovascular and other neurological phenotypes. We developed and tested a rule-based NLP algorithm first within the cohort study, and further evaluated it in the reports from the regional hospital. RESULTS: The agreement between expert readers was excellent (Cohen’s κ = 0.93) in both datasets. In the final test dataset (n = 700) in unseen regional hospital reports, the algorithm had very good performance for a report of any ischaemic stroke [sensitivity 89% (95% CI: 81–94); positive predictive value (PPV) 85% (76–90); specificity 100% (95% CI: 0.99–1.00)]; any haemorrhagic stroke [sensitivity 96% (95% CI: 80–99); PPV 72% (95% CI: 55–84); specificity 100% (95% CI: 0.99–1.00)]; brain tumours [sensitivity 96% (CI: 87–99); PPV 84% (73–91); specificity 100% (95% CI: 0.99–1.00)]; and cerebral small vessel disease and cerebral atrophy (sensitivity, PPV and specificity all > 97%). We obtained few reports of subarachnoid haemorrhage, microbleeds or subdural haematomas. In 110,695 reports from NHS Tayside, atrophy (n = 28,757, 26%), small vessel disease (15,015, 14%) and old, deep ischaemic strokes (10,636, 10%) were the commonest findings. CONCLUSIONS: An NLP algorithm can be developed in UK NHS radiology records to allow identification of cohorts of patients with important brain imaging phenotypes at a scale that would otherwise not be possible.
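The performance figures in the abstract (sensitivity, PPV, specificity, each with a 95% confidence interval) and the finding percentages for the NHS Tayside reports can be illustrated with a short sketch. This is not the authors' code. The function names are hypothetical, the confusion-matrix counts in the usage lines are invented purely for illustration, and the Wilson score interval is an assumption, since the abstract does not say which interval method was used. Only the Tayside counts (110,695 reports; 28,757, 15,015 and 10,636 findings) come from the record itself.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%).

    Assumption: the record does not state which CI method the study used.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, PPV and specificity from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among expert-positive reports
        "ppv": tp / (tp + fp),          # true positives among algorithm-positive reports
        "specificity": tn / (tn + fp),  # true negatives among expert-negative reports
    }


# Hypothetical counts, NOT taken from the study.
metrics = diagnostic_metrics(tp=90, fp=10, fn=10, tn=890)
sens_low, sens_high = wilson_interval(90, 90 + 10)

# Counts quoted in the abstract for the NHS Tayside reports.
TOTAL_REPORTS = 110_695
findings = {
    "atrophy": 28_757,
    "small vessel disease": 15_015,
    "old deep ischaemic stroke": 10_636,
}
percentages = {name: round(100 * n / TOTAL_REPORTS) for name, n in findings.items()}
print(percentages)
```

Rounded to whole percentages, the Tayside counts reproduce the 26%, 14% and 10% quoted in the abstract.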
format Online
Article
Text
id pubmed-6734359
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6734359 2019-09-12 A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records Wheater, Emily Mair, Grant Sudlow, Cathie Alex, Beatrice Grover, Claire Whiteley, William BMC Med Inform Decis Mak Research Article BioMed Central 2019-09-09 /pmc/articles/PMC6734359/ /pubmed/31500613 http://dx.doi.org/10.1186/s12911-019-0908-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records
title_sort validated natural language processing algorithm for brain imaging phenotypes from radiology reports in uk electronic health records
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6734359/
https://www.ncbi.nlm.nih.gov/pubmed/31500613
http://dx.doi.org/10.1186/s12911-019-0908-7