Developing automated methods for disease subtyping in UK Biobank: an exemplar study on stroke

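BACKGROUND: Better phenotyping of routinely collected coded data would be useful for research and health improvement. For example, the precision of coded data for hemorrhagic stroke (intracerebral hemorrhage [ICH] and subarachnoid hemorrhage [SAH]) may be as poor as < 50%. This work aimed to investigate the feasibility and added value of automated methods applied to clinical radiology reports to improve stroke subtyping.

METHODS: From a sub-population of 17,249 Scottish UK Biobank participants, we ascertained those with an incident stroke code in hospital, death record or primary care administrative data by September 2015, and ≥ 1 clinical brain scan report. We used a combination of natural language processing and clinical knowledge inference on brain scan reports to assign a stroke subtype (ischemic vs ICH vs SAH) for each participant and assessed performance by precision and recall at entity and patient levels.

RESULTS: Of 225 participants with an incident stroke code, 207 had a relevant brain scan report and were included in this study. Entity-level precision and recall ranged from 78 to 100%. At patient level, the automated methods showed precision and recall that were very good for ICH (both 89%), good for SAH (both 82%), but, as expected, lower for ischemic stroke (73% and 64%, respectively), suggesting coded data remains the preferred method for identifying the latter stroke subtype.

CONCLUSIONS: Our automated method applied to radiology reports provides a feasible, scalable and accurate solution to improve disease subtyping when used in conjunction with administrative coded health data. Future research should validate these findings in a different population setting.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-021-01556-0.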
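
The abstract reports patient-level precision and recall separately for each stroke subtype. The sketch below is a hypothetical illustration of how such one-vs-rest figures can be computed, not the authors' code. It assumes each participant has exactly one reference subtype and at most one predicted subtype (None when no subtype is assigned); the function name `per_subtype_precision_recall`, the label strings and the toy data are invented for illustration.

```python
from collections import Counter

# Hypothetical label set; the study distinguishes ischemic stroke, ICH and SAH.
SUBTYPES = ("ischemic", "ICH", "SAH")


def per_subtype_precision_recall(gold, predicted):
    """Return {subtype: (precision, recall)} from paired patient-level labels.

    Illustrative sketch only: `gold` holds one reference subtype per patient,
    `predicted` holds one predicted subtype or None (no subtype assigned).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if p == g:
            tp[g] += 1  # correct subtype assigned
        else:
            if p is not None:
                fp[p] += 1  # a wrong subtype was predicted
            fn[g] += 1  # the true subtype was missed
    results = {}
    for s in SUBTYPES:
        predicted_s = tp[s] + fp[s]  # all patients predicted as subtype s
        actual_s = tp[s] + fn[s]  # all patients whose true subtype is s
        precision = tp[s] / predicted_s if predicted_s else 0.0
        recall = tp[s] / actual_s if actual_s else 0.0
        results[s] = (precision, recall)
    return results


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from the study.
    gold = ["ICH", "ICH", "SAH", "ischemic", "ischemic", "ischemic"]
    pred = ["ICH", "ischemic", "SAH", "ischemic", None, "ICH"]
    for subtype, (p, r) in per_subtype_precision_recall(gold, pred).items():
        print(f"{subtype}: precision={p:.2f} recall={r:.2f}")
```

In this sketch, precision for a subtype divides its correct predictions by all predictions of that subtype, while recall divides them by all participants who truly have it, so a participant left unassigned lowers recall but not precision.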

Bibliographic Details
Main authors: Rannikmäe, Kristiina; Wu, Honghan; Tominey, Steven; Whiteley, William; Allen, Naomi; Sudlow, Cathie
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8204419/
https://www.ncbi.nlm.nih.gov/pubmed/34130677
http://dx.doi.org/10.1186/s12911-021-01556-0
Journal: BMC Med Inform Decis Mak (Research), published online 2021-06-15
License: © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.