Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs
A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised el...
Main authors: | Noble, Peter-John Mäntylä; Appleton, Charlotte; Radford, Alan David; Nenadic, Goran |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659617/ https://www.ncbi.nlm.nih.gov/pubmed/34882714 http://dx.doi.org/10.1371/journal.pone.0260402 |
_version_ | 1784613005837729792 |
---|---|
author | Noble, Peter-John Mäntylä Appleton, Charlotte Radford, Alan David Nenadic, Goran |
author_facet | Noble, Peter-John Mäntylä Appleton, Charlotte Radford, Alan David Nenadic, Goran |
author_sort | Noble, Peter-John Mäntylä |
collection | PubMed |
description | A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the ‘gastroenteric’ MPC identifying cases involved in the disease outbreak. Such clinician-derived annotation systems can suffer from poor compliance requiring retrospective, often manual, coding, thereby limiting real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate the use of an unsupervised method of EHR annotation using latent Dirichlet allocation topic-modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics which were used to annotate EHRs spanning the natural disease outbreak and investigate whether any given topic might mirror the outbreak time-course. Narratives were annotated using the Gensim Library LdaModel module for the topic best representing the text within them. Counts for narratives labelled with one of the topics significantly matched the disease outbreak based on the practitioner-derived ‘gastroenteric’ MPC (Spearman correlation 0.978); no other topics showed a similar time course. Using artificially injected outbreaks, it was possible to see other topics that would match other MPCs including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and using a freely available package (LDAVis) providing rapid insight into the clinical basis of each topic. 
This work clearly shows that unsupervised record annotation using topic modelling linked to simple text visualisations can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously un-characterised diseases based on changes in clinical narratives. |
format | Online Article Text |
id | pubmed-8659617 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-86596172021-12-10 Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs Noble, Peter-John Mäntylä Appleton, Charlotte Radford, Alan David Nenadic, Goran PLoS One Research Article A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the ‘gastroenteric’ MPC identifying cases involved in the disease outbreak. Such clinician-derived annotation systems can suffer from poor compliance requiring retrospective, often manual, coding, thereby limiting real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate the use of an unsupervised method of EHR annotation using latent Dirichlet allocation topic-modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics which were used to annotate EHRs spanning the natural disease outbreak and investigate whether any given topic might mirror the outbreak time-course. Narratives were annotated using the Gensim Library LdaModel module for the topic best representing the text within them. Counts for narratives labelled with one of the topics significantly matched the disease outbreak based on the practitioner-derived ‘gastroenteric’ MPC (Spearman correlation 0.978); no other topics showed a similar time course. 
Using artificially injected outbreaks, it was possible to see other topics that would match other MPCs including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and using a freely available package (LDAVis) providing rapid insight into the clinical basis of each topic. This work clearly shows that unsupervised record annotation using topic modelling linked to simple text visualisations can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously un-characterised diseases based on changes in clinical narratives. Public Library of Science 2021-12-09 /pmc/articles/PMC8659617/ /pubmed/34882714 http://dx.doi.org/10.1371/journal.pone.0260402 Text en © 2021 Noble et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Noble, Peter-John Mäntylä Appleton, Charlotte Radford, Alan David Nenadic, Goran Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs |
title | Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs |
title_full | Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs |
title_fullStr | Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs |
title_full_unstemmed | Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs |
title_short | Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs |
title_sort | using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in uk dogs |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659617/ https://www.ncbi.nlm.nih.gov/pubmed/34882714 http://dx.doi.org/10.1371/journal.pone.0260402 |
work_keys_str_mv | AT noblepeterjohnmantyla usingtopicmodellingforunsupervisedannotationofelectronichealthrecordstoidentifyanoutbreakofdiseaseinukdogs AT appletoncharlotte usingtopicmodellingforunsupervisedannotationofelectronichealthrecordstoidentifyanoutbreakofdiseaseinukdogs AT radfordalandavid usingtopicmodellingforunsupervisedannotationofelectronichealthrecordstoidentifyanoutbreakofdiseaseinukdogs AT nenadicgoran usingtopicmodellingforunsupervisedannotationofelectronichealthrecordstoidentifyanoutbreakofdiseaseinukdogs |
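
The annotation-and-comparison step summarised in the description field (label each narrative with its best-fitting topic, count labelled narratives per week, then compare that time course with the practitioner-derived MPC counts using a Spearman correlation) can be illustrated with a minimal sketch. This is not the authors' code. It assumes each narrative's topic distribution is already available as a list of `(topic_id, probability)` pairs, which is the shape Gensim's `LdaModel.get_document_topics` returns; the function names, the toy records, and the weekly MPC counts are all hypothetical, and the rank correlation is written in plain Python so the example runs without extra packages.

```python
from collections import Counter
from math import sqrt


def dominant_topic(topic_probs):
    """Return the topic id with the highest probability for one narrative."""
    return max(topic_probs, key=lambda pair: pair[1])[0]


def weekly_topic_counts(records, topic_id, n_weeks):
    """Count, per week, the narratives whose dominant topic is topic_id."""
    counts = Counter(
        week for week, probs in records if dominant_topic(probs) == topic_id
    )
    return [counts.get(week, 0) for week in range(n_weeks)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: (week index, topic distribution) per narrative.
records = [
    (0, [(3, 0.7), (5, 0.3)]),
    (1, [(3, 0.6), (5, 0.4)]),
    (1, [(3, 0.9), (5, 0.1)]),
    (1, [(5, 0.8), (3, 0.2)]),
    (2, [(3, 0.55), (5, 0.45)]),
    (2, [(3, 0.8), (5, 0.2)]),
    (2, [(3, 0.7), (5, 0.3)]),
    (3, [(5, 0.9), (3, 0.1)]),
]
topic_counts = weekly_topic_counts(records, topic_id=3, n_weeks=4)
mpc_counts = [2, 5, 9, 1]  # made-up weekly 'gastroenteric' MPC counts
rho = spearman(topic_counts, mpc_counts)
print(topic_counts, round(rho, 3))
```

In practice the same comparison would be repeated for every topic in the model, so that a topic tracking the outbreak stands out against the others; a library routine such as SciPy's `spearmanr` could replace the hand-written rank correlation.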