
Hospital-wide natural language processing summarising the health data of 1 million patients

Electronic health records (EHRs) represent a major repository of real-world clinical trajectories, interventions and outcomes. While modern enterprise EHRs try to capture data in structured standardised formats, a significant bulk of the available information captured in the EHR is still recorded o...


Bibliographic Details
Main Authors: Bean, Daniel M., Kraljevic, Zeljko, Shek, Anthony, Teo, James, Dobson, Richard J. B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168555/
https://www.ncbi.nlm.nih.gov/pubmed/37159441
http://dx.doi.org/10.1371/journal.pdig.0000218
author Bean, Daniel M.
Kraljevic, Zeljko
Shek, Anthony
Teo, James
Dobson, Richard J. B.
collection PubMed
description Electronic health records (EHRs) represent a major repository of real-world clinical trajectories, interventions and outcomes. While modern enterprise EHRs try to capture data in structured standardised formats, a significant bulk of the available information captured in the EHR is still recorded only in unstructured text format and can only be transformed into structured codes by manual processes. Recently, Natural Language Processing (NLP) algorithms have reached a level of performance suitable for large-scale, accurate information extraction from clinical text. Here we describe the application of open-source named-entity-recognition and linkage (NER+L) methods (CogStack, MedCAT) to the entire text content of a large UK hospital trust (King’s College Hospital, London). The resulting dataset contains 157M SNOMED concepts generated from 9.5M documents for 1.07M patients over a period of 9 years. We present a summary of prevalence and disease onset as well as a patient embedding that captures major comorbidity patterns at scale. NLP has the potential to transform the health data lifecycle through large-scale automation of a traditionally manual task.
format Online
Article
Text
id pubmed-10168555
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10168555 2023-05-10 Hospital-wide natural language processing summarising the health data of 1 million patients Bean, Daniel M. Kraljevic, Zeljko Shek, Anthony Teo, James Dobson, Richard J. B. PLOS Digit Health Research Article Electronic health records (EHRs) represent a major repository of real-world clinical trajectories, interventions and outcomes. While modern enterprise EHRs try to capture data in structured standardised formats, a significant bulk of the available information captured in the EHR is still recorded only in unstructured text format and can only be transformed into structured codes by manual processes. Recently, Natural Language Processing (NLP) algorithms have reached a level of performance suitable for large-scale, accurate information extraction from clinical text. Here we describe the application of open-source named-entity-recognition and linkage (NER+L) methods (CogStack, MedCAT) to the entire text content of a large UK hospital trust (King’s College Hospital, London). The resulting dataset contains 157M SNOMED concepts generated from 9.5M documents for 1.07M patients over a period of 9 years. We present a summary of prevalence and disease onset as well as a patient embedding that captures major comorbidity patterns at scale. NLP has the potential to transform the health data lifecycle through large-scale automation of a traditionally manual task. Public Library of Science 2023-05-09 /pmc/articles/PMC10168555/ /pubmed/37159441 http://dx.doi.org/10.1371/journal.pdig.0000218 Text en © 2023 Bean et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Hospital-wide natural language processing summarising the health data of 1 million patients
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168555/
https://www.ncbi.nlm.nih.gov/pubmed/37159441
http://dx.doi.org/10.1371/journal.pdig.0000218