
The languages of health in general practice electronic patient records: a Zipf’s law analysis

BACKGROUND: Natural human languages show a power law behaviour in which word frequency (in any large enough corpus) is inversely proportional to word rank - Zipf’s law. We have therefore asked whether similar power law behaviours could be seen in data from electronic patient records. RESULTS: In ord...


Bibliographic Details
Main authors: Kalankesh, Leila R, New, John P, Baker, Patricia G, Brass, Andy
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3944945/
https://www.ncbi.nlm.nih.gov/pubmed/24410884
http://dx.doi.org/10.1186/2041-1480-5-2
_version_ 1782306461306585088
author Kalankesh, Leila R
New, John P
Baker, Patricia G
Brass, Andy
author_facet Kalankesh, Leila R
New, John P
Baker, Patricia G
Brass, Andy
author_sort Kalankesh, Leila R
collection PubMed
description BACKGROUND: Natural human languages show a power law behaviour in which word frequency (in any large enough corpus) is inversely proportional to word rank - Zipf’s law. We have therefore asked whether similar power law behaviours could be seen in data from electronic patient records. RESULTS: In order to examine this question, anonymised data were obtained from all general practices in Salford covering a seven-year period and captured in the form of Read codes. It was found that data for patient diagnoses and procedures followed Zipf’s law. However, the medication data behaved very differently, looking much more like a referential index. We also observed differences in the statistical behaviour of the language used to describe patient diagnosis as a function of an anonymised GP practice identifier. CONCLUSIONS: This work demonstrates that data from electronic patient records do follow Zipf’s law. We also found significant differences in Zipf’s law behaviour in data from different GP practices. This suggests that computational linguistic techniques could become a useful additional tool to help understand and monitor the data quality of health records.
format Online
Article
Text
id pubmed-3944945
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3944945 2014-03-17 The languages of health in general practice electronic patient records: a Zipf’s law analysis Kalankesh, Leila R New, John P Baker, Patricia G Brass, Andy J Biomed Semantics Research BACKGROUND: Natural human languages show a power law behaviour in which word frequency (in any large enough corpus) is inversely proportional to word rank - Zipf’s law. We have therefore asked whether similar power law behaviours could be seen in data from electronic patient records. RESULTS: In order to examine this question, anonymised data were obtained from all general practices in Salford covering a seven-year period and captured in the form of Read codes. It was found that data for patient diagnoses and procedures followed Zipf’s law. However, the medication data behaved very differently, looking much more like a referential index. We also observed differences in the statistical behaviour of the language used to describe patient diagnosis as a function of an anonymised GP practice identifier. CONCLUSIONS: This work demonstrates that data from electronic patient records do follow Zipf’s law. We also found significant differences in Zipf’s law behaviour in data from different GP practices. This suggests that computational linguistic techniques could become a useful additional tool to help understand and monitor the data quality of health records. BioMed Central 2014-01-10 /pmc/articles/PMC3944945/ /pubmed/24410884 http://dx.doi.org/10.1186/2041-1480-5-2 Text en Copyright © 2014 Kalankesh et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Kalankesh, Leila R
New, John P
Baker, Patricia G
Brass, Andy
The languages of health in general practice electronic patient records: a Zipf’s law analysis
title The languages of health in general practice electronic patient records: a Zipf’s law analysis
title_full The languages of health in general practice electronic patient records: a Zipf’s law analysis
title_fullStr The languages of health in general practice electronic patient records: a Zipf’s law analysis
title_full_unstemmed The languages of health in general practice electronic patient records: a Zipf’s law analysis
title_short The languages of health in general practice electronic patient records: a Zipf’s law analysis
title_sort languages of health in general practice electronic patient records: a zipf’s law analysis
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3944945/
https://www.ncbi.nlm.nih.gov/pubmed/24410884
http://dx.doi.org/10.1186/2041-1480-5-2
work_keys_str_mv AT kalankeshleilar thelanguagesofhealthingeneralpracticeelectronicpatientrecordsazipfslawanalysis
AT newjohnp thelanguagesofhealthingeneralpracticeelectronicpatientrecordsazipfslawanalysis
AT bakerpatriciag thelanguagesofhealthingeneralpracticeelectronicpatientrecordsazipfslawanalysis
AT brassandy thelanguagesofhealthingeneralpracticeelectronicpatientrecordsazipfslawanalysis
AT kalankeshleilar languagesofhealthingeneralpracticeelectronicpatientrecordsazipfslawanalysis
AT newjohnp languagesofhealthingeneralpracticeelectronicpatientrecordsazipfslawanalysis
AT bakerpatriciag languagesofhealthingeneralpracticeelectronicpatientrecordsazipfslawanalysis
AT brassandy languagesofhealthingeneralpracticeelectronicpatientrecordsazipfslawanalysis