Coding and classifying GP data: the POLAR project

BACKGROUND: Data, particularly ‘big’ data, are increasingly being used for research in health. Using data from electronic medical records optimally requires coded data, but not all systems produce coded data. OBJECTIVE: To design a suitable, accurate method for converting large volumes of narrative d...

Bibliographic Details
Main Authors: Pearce, Christopher, McLeod, Adam, Patrick, Jon, Ferrigi, Jason, Bainbridge, Michael Michael, Rinehart, Natalie, Fragkoudi, Anna
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7252962/
https://www.ncbi.nlm.nih.gov/pubmed/31712272
http://dx.doi.org/10.1136/bmjhci-2019-100009
author Pearce, Christopher
McLeod, Adam
Patrick, Jon
Ferrigi, Jason
Bainbridge, Michael Michael
Rinehart, Natalie
Fragkoudi, Anna
collection PubMed
description BACKGROUND: Data, particularly ‘big’ data, are increasingly being used for research in health. Using data from electronic medical records optimally requires coded data, but not all systems produce coded data. OBJECTIVE: To design a suitable, accurate method for converting large volumes of narrative diagnoses from Australian general practice records into SNOMED CT-AU codes. Such codification will make them clinically useful for aggregation for population health and research purposes. METHOD: The method used natural language processing to code the texts automatically, followed by manual correction of the codes and natural language processing re-computation. These steps were repeated for four iterations until 95% of the records were coded. The coded data were then aggregated into classes considered useful for population health analytics. RESULTS: Coding the data effectively covered 95% of the corpus. Problems with the use of SNOMED CT-AU were identified, and protocols for consistent coding were created. These protocols can be used to guide further development of SNOMED CT-AU (SCT). The coded values will be immensely useful for the development of population health analytics for Australia, and the lessons learnt are applicable elsewhere.
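The iterative loop described in the METHOD (automatic coding, manual correction, re-computation, repeated until a coverage target is reached) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the exact-match lexicon stands in for the natural language processing step, the `reviewer` callable stands in for the manual correction step, and the codes `C001` to `C003`, the sample texts, and all function names are hypothetical placeholders rather than real SNOMED CT-AU content.

```python
from collections import Counter


def normalise(text):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def auto_code(texts, lexicon):
    """Map each text to a code, or None, by exact lookup of its normalised form.

    Assumption: a plain dictionary lookup replaces the real NLP coder.
    """
    return {t: lexicon.get(normalise(t)) for t in texts}


def coverage(records, coded):
    """Fraction of records (not distinct texts) that received a code."""
    return sum(1 for r in records if coded[r] is not None) / len(records)


def iterate_coding(records, lexicon, reviewer, target=0.95, max_rounds=4, batch=2):
    """Alternate automatic coding and manual correction until the target is met.

    Returns the grown lexicon and the coverage measured at the start of each round.
    """
    lexicon = dict(lexicon)  # work on a copy; the caller's lexicon is untouched
    history = []
    for _ in range(max_rounds):
        coded = auto_code(set(records), lexicon)
        cov = coverage(records, coded)
        history.append(cov)
        if cov >= target:
            break
        # Send the most frequent uncoded texts to the (hypothetical) reviewer first.
        uncoded = Counter(r for r in records if coded[r] is None)
        for text, _count in uncoded.most_common(batch):
            code = reviewer(text)
            if code is not None:
                lexicon[normalise(text)] = code
    return lexicon, history


# Hypothetical sample data: 20 narrative diagnoses and placeholder codes.
records = ["Hypertension"] * 10 + ["HT"] * 5 + ["asthma"] * 3 + ["T2DM", "xyz??"]
seed_lexicon = {"hypertension": "C001"}
manual_codes = {"HT": "C001", "asthma": "C002", "T2DM": "C003"}

final_lexicon, history = iterate_coding(records, seed_lexicon, manual_codes.get)
print(history)
```

Reviewing the most frequent uncoded texts first is a design choice of this sketch: it raises record-level coverage fastest per manual correction, which is one plausible way to reach a target such as 95% within a small number of iterations.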
format Online
Article
Text
id pubmed-7252962
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-7252962 2020-09-30 BMJ Health Care Inform Original Research BMJ Publishing Group 2019-11-10 /pmc/articles/PMC7252962/ /pubmed/31712272 http://dx.doi.org/10.1136/bmjhci-2019-100009 Text en © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
title Coding and classifying GP data: the POLAR project
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7252962/
https://www.ncbi.nlm.nih.gov/pubmed/31712272
http://dx.doi.org/10.1136/bmjhci-2019-100009