
Data mining to retrieve smoking status from electronic health records in general practice


Bibliographic details
Main authors: de Boer, Annemarijn R, de Groot, Mark C H, Groenhof, T Katrien J, van Doorn, Sander, Vaartjes, Ilonca, Bots, Michiel L, Haitjema, Saskia
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9707867/
https://www.ncbi.nlm.nih.gov/pubmed/36712169
http://dx.doi.org/10.1093/ehjdh/ztac031
collection PubMed
description AIMS: To optimize an existing data mining algorithm for retrieving smoking status, originally developed for hospital electronic health records (EHRs), for use in general practice EHRs, and to assess its performance there. METHODS AND RESULTS: We optimized the algorithm in a training set containing all clinical notes from 498 individuals (75 712 contact moments) from the Julius General Practitioners’ Network (JGPN). Each moment was classified as ‘current smoker’, ‘former smoker’, ‘never smoker’, or ‘no information’. Manual review of the EHRs served as the reference standard. Algorithm performance was assessed in an independent test set (n = 494, 78 129 moments) using precision, recall, and F1-score. In the test set, performance for ‘current smoker’ was precision 79.7%, recall 78.3%, and F1-score 0.79. For ‘former smoker’, it was precision 73.8%, recall 64.0%, and F1-score 0.69. For ‘never smoker’, it was precision 92.0%, recall 74.9%, and F1-score 0.83. At the patient level, performance for ever smoker (current and former smoker combined) was precision 87.9%, recall 94.7%, and F1-score 0.91. For never smoker, it was precision 98.0%, recall 82.0%, and F1-score 0.89. We found a more narrative writing style in general practice EHRs than in hospital EHRs. CONCLUSION: Data mining can successfully retrieve smoking status information from general practice clinical notes, with good performance for classifying ever and never smokers. Differences between general practice and hospital EHRs call for optimization of data mining algorithms when they are applied beyond their primary development setting.
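The precision, recall and F1-scores in the description follow the standard one-vs-rest definitions, with F1 as the harmonic mean of precision and recall. The Python sketch below is illustrative only: `f1_from`, `per_class_scores` and `patient_status` are hypothetical helper names, and the patient-level roll-up rule (any current or former smoker mention makes a patient an ever smoker) is an assumption, since the description does not state how moment-level labels were aggregated per patient. The final loop checks that each reported F1-score equals the harmonic mean of the reported precision and recall, rounded to two decimals.

```python
# Illustrative sketch only: helper names and the patient-level roll-up rule
# are hypothetical and not taken from the study.


def f1_from(precision: float, recall: float) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_scores(reference, predicted, label):
    """One-vs-rest precision, recall and F1 for a single class label.

    `reference` holds the manual-review labels and `predicted` the
    algorithm's labels, aligned item by item (e.g. one per contact moment).
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for ref, pred in pairs if ref == label and pred == label)
    fp = sum(1 for ref, pred in pairs if ref != label and pred == label)
    fn = sum(1 for ref, pred in pairs if ref == label and pred != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def patient_status(moment_labels):
    """HYPOTHETICAL roll-up of moment-level labels to one patient label.

    Assumption: any 'current smoker' or 'former smoker' moment makes the
    patient an 'ever smoker'; otherwise any 'never smoker' moment makes
    them a 'never smoker'; otherwise there is 'no information'.
    """
    labels = set(moment_labels)
    if labels & {"current smoker", "former smoker"}:
        return "ever smoker"
    if "never smoker" in labels:
        return "never smoker"
    return "no information"


# Test-set results as reported in the description: (precision, recall, F1).
REPORTED = {
    "current smoker (moment level)": (0.797, 0.783, 0.79),
    "former smoker (moment level)": (0.738, 0.640, 0.69),
    "never smoker (moment level)": (0.920, 0.749, 0.83),
    "ever smoker (patient level)": (0.879, 0.947, 0.91),
    "never smoker (patient level)": (0.980, 0.820, 0.89),
}

if __name__ == "__main__":
    # Each reported F1 should equal the harmonic mean of the reported
    # precision and recall, rounded to two decimals.
    for name, (p, r, f1) in REPORTED.items():
        recomputed = f1_from(p, r)
        assert round(recomputed, 2) == f1, name
        print(f"{name}: reported {f1:.2f}, recomputed {recomputed:.3f}")
```

Applying `per_class_scores` to the output of `patient_status` would yield patient-level ever/never smoker metrics analogous to those reported, but only under the assumed roll-up rule.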
id pubmed-9707867
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Eur Heart J Digit Health, Original Article, Oxford University Press, 2022-05-20
rights © The Author(s) 2022. Published by Oxford University Press on behalf of the European Society of Cardiology. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Original Article