
Data mining to retrieve smoking status from electronic health records in general practice


Bibliographic Details
Main Authors:	de Boer, Annemarijn R; de Groot, Mark C H; Groenhof, T Katrien J; van Doorn, Sander; Vaartjes, Ilonca; Bots, Michiel L; Haitjema, Saskia
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9707867/ https://www.ncbi.nlm.nih.gov/pubmed/36712169 http://dx.doi.org/10.1093/ehjdh/ztac031

author	de Boer, Annemarijn R; de Groot, Mark C H; Groenhof, T Katrien J; van Doorn, Sander; Vaartjes, Ilonca; Bots, Michiel L; Haitjema, Saskia
collection	PubMed
description	AIMS: Optimize and assess the performance of an existing data mining algorithm for smoking status from hospital electronic health records (EHRs) in general practice EHRs. METHODS AND RESULTS: We optimized an existing algorithm in a training set containing all clinical notes from 498 individuals (75 712 contact moments) from the Julius General Practitioners’ Network (JGPN). Each moment was classified as either ‘current smoker’, ‘former smoker’, ‘never smoker’, or ‘no information’. As a reference, we manually reviewed EHRs. Algorithm performance was assessed in an independent test set (n = 494, 78 129 moments) using precision, recall, and F1-score. Test set algorithm performance for ‘current smoker’ was precision 79.7%, recall 78.3%, and F1-score 0.79. For former smoker, it was precision 73.8%, recall 64.0%, and F1-score 0.69. For never smoker, it was precision 92.0%, recall 74.9%, and F1-score 0.83. On a patient level, performance for ever smoker (current and former smoker combined) was precision 87.9%, recall 94.7%, and F1-score 0.91. For never smoker, it was 98.0%, 82.0%, and 0.89, respectively. We found a more narrative writing style in general practice than in hospital EHRs. CONCLUSION: Data mining can successfully retrieve smoking status information from general practice clinical notes with a good performance for classifying ever and never smokers. Differences between general practice and hospital EHRs call for optimization of data mining algorithms when applied beyond a primary development setting.
format	Online Article Text
id	pubmed-9707867
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
journal	Eur Heart J Digit Health
date	2022-05-20
rights	© The Author(s) 2022. Published by Oxford University Press on behalf of the European Society of Cardiology. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Data mining to retrieve smoking status from electronic health records in general practice
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9707867/ https://www.ncbi.nlm.nih.gov/pubmed/36712169 http://dx.doi.org/10.1093/ehjdh/ztac031
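
The abstract reports precision, recall, and F1-score for each smoking category. As a rough consistency check, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only and is not the authors' code; the names `f1_score`, `REPORTED`, and `recompute` are hypothetical, and the numbers are copied from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1-score) as stated in the abstract.
REPORTED = {
    "current smoker (contact moment)": (79.7, 78.3, 0.79),
    "former smoker (contact moment)": (73.8, 64.0, 0.69),
    "never smoker (contact moment)": (92.0, 74.9, 0.83),
    "ever smoker (patient)": (87.9, 94.7, 0.91),
    "never smoker (patient)": (98.0, 82.0, 0.89),
}


def recompute(reported: dict) -> dict:
    """Recompute F1 from the reported percentages, rounded to two decimals."""
    return {
        label: round(f1_score(p / 100, r / 100), 2)
        for label, (p, r, _) in reported.items()
    }


if __name__ == "__main__":
    for label, f1 in recompute(REPORTED).items():
        print(f"{label}: recomputed F1 = {f1:.2f}, reported = {REPORTED[label][2]:.2f}")
```

Under these assumptions, the recomputed values agree with the reported F1-scores to two decimals, which supports reading the patient-level never-smoker figures as precision 98.0%, recall 82.0%, and F1-score 0.89.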
