Data mining to retrieve smoking status from electronic health records in general practice
AIMS: Optimize and assess the performance of an existing data mining algorithm for smoking status from hospital electronic health records (EHRs) in general practice EHRs. METHODS AND RESULTS: We optimized an existing algorithm in a training set containing all clinical notes from 498 individuals (75 ...
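Main authors: | de Boer, Annemarijn R; de Groot, Mark C H; Groenhof, T Katrien J; van Doorn, Sander; Vaartjes, Ilonca; Bots, Michiel L; Haitjema, Saskia |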
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9707867/ https://www.ncbi.nlm.nih.gov/pubmed/36712169 http://dx.doi.org/10.1093/ehjdh/ztac031 |
_version_ | 1784840794878771200 |
---|---|
author | de Boer, Annemarijn R; de Groot, Mark C H; Groenhof, T Katrien J; van Doorn, Sander; Vaartjes, Ilonca; Bots, Michiel L; Haitjema, Saskia |
author_facet | de Boer, Annemarijn R; de Groot, Mark C H; Groenhof, T Katrien J; van Doorn, Sander; Vaartjes, Ilonca; Bots, Michiel L; Haitjema, Saskia |
author_sort | de Boer, Annemarijn R |
collection | PubMed |
description | AIMS: To optimize an existing data mining algorithm, developed to retrieve smoking status from hospital electronic health records (EHRs), for use in general practice EHRs and to assess its performance there. METHODS AND RESULTS: We optimized the existing algorithm in a training set containing all clinical notes from 498 individuals (75 712 contact moments) from the Julius General Practitioners’ Network (JGPN). Each contact moment was classified as ‘current smoker’, ‘former smoker’, ‘never smoker’, or ‘no information’. As a reference, we manually reviewed the EHRs. Algorithm performance was assessed in an independent test set (n = 494, 78 129 contact moments) using precision, recall, and F1-score. In the test set, performance for ‘current smoker’ was precision 79.7%, recall 78.3%, and F1-score 0.79. For ‘former smoker’, it was precision 73.8%, recall 64.0%, and F1-score 0.69. For ‘never smoker’, it was precision 92.0%, recall 74.9%, and F1-score 0.83. At the patient level, performance for ‘ever smoker’ (current and former smoker combined) was precision 87.9%, recall 94.7%, and F1-score 0.91. For ‘never smoker’, it was precision 98.0%, recall 82.0%, and F1-score 0.89. We found a more narrative writing style in general practice EHRs than in hospital EHRs. CONCLUSION: Data mining can successfully retrieve smoking status information from general practice clinical notes, with good performance for classifying ever and never smokers. Differences between general practice and hospital EHRs call for optimization of data mining algorithms when they are applied beyond the setting in which they were first developed. |
format | Online Article Text |
id | pubmed-9707867 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9707867 2023-01-27 Eur Heart J Digit Health Original Article Oxford University Press 2022-05-20 /pmc/articles/PMC9707867/ /pubmed/36712169 http://dx.doi.org/10.1093/ehjdh/ztac031 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the European Society of Cardiology. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Original Article de Boer, Annemarijn R; de Groot, Mark C H; Groenhof, T Katrien J; van Doorn, Sander; Vaartjes, Ilonca; Bots, Michiel L; Haitjema, Saskia Data mining to retrieve smoking status from electronic health records in general practice |
title | Data mining to retrieve smoking status from electronic health records in general practice |
title_full | Data mining to retrieve smoking status from electronic health records in general practice |
title_fullStr | Data mining to retrieve smoking status from electronic health records in general practice |
title_full_unstemmed | Data mining to retrieve smoking status from electronic health records in general practice |
title_short | Data mining to retrieve smoking status from electronic health records in general practice |
title_sort | data mining to retrieve smoking status from electronic health records in general practice |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9707867/ https://www.ncbi.nlm.nih.gov/pubmed/36712169 http://dx.doi.org/10.1093/ehjdh/ztac031 |
work_keys_str_mv | AT deboerannemarijnr dataminingtoretrievesmokingstatusfromelectronichealthrecordsingeneralpractice AT degrootmarkch dataminingtoretrievesmokingstatusfromelectronichealthrecordsingeneralpractice AT groenhoftkatrienj dataminingtoretrievesmokingstatusfromelectronichealthrecordsingeneralpractice AT vandoornsander dataminingtoretrievesmokingstatusfromelectronichealthrecordsingeneralpractice AT vaartjesilonca dataminingtoretrievesmokingstatusfromelectronichealthrecordsingeneralpractice AT botsmichiell dataminingtoretrievesmokingstatusfromelectronichealthrecordsingeneralpractice AT haitjemasaskia dataminingtoretrievesmokingstatusfromelectronichealthrecordsingeneralpractice |
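
Note on the reported metrics: the description field gives precision, recall, and F1-score for each smoking-status class. F1 is conventionally the harmonic mean of precision and recall, so the reported F1 values can be cross-checked from the other two figures. The following is a minimal sketch of that check, not the authors' code; the function name `f1_score` and the dictionary labels are illustrative assumptions, and the input percentages are copied from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs as reported in the abstract.
# The labels are hypothetical names chosen for this sketch.
reported = {
    "current smoker (contact moment)": (79.7, 78.3),
    "former smoker (contact moment)": (73.8, 64.0),
    "never smoker (contact moment)": (92.0, 74.9),
    "ever smoker (patient)": (87.9, 94.7),
    "never smoker (patient)": (98.0, 82.0),
}

# Convert percentages to fractions, compute F1, and round to two decimals
# to match the precision used in the abstract.
f1_by_class = {
    label: round(f1_score(p / 100, r / 100), 2)
    for label, (p, r) in reported.items()
}

for label, f1 in f1_by_class.items():
    print(f"{label}: F1 = {f1:.2f}")
```

Under these assumptions the recomputed values agree with the F1-scores stated in the abstract (0.79, 0.69, 0.83, 0.91, and 0.89), which also indicates that the last figure is a plain score rather than a percentage.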