Finding undiagnosed patients with hepatitis C virus: an application of machine learning to US ambulatory electronic medical records

AIMS: To develop and validate a machine learning (ML) algorithm to identify undiagnosed hepatitis C virus (HCV) patients, in order to facilitate prioritisation of patients for targeted HCV screening. METHODS: This retrospective study used ambulatory electronic medical records (EMR) from January 2015...

Bibliographic Details
Main authors: Rigg, John; Doyle, Orla; McDonogh, Niamh; Leavitt, Nadea; Ali, Rehan; Son, Annie; Kreter, Bruce
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9843171/
https://www.ncbi.nlm.nih.gov/pubmed/36639190
http://dx.doi.org/10.1136/bmjhci-2022-100651
author Rigg, John
Doyle, Orla
McDonogh, Niamh
Leavitt, Nadea
Ali, Rehan
Son, Annie
Kreter, Bruce
author_sort Rigg, John
collection PubMed
description AIMS: To develop and validate a machine learning (ML) algorithm to identify undiagnosed hepatitis C virus (HCV) patients, in order to facilitate prioritisation of patients for targeted HCV screening. METHODS: This retrospective study used ambulatory electronic medical records (EMR) from January 2015 to February 2020. A Gradient Boosting Trees algorithm was trained using patient records to predict initial HCV diagnosis and was validated on a temporally independent held-out cross-section of the data. The fold improvement in precision (proportion of patients identified by the algorithm who are HCV positive) over universal screening was examined and compared with risk-based screening. RESULTS: 21 508 positive (HCV diagnosed) and 28.2M unlabelled (lacking evidence of HCV diagnosis) patients met the inclusion criteria for the study. After down-sampling unlabelled patients to aid the algorithm’s learning process, 16.2M unlabelled patients entered the analysis. Performance of the algorithm was compared with universal screening on the held-out cross-section, which had an incidence of HCV diagnoses of 0.02%. The algorithm achieved a 101.0 ×, 18.0 × and 5.1 × fold improvement in precision over universal screening at 5%, 20% and 50% levels of recall. When compared with risk-based screening, the algorithm required fewer patients to be screened and improved precision. CONCLUSIONS: This study presents strong evidence towards the use of ML on EMR data for the prioritisation of patients for targeted HCV testing with potential to improve efficiency of resource utilisation, thereby reducing the workload for clinicians and saving healthcare costs. A prospective interventional study would allow for further validation before use in a clinical setting.
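The fold-improvement figures in the abstract can be turned into approximate absolute precision values with a little arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes that the precision of universal screening equals the reported 0.02% incidence in the held-out cross-section, and the function and constant names are hypothetical. Because 0.02% is a rounded figure, the derived numbers are approximate.

```python
# Figures as reported in the abstract.
BASELINE_PRECISION = 0.0002  # assumption: universal-screening precision = 0.02% incidence
FOLD_IMPROVEMENT = {0.05: 101.0, 0.20: 18.0, 0.50: 5.1}  # recall level -> fold improvement


def implied_precision(fold: float, baseline: float = BASELINE_PRECISION) -> float:
    """Precision implied by a fold improvement over the baseline precision."""
    return fold * baseline


def screened_per_case(precision: float) -> float:
    """Expected number of patients screened per HCV-positive patient found."""
    return 1.0 / precision


def summarise() -> dict:
    """Map each recall level to (implied precision, patients screened per case)."""
    rows = {}
    for recall, fold in FOLD_IMPROVEMENT.items():
        precision = implied_precision(fold)
        rows[recall] = (precision, screened_per_case(precision))
    return rows


if __name__ == "__main__":
    print(f"universal: precision {BASELINE_PRECISION:.4%}, "
          f"about {screened_per_case(BASELINE_PRECISION):.0f} screened per case")
    for recall, (precision, per_case) in summarise().items():
        print(f"recall {recall:.0%}: precision {precision:.4%}, "
              f"about {per_case:.0f} screened per case")
```

Under these assumptions, the 101.0-fold improvement at 5% recall corresponds to a precision of roughly 2%, compared with 0.02% for universal screening.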
format Online
Article
Text
id pubmed-9843171
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-9843171 2023-01-18 BMJ Health Care Inform Original Research BMJ Publishing Group 2023-01-13 /pmc/articles/PMC9843171/ /pubmed/36639190 http://dx.doi.org/10.1136/bmjhci-2022-100651 Text en © Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: https://creativecommons.org/licenses/by-nc/4.0/
title Finding undiagnosed patients with hepatitis C virus: an application of machine learning to US ambulatory electronic medical records
topic Original Research