Mining Primary Care Electronic Health Records for Automatic Disease Phenotyping: A Transparent Machine Learning Framework

(1) Background: We aimed to develop a transparent machine-learning (ML) framework to automatically identify patients with a condition from electronic health records (EHRs) via a parsimonious set of features. (2) Methods: We linked multiple sources of EHRs, including 917,496,869 primary care records...

Bibliographic Details
Main authors: Fernández-Gutiérrez, Fabiola; Kennedy, Jonathan I.; Cooksey, Roxanne; Atkinson, Mark; Choy, Ernest; Brophy, Sinead; Huo, Lin; Zhou, Shang-Ming
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8534858/
https://www.ncbi.nlm.nih.gov/pubmed/34679609
http://dx.doi.org/10.3390/diagnostics11101908
_version_ 1784587644791947264
author Fernández-Gutiérrez, Fabiola
Kennedy, Jonathan I.
Cooksey, Roxanne
Atkinson, Mark
Choy, Ernest
Brophy, Sinead
Huo, Lin
Zhou, Shang-Ming
author_sort Fernández-Gutiérrez, Fabiola
collection PubMed
description (1) Background: We aimed to develop a transparent machine-learning (ML) framework that automatically identifies patients with a condition from electronic health records (EHRs) using a parsimonious set of features. (2) Methods: We linked multiple sources of EHRs, including 917,496,869 primary care records, 40,656,805 secondary care records, and 694,954 records from specialist surgeries between 2002 and 2012, to generate a unique dataset. We then treated patient identification as a text classification problem and proposed a transparent disease-phenotyping framework. The framework comprises patient representation generation, feature selection, and optimal phenotyping algorithm development to tackle the imbalanced nature of the data. It was extensively evaluated by identifying rheumatoid arthritis (RA) and ankylosing spondylitis (AS). (3) Results: Applied to the linked dataset of 9657 patients, with 1484 cases of RA and 204 cases of AS, the framework achieved accuracy and positive predictive values of 86.19% and 88.46%, respectively, for RA and 99.23% and 97.75% for AS, comparable with expert knowledge-driven methods. (4) Conclusions: This framework could potentially be used as an efficient tool for identifying patients with a condition of interest from EHRs, helping clinicians in the clinical decision-support process.
format Online
Article
Text
id pubmed-8534858
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8534858 2021-10-23 Diagnostics (Basel) Article MDPI 2021-10-15 /pmc/articles/PMC8534858/ /pubmed/34679609 http://dx.doi.org/10.3390/diagnostics11101908 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Mining Primary Care Electronic Health Records for Automatic Disease Phenotyping: A Transparent Machine Learning Framework
topic Article