POPDx: an automated framework for patient phenotyping across 392 246 individuals in the UK Biobank study

Bibliographic Details
Main authors: Yang, Lu, Wang, Sheng, Altman, Russ B
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Research and Applications
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9846671/
https://www.ncbi.nlm.nih.gov/pubmed/36469791
http://dx.doi.org/10.1093/jamia/ocac226
author Yang, Lu
Wang, Sheng
Altman, Russ B
collection PubMed
description OBJECTIVE: For the UK Biobank, standardized phenotype codes are associated with patients who have been hospitalized but are missing for many patients who have been treated exclusively in an outpatient setting. We describe a method for phenotype recognition that imputes phenotype codes for all UK Biobank participants. MATERIALS AND METHODS: POPDx (Population-based Objective Phenotyping by Deep Extrapolation) is a bilinear machine learning framework for simultaneously estimating the probabilities of 1538 phenotype codes. We extracted phenotypic and health-related information of 392 246 individuals from the UK Biobank for POPDx development and evaluation. A total of 12 803 ICD-10 diagnosis codes of the patients were converted to 1538 phecodes as gold standard labels. The POPDx framework was evaluated and compared to other available methods on automated multiphenotype recognition. RESULTS: POPDx can predict phenotypes that are rare or even unobserved in training. We demonstrate substantial improvement of automated multiphenotype recognition across 22 disease categories, and its application in identifying key epidemiological features associated with each phenotype. CONCLUSIONS: POPDx helps provide well-defined cohorts for downstream studies. It is a general-purpose method that can be applied to other biobanks with diverse but incomplete data.
format Online
Article
Text
id pubmed-9846671
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-98466712023-01-20 POPDx: an automated framework for patient phenotyping across 392 246 individuals in the UK Biobank study Yang, Lu Wang, Sheng Altman, Russ B J Am Med Inform Assoc Research and Applications OBJECTIVE: For the UK Biobank, standardized phenotype codes are associated with patients who have been hospitalized but are missing for many patients who have been treated exclusively in an outpatient setting. We describe a method for phenotype recognition that imputes phenotype codes for all UK Biobank participants. MATERIALS AND METHODS: POPDx (Population-based Objective Phenotyping by Deep Extrapolation) is a bilinear machine learning framework for simultaneously estimating the probabilities of 1538 phenotype codes. We extracted phenotypic and health-related information of 392 246 individuals from the UK Biobank for POPDx development and evaluation. A total of 12 803 ICD-10 diagnosis codes of the patients were converted to 1538 phecodes as gold standard labels. The POPDx framework was evaluated and compared to other available methods on automated multiphenotype recognition. RESULTS: POPDx can predict phenotypes that are rare or even unobserved in training. We demonstrate substantial improvement of automated multiphenotype recognition across 22 disease categories, and its application in identifying key epidemiological features associated with each phenotype. CONCLUSIONS: POPDx helps provide well-defined cohorts for downstream studies. It is a general-purpose method that can be applied to other biobanks with diverse but incomplete data. Oxford University Press 2022-12-05 /pmc/articles/PMC9846671/ /pubmed/36469791 http://dx.doi.org/10.1093/jamia/ocac226 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title POPDx: an automated framework for patient phenotyping across 392 246 individuals in the UK Biobank study
topic Research and Applications