POPDx: an automated framework for patient phenotyping across 392 246 individuals in the UK Biobank study
OBJECTIVE: For the UK Biobank, standardized phenotype codes are associated with patients who have been hospitalized but are missing for many patients who have been treated exclusively in an outpatient setting. We describe a method for phenotype recognition that imputes phenotype codes for all UK Bio...
Main authors: | Yang, Lu; Wang, Sheng; Altman, Russ B |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9846671/ https://www.ncbi.nlm.nih.gov/pubmed/36469791 http://dx.doi.org/10.1093/jamia/ocac226 |
_version_ | 1784871244657590272 |
---|---|
author | Yang, Lu Wang, Sheng Altman, Russ B |
author_sort | Yang, Lu |
collection | PubMed |
description | OBJECTIVE: For the UK Biobank, standardized phenotype codes are associated with patients who have been hospitalized but are missing for many patients who have been treated exclusively in an outpatient setting. We describe a method for phenotype recognition that imputes phenotype codes for all UK Biobank participants. MATERIALS AND METHODS: POPDx (Population-based Objective Phenotyping by Deep Extrapolation) is a bilinear machine learning framework for simultaneously estimating the probabilities of 1538 phenotype codes. We extracted phenotypic and health-related information of 392 246 individuals from the UK Biobank for POPDx development and evaluation. A total of 12 803 ICD-10 diagnosis codes of the patients were converted to 1538 phecodes as gold standard labels. The POPDx framework was evaluated and compared to other available methods on automated multiphenotype recognition. RESULTS: POPDx can predict phenotypes that are rare or even unobserved in training. We demonstrate substantial improvement of automated multiphenotype recognition across 22 disease categories, and its application in identifying key epidemiological features associated with each phenotype. CONCLUSIONS: POPDx helps provide well-defined cohorts for downstream studies. It is a general-purpose method that can be applied to other biobanks with diverse but incomplete data. |
format | Online Article Text |
id | pubmed-9846671 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-98466712023-01-20 POPDx: an automated framework for patient phenotyping across 392 246 individuals in the UK Biobank study Yang, Lu Wang, Sheng Altman, Russ B J Am Med Inform Assoc Research and Applications OBJECTIVE: For the UK Biobank, standardized phenotype codes are associated with patients who have been hospitalized but are missing for many patients who have been treated exclusively in an outpatient setting. We describe a method for phenotype recognition that imputes phenotype codes for all UK Biobank participants. MATERIALS AND METHODS: POPDx (Population-based Objective Phenotyping by Deep Extrapolation) is a bilinear machine learning framework for simultaneously estimating the probabilities of 1538 phenotype codes. We extracted phenotypic and health-related information of 392 246 individuals from the UK Biobank for POPDx development and evaluation. A total of 12 803 ICD-10 diagnosis codes of the patients were converted to 1538 phecodes as gold standard labels. The POPDx framework was evaluated and compared to other available methods on automated multiphenotype recognition. RESULTS: POPDx can predict phenotypes that are rare or even unobserved in training. We demonstrate substantial improvement of automated multiphenotype recognition across 22 disease categories, and its application in identifying key epidemiological features associated with each phenotype. CONCLUSIONS: POPDx helps provide well-defined cohorts for downstream studies. It is a general-purpose method that can be applied to other biobanks with diverse but incomplete data. Oxford University Press 2022-12-05 /pmc/articles/PMC9846671/ /pubmed/36469791 http://dx.doi.org/10.1093/jamia/ocac226 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Research and Applications Yang, Lu Wang, Sheng Altman, Russ B POPDx: an automated framework for patient phenotyping across 392 246 individuals in the UK Biobank study |
title | POPDx: an automated framework for patient phenotyping across 392 246 individuals in the UK Biobank study |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9846671/ https://www.ncbi.nlm.nih.gov/pubmed/36469791 http://dx.doi.org/10.1093/jamia/ocac226 |
work_keys_str_mv | AT yanglu popdxanautomatedframeworkforpatientphenotypingacross392246individualsintheukbiobankstudy AT wangsheng popdxanautomatedframeworkforpatientphenotypingacross392246individualsintheukbiobankstudy AT altmanrussb popdxanautomatedframeworkforpatientphenotypingacross392246individualsintheukbiobankstudy |
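
The description above calls POPDx a bilinear framework that estimates the probabilities of many phenotype codes at once and can score codes unobserved in training. The record gives no architectural detail, so the following is only a generic sketch of bilinear multi-label scoring, not the authors' implementation. It assumes each patient is a feature vector `x`, each phenotype code has an embedding `e_k`, and a learned matrix `W` links the two, with the probability taken as `sigmoid(x^T W e_k)`. All names, sizes, and the random values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def project(x, W):
    """Map patient features x (length d) through W (d x m) to length m."""
    m = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(m)]


def bilinear_probabilities(x, W, E):
    """Return sigmoid(x^T W e_k) for every phenotype embedding e_k in E."""
    h = project(x, W)
    return [sigmoid(sum(hj * ej for hj, ej in zip(h, e))) for e in E]


random.seed(0)
d, m, n_codes = 5, 3, 4  # toy sizes only; the abstract reports 1538 phecodes

# Hypothetical patient features, weights, and phenotype embeddings.
x = [random.gauss(0, 1) for _ in range(d)]
W = [[random.gauss(0, 1) for _ in range(m)] for _ in range(d)]
E = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n_codes)]

probs = bilinear_probabilities(x, W, E)
print(probs)

# A code absent from training can still be scored if it has an embedding,
# because W is shared across all codes (placeholder zero embedding here).
unseen_code = [0.0] * m
probs_with_unseen = bilinear_probabilities(x, W, E + [unseen_code])
```

In this formulation the parameters in `W` do not depend on the number of codes, which is one way a bilinear model can extend to rare or unseen labels; whether POPDx does exactly this is not stated in the record.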