
Phe2vec: Automated disease phenotyping based on unsupervised embeddings from electronic health records

Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning and assess its effectiveness against standard rule-based algorithm...


Bibliographic Details
Main Authors: De Freitas, Jessica K., Johnson, Kipp W., Golden, Eddye, Nadkarni, Girish N., Dudley, Joel T., Bottinger, Erwin P., Glicksberg, Benjamin S., Miotto, Riccardo
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8441576/
https://www.ncbi.nlm.nih.gov/pubmed/34553174
http://dx.doi.org/10.1016/j.patter.2021.100337
author De Freitas, Jessica K.
Johnson, Kipp W.
Golden, Eddye
Nadkarni, Girish N.
Dudley, Joel T.
Bottinger, Erwin P.
Glicksberg, Benjamin S.
Miotto, Riccardo
collection PubMed
description Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning and assess its effectiveness against standard rule-based algorithms from Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and patients' clinical history. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. Comparing Phe2vec and PheKB cohorts head-to-head using chart review, Phe2vec performed on par or better in nine out of ten diseases. Differently from other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts.
format Online
Article
Text
id pubmed-8441576
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8441576 2021-09-21 Patterns (N Y) Descriptor Elsevier 2021-09-02 /pmc/articles/PMC8441576/ /pubmed/34553174 http://dx.doi.org/10.1016/j.patter.2021.100337 Text en © 2021 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Phe2vec: Automated disease phenotyping based on unsupervised embeddings from electronic health records
topic Descriptor
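
The description field above outlines three steps: embed medical concepts and patients, define a disease phenotype as a seed concept plus its neighbors in the embedding space, and link patients whose embedding lies close to that phenotype. The following is a minimal sketch of that idea and not the authors' implementation. The concept names, the three-dimensional toy vectors, the use of cosine similarity, the averaging of concept vectors, and the values of `k` and `threshold` are all assumptions made for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_phenotype(seed, concept_vecs, k):
    """Return the seed concept followed by its k most similar concepts."""
    sims = {c: cosine(concept_vecs[seed], v)
            for c, v in concept_vecs.items() if c != seed}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    return [seed] + neighbors

def mean_vector(concepts, concept_vecs):
    """Average the embeddings of a list of concepts (an assumed aggregation)."""
    return np.mean([concept_vecs[c] for c in concepts], axis=0)

def link_patients(patient_vecs, phenotype_vec, threshold):
    """Return patients whose embedding is at least `threshold` similar to the phenotype."""
    return [p for p, v in patient_vecs.items()
            if cosine(v, phenotype_vec) >= threshold]

# Hypothetical concept embeddings; real ones would be learned from EHR data.
concept_vecs = {
    "diabetes_dx": np.array([1.0, 0.1, 0.0]),
    "metformin":   np.array([0.9, 0.2, 0.0]),
    "hba1c_lab":   np.array([0.8, 0.0, 0.1]),
    "fracture_dx": np.array([0.0, 1.0, 0.1]),
    "cast_proc":   np.array([0.1, 0.9, 0.0]),
}

# Hypothetical patient histories, embedded here as the mean of their concepts.
histories = {
    "patient_a": ["diabetes_dx", "metformin", "hba1c_lab"],
    "patient_b": ["fracture_dx", "cast_proc"],
}
patient_vecs = {p: mean_vector(h, concept_vecs) for p, h in histories.items()}

phenotype = build_phenotype("diabetes_dx", concept_vecs, k=2)
phenotype_vec = mean_vector(phenotype, concept_vecs)
linked = link_patients(patient_vecs, phenotype_vec, threshold=0.8)

print(phenotype)
print(linked)
```

In this toy setup the seed's two nearest neighbors are the other diabetes-related concepts, so only the patient with a matching history clears the similarity threshold. With real data, the choice of `k` and `threshold` would need to be tuned and validated, for example against chart review as the description mentions.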