RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and tre...

Bibliographic Details
Main Authors: Kim, Ji-Sung; Gao, Xin; Rzhetsky, Andrey
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5940243/
https://www.ncbi.nlm.nih.gov/pubmed/29698408
http://dx.doi.org/10.1371/journal.pcbi.1006106
_version_ 1783321079362617344
author Kim, Ji-Sung
Gao, Xin
Rzhetsky, Andrey
author_facet Kim, Ji-Sung
Gao, Xin
Rzhetsky, Andrey
author_sort Kim, Ji-Sung
collection PubMed
description Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10^−9). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.
format Online
Article
Text
id pubmed-5940243
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5940243 2018-05-18 RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning Kim, Ji-Sung Gao, Xin Rzhetsky, Andrey PLoS Comput Biol Research Article Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest, support vector machines, and gradient-boosted decision trees). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), precision, recall, and area under the curve for receiver operating characteristic plots (all p < 10^−9). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases.
Public Library of Science 2018-04-26 /pmc/articles/PMC5940243/ /pubmed/29698408 http://dx.doi.org/10.1371/journal.pcbi.1006106 Text en © 2018 Kim et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Kim, Ji-Sung
Gao, Xin
Rzhetsky, Andrey
RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning
title RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning
title_full RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning
title_fullStr RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning
title_full_unstemmed RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning
title_short RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning
title_sort riddle: race and ethnicity imputation from disease history with deep learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5940243/
https://www.ncbi.nlm.nih.gov/pubmed/29698408
http://dx.doi.org/10.1371/journal.pcbi.1006106
work_keys_str_mv AT kimjisung riddleraceandethnicityimputationfromdiseasehistorywithdeeplearning
AT gaoxin riddleraceandethnicityimputationfromdiseasehistorywithdeeplearning
AT rzhetskyandrey riddleraceandethnicityimputationfromdiseasehistorywithdeeplearning