A framework for feature extraction from hospital medical data with applications in risk prediction

Bibliographic Details
Main authors:	Tran, Truyen; Luo, Wei; Phung, Dinh; Gupta, Sunil; Rana, Santu; Kennedy, Richard Lee; Larkins, Ann; Venkatesh, Svetha
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4310185/ https://www.ncbi.nlm.nih.gov/pubmed/25547173 http://dx.doi.org/10.1186/s12859-014-0425-8

author	Tran, Truyen; Luo, Wei; Phung, Dinh; Gupta, Sunil; Rana, Santu; Kennedy, Richard Lee; Larkins, Ann; Venkatesh, Svetha
collection	PubMed
description	BACKGROUND: Feature engineering is a time-consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features with baselines generated from the Elixhauser comorbidities. RESULTS: Hospital medical records were transformed into event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task and compared with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was built using logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6, and 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders, and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods. For unplanned readmissions, the auto-extracted feature set using socio-demographic information and medical records outperformed baselines derived from socio-demographic information and Elixhauser comorbidities across 20 settings (5 prediction horizons for each of 4 diseases). In particular, for 30-day prediction the AUCs are: COPD: baseline 0.60 (95% CI: 0.57, 0.63), auto-extracted 0.67 (0.64, 0.70); diabetes: baseline 0.60 (0.58, 0.63), auto-extracted 0.67 (0.64, 0.69); mental disorders: baseline 0.57 (0.54, 0.60), auto-extracted 0.69 (0.64, 0.70); pneumonia: baseline 0.61 (0.59, 0.63), auto-extracted 0.70 (0.67, 0.72). CONCLUSIONS: The advantages of automatically extracting standard features from complex medical records in a disease- and task-agnostic manner were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have the potential to form the foundation of complex automated analytic tasks.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0425-8) contains supplementary material, which is available to authorized users.
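The RESULTS paragraph describes transforming records into event sequences and applying filters at several temporal scales. What follows is a minimal Python sketch of that idea under stated assumptions, not the authors' platform. The event tuple layout (patient_id, event_date, event_type, code), the look-back windows of 30, 90 and 365 days, the example codes, and the helper names `extract_features` and `to_matrix` are all hypothetical. Each "filter" here is simply a count of one event code inside one window ending at the index date. The resulting fixed-length rows are the kind of input an elastic-net logistic regression could consume.

```python
from collections import defaultdict
from datetime import date

# HYPOTHETICAL look-back windows in days; the paper's actual
# temporal scales are not stated in this record.
WINDOWS_DAYS = (30, 90, 365)


def extract_features(events, patient_id, index_date, windows=WINDOWS_DAYS):
    """Turn one patient's event sequence into windowed count features.

    `events` is an iterable of hypothetical (patient_id, event_date,
    event_type, code) tuples. Each (event_type, code) pair is counted in
    every look-back window that contains it. Events after `index_date`
    are skipped so no future information leaks into the features.
    """
    features = defaultdict(int)
    for pid, event_date, event_type, code in events:
        if pid != patient_id or event_date > index_date:
            continue
        days_before = (index_date - event_date).days
        for w in windows:
            if days_before < w:
                features[f"{event_type}:{code}@{w}d"] += 1
    return dict(features)


def to_matrix(feature_dicts):
    """Align sparse per-patient feature dicts into dense rows that share
    one sorted column order; absent features become 0."""
    columns = sorted({name for fd in feature_dicts for name in fd})
    rows = [[fd.get(name, 0) for name in columns] for fd in feature_dicts]
    return columns, rows


if __name__ == "__main__":
    # Illustrative events only; codes are examples, not study data.
    events = [
        ("p1", date(2014, 1, 10), "diagnosis", "J44"),
        ("p1", date(2013, 12, 1), "diagnosis", "J44"),
        ("p1", date(2013, 3, 1), "procedure", "X1"),
        ("p1", date(2014, 2, 1), "diagnosis", "E11"),  # after index date: ignored
        ("p2", date(2014, 1, 5), "diagnosis", "F32"),
    ]
    index = date(2014, 1, 15)
    p1 = extract_features(events, "p1", index)
    p2 = extract_features(events, "p2", index)
    columns, rows = to_matrix([p1, p2])
    print(columns)
    print(rows)
```

Counting the same code in nested windows is one simple way to capture several temporal scales at once; the schema, filters and window lengths used in the paper may differ.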
format	Online Article Text
id	pubmed-4310185
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4310185 2015-02-03 BMC Bioinformatics, Research Article. BioMed Central 2014-12-30 /pmc/articles/PMC4310185/ /pubmed/25547173 http://dx.doi.org/10.1186/s12859-014-0425-8 Text en. © Tran et al.; licensee BioMed Central. 2014. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A framework for feature extraction from hospital medical data with applications in risk prediction
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4310185/ https://www.ncbi.nlm.nih.gov/pubmed/25547173 http://dx.doi.org/10.1186/s12859-014-0425-8