
Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction

Bibliographic Details
Main Authors: Jones, Barrett W; Taylor, Warren D; Walsh, Colin G
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10561992/
https://www.ncbi.nlm.nih.gov/pubmed/37818308
http://dx.doi.org/10.1093/jamiaopen/ooad086
_version_ 1785118032173989888
author Jones, Barrett W
Taylor, Warren D
Walsh, Colin G
author_facet Jones, Barrett W
Taylor, Warren D
Walsh, Colin G
author_sort Jones, Barrett W
collection PubMed
description OBJECTIVES: We evaluated autoencoders as a feature engineering and pretraining technique to improve major depressive disorder (MDD) prognostic risk prediction. Autoencoders can represent temporal feature relationships not captured by aggregate features. The predictive performance of autoencoders of multiple sequential structures was evaluated as feature engineering and pretraining strategies on an array of prediction tasks and compared to a restricted Boltzmann machine (RBM) and random forests as benchmarks. MATERIALS AND METHODS: We studied MDD patients from Vanderbilt University Medical Center. Autoencoder models with Attention and long short-term memory (LSTM) layers were trained to create latent representations of the input data. Predictive performance was evaluated temporally by fitting random forest models to predict future outcomes with engineered features as input and by using autoencoder weights to initialize neural network layers. We evaluated area under the precision-recall curve (AUPRC) trends and variation over the study population's treatment course. RESULTS: The pretrained LSTM model improved predictive performance over pretrained Attention models and benchmarks in 3 of 4 outcomes, including self-harm/suicide attempt (AUPRCs, LSTM pretrained = 0.012, Attention pretrained = 0.010, RBM = 0.009, random forest = 0.005). The use of autoencoders for feature engineering had varied results, with benchmarks outperforming LSTM and Attention encodings on the self-harm/suicide attempt outcome (AUPRCs, LSTM encodings = 0.003, Attention encodings = 0.004, RBM = 0.009, random forest = 0.005). DISCUSSION: The improvement in prediction resulting from pretraining has the potential to increase the clinical impact of MDD risk models. We did not find evidence that temporal feature encodings added to predictive performance in the study population, which suggests that predictive information retained by model weights may be lost during encoding. The pretrained LSTM model's predictive performance is shown to be clinically useful and improves on state-of-the-art predictors in the MDD phenotype, warranting consideration in future related studies. CONCLUSION: LSTM models with pretrained autoencoder weights outperformed the benchmark and a pretrained Attention model. Future researchers developing risk models in MDD may benefit from using LSTM autoencoder pretrained weights.
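The two strategies the abstract contrasts can be illustrated with a minimal NumPy sketch. This is not the authors' code: the study used LSTM and Attention autoencoders on temporal clinical data, while here a simple linear autoencoder stands in for brevity, and all variable names and the toy data are hypothetical. The point is the distinction between (1) feature engineering, where the encoder's latent output feeds a downstream model such as a random forest, and (2) pretraining, where the learned encoder weights initialize a predictive network's layers.

```python
import numpy as np

# Toy data standing in for per-patient features (assumption: 200 patients,
# 10 features); the real study used sequential EHR data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Linear autoencoder: encode 10 features into a 3-dimensional latent space,
# then decode back; train by gradient descent on reconstruction error.
W_enc = rng.normal(scale=0.1, size=(10, 3))
W_dec = rng.normal(scale=0.1, size=(3, 10))
initial_error = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

lr = 0.01
for _ in range(500):
    Z = X @ W_enc                      # latent representation
    err = Z @ W_dec - X                # reconstruction residual
    grad_dec = Z.T @ err / len(X)      # gradient w.r.t. decoder weights
    grad_enc = X.T @ (err @ W_dec.T) / len(X)  # gradient w.r.t. encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

recon_error = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

# Strategy 1 (feature engineering): the latent encodings become inputs
# to a separate downstream classifier (the paper used random forests).
Z_features = X @ W_enc

# Strategy 2 (pretraining): the trained encoder weights initialize the
# first layer of a predictive network, which is then fine-tuned on the
# outcome labels instead of starting from random weights.
W_init = W_enc.copy()
```

The paper's finding maps onto this split: initializing a predictor with `W_init` (pretraining) helped, while passing `Z_features` to a random forest (feature engineering) could discard predictive information during encoding.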
format Online
Article
Text
id pubmed-10561992
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-105619922023-10-10 Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction Jones, Barrett W Taylor, Warren D Walsh, Colin G JAMIA Open Research and Applications Oxford University Press 2023-10-09 /pmc/articles/PMC10561992/ /pubmed/37818308 http://dx.doi.org/10.1093/jamiaopen/ooad086 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research and Applications
Jones, Barrett W
Taylor, Warren D
Walsh, Colin G
Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction
title Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction
title_full Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction
title_fullStr Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction
title_full_unstemmed Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction
title_short Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction
title_sort sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10561992/
https://www.ncbi.nlm.nih.gov/pubmed/37818308
http://dx.doi.org/10.1093/jamiaopen/ooad086
work_keys_str_mv AT jonesbarrettw sequentialautoencodersforfeatureengineeringandpretraininginmajordepressivedisorderriskprediction
AT taylorwarrend sequentialautoencodersforfeatureengineeringandpretraininginmajordepressivedisorderriskprediction
AT walshcoling sequentialautoencodersforfeatureengineeringandpretraininginmajordepressivedisorderriskprediction