NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures

Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature...

Bibliographic Details
Main Authors: Martínez-Enguita, David, Dwivedi, Sanjiv K, Jörnsten, Rebecka, Gustafsson, Mika
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516364/
https://www.ncbi.nlm.nih.gov/pubmed/37587790
http://dx.doi.org/10.1093/bib/bbad293
author Martínez-Enguita, David
Dwivedi, Sanjiv K
Jörnsten, Rebecka
Gustafsson, Mika
collection PubMed
description Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.
format Online
Article
Text
id pubmed-10516364
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10516364 2023-09-23 NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures Martínez-Enguita, David Dwivedi, Sanjiv K Jörnsten, Rebecka Gustafsson, Mika Brief Bioinform Problem Solving Protocol
Oxford University Press 2023-08-16 /pmc/articles/PMC10516364/ /pubmed/37587790 http://dx.doi.org/10.1093/bib/bbad293 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title NCAE: data-driven representations using a deep network-coherent DNA methylation autoencoder identify robust disease and risk factor signatures
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516364/
https://www.ncbi.nlm.nih.gov/pubmed/37587790
http://dx.doi.org/10.1093/bib/bbad293