Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)

BACKGROUND: Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link differ...

Bibliographic Details
Main Authors: Ma, Tianle; Zhang, Aidong
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923820/
https://www.ncbi.nlm.nih.gov/pubmed/31856727
http://dx.doi.org/10.1186/s12864-019-6285-x
_version_ 1783481598705926144
author Ma, Tianle
Zhang, Aidong
author_facet Ma, Tianle
Zhang, Aidong
author_sort Ma, Tianle
collection PubMed
description BACKGROUND: Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial to elucidate molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promise in uncovering intricate relationships among molecular features. However, due to the “big p, small n” problem (i.e., small sample sizes with high-dimensional features), training a large-scale generalizable deep learning model with multi-omics data alone is very challenging.
RESULTS: We developed a method called Multi-view Factorization AutoEncoder (MAE) with network constraints that can seamlessly integrate multi-omics data and domain knowledge such as molecular interaction networks. Our method learns feature and patient embeddings simultaneously with deep representation learning. Both feature representations and patient representations are subject to certain constraints specified as regularization terms in the training objective. By incorporating domain knowledge into the training objective, we implicitly introduced a good inductive bias into the machine learning model, which helps improve model generalizability. We performed extensive experiments on the TCGA datasets and demonstrated the power of integrating multi-omics data and biological interaction networks using our proposed method for predicting target clinical variables.
CONCLUSIONS: To alleviate the overfitting problem in deep learning on multi-omics data with the “big p, small n” problem, it is helpful to incorporate biological domain knowledge into the model as inductive biases. It is very promising to design machine learning models that facilitate the seamless integration of large-scale multi-omics data and biomedical domain knowledge for uncovering intricate relationships among molecular features and clinical features.
format Online
Article
Text
id pubmed-6923820
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6923820 2019-12-30 Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE) Ma, Tianle Zhang, Aidong BMC Genomics Research BioMed Central 2019-12-20 /pmc/articles/PMC6923820/ /pubmed/31856727 http://dx.doi.org/10.1186/s12864-019-6285-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Ma, Tianle
Zhang, Aidong
Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)
title Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)
title_full Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)
title_fullStr Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)
title_full_unstemmed Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)
title_short Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)
title_sort integrate multi-omics data with biological interaction networks using multi-view factorization autoencoder (mae)
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923820/
https://www.ncbi.nlm.nih.gov/pubmed/31856727
http://dx.doi.org/10.1186/s12864-019-6285-x
work_keys_str_mv AT matianle integratemultiomicsdatawithbiologicalinteractionnetworksusingmultiviewfactorizationautoencodermae
AT zhangaidong integratemultiomicsdatawithbiologicalinteractionnetworksusingmultiviewfactorizationautoencodermae