
Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)



Bibliographic Details
Main authors:	Ma, Tianle; Zhang, Aidong
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923820/ https://www.ncbi.nlm.nih.gov/pubmed/31856727 http://dx.doi.org/10.1186/s12864-019-6285-x

author	Ma, Tianle Zhang, Aidong
collection	PubMed
description	BACKGROUND: Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial to elucidate molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promises in uncovering intricate relationships among molecular features. However, due to the “big p, small n” problem (i.e., small sample sizes with high-dimensional features), training a large-scale generalizable deep learning model with multi-omics data alone is very challenging. RESULTS: We developed a method called Multi-view Factorization AutoEncoder (MAE) with network constraints that can seamlessly integrate multi-omics data and domain knowledge such as molecular interaction networks. Our method learns feature and patient embeddings simultaneously with deep representation learning. Both feature representations and patient representations are subject to certain constraints specified as regularization terms in the training objective. By incorporating domain knowledge into the training objective, we implicitly introduced a good inductive bias into the machine learning model, which helps improve model generalizability. We performed extensive experiments on the TCGA datasets and demonstrated the power of integrating multi-omics data and biological interaction networks using our proposed method for predicting target clinical variables. CONCLUSIONS: To alleviate the overfitting problem in deep learning on multi-omics data with the “big p, small n” problem, it is helpful to incorporate biological domain knowledge into the model as inductive biases. It is very promising to design machine learning models that facilitate the seamless integration of large-scale multi-omics data and biomedical domain knowledge for uncovering intricate relationships among molecular features and clinical features.
format	Online Article Text
id	pubmed-6923820
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6923820 2019-12-30 Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE) Ma, Tianle Zhang, Aidong BMC Genomics Research BioMed Central 2019-12-20 /pmc/articles/PMC6923820/ /pubmed/31856727 http://dx.doi.org/10.1186/s12864-019-6285-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923820/ https://www.ncbi.nlm.nih.gov/pubmed/31856727 http://dx.doi.org/10.1186/s12864-019-6285-x


Similar items