Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)
BACKGROUND: Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link differ...
Main Authors: | Ma, Tianle; Zhang, Aidong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923820/ https://www.ncbi.nlm.nih.gov/pubmed/31856727 http://dx.doi.org/10.1186/s12864-019-6285-x |
_version_ | 1783481598705926144 |
---|---|
author | Ma, Tianle Zhang, Aidong |
author_facet | Ma, Tianle Zhang, Aidong |
author_sort | Ma, Tianle |
collection | PubMed |
description | BACKGROUND: Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial to elucidate molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promises in uncovering intricate relationships among molecular features. However, due to the “big p, small n” problem (i.e., small sample sizes with high-dimensional features), training a large-scale generalizable deep learning model with multi-omics data alone is very challenging. RESULTS: We developed a method called Multi-view Factorization AutoEncoder (MAE) with network constraints that can seamlessly integrate multi-omics data and domain knowledge such as molecular interaction networks. Our method learns feature and patient embeddings simultaneously with deep representation learning. Both feature representations and patient representations are subject to certain constraints specified as regularization terms in the training objective. By incorporating domain knowledge into the training objective, we implicitly introduced a good inductive bias into the machine learning model, which helps improve model generalizability. We performed extensive experiments on the TCGA datasets and demonstrated the power of integrating multi-omics data and biological interaction networks using our proposed method for predicting target clinical variables. CONCLUSIONS: To alleviate the overfitting problem in deep learning on multi-omics data with the “big p, small n” problem, it is helpful to incorporate biological domain knowledge into the model as inductive biases. It is very promising to design machine learning models that facilitate the seamless integration of large-scale multi-omics data and biomedical domain knowledge for uncovering intricate relationships among molecular features and clinical features. |
format | Online Article Text |
id | pubmed-6923820 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6923820 2019-12-30 Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE) Ma, Tianle Zhang, Aidong BMC Genomics Research BACKGROUND: Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial to elucidate molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promises in uncovering intricate relationships among molecular features. However, due to the “big p, small n” problem (i.e., small sample sizes with high-dimensional features), training a large-scale generalizable deep learning model with multi-omics data alone is very challenging. RESULTS: We developed a method called Multi-view Factorization AutoEncoder (MAE) with network constraints that can seamlessly integrate multi-omics data and domain knowledge such as molecular interaction networks. Our method learns feature and patient embeddings simultaneously with deep representation learning. Both feature representations and patient representations are subject to certain constraints specified as regularization terms in the training objective. By incorporating domain knowledge into the training objective, we implicitly introduced a good inductive bias into the machine learning model, which helps improve model generalizability. We performed extensive experiments on the TCGA datasets and demonstrated the power of integrating multi-omics data and biological interaction networks using our proposed method for predicting target clinical variables. CONCLUSIONS: To alleviate the overfitting problem in deep learning on multi-omics data with the “big p, small n” problem, it is helpful to incorporate biological domain knowledge into the model as inductive biases. It is very promising to design machine learning models that facilitate the seamless integration of large-scale multi-omics data and biomedical domain knowledge for uncovering intricate relationships among molecular features and clinical features. BioMed Central 2019-12-20 /pmc/articles/PMC6923820/ /pubmed/31856727 http://dx.doi.org/10.1186/s12864-019-6285-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Ma, Tianle Zhang, Aidong Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE) |
title | Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE) |
title_sort | integrate multi-omics data with biological interaction networks using multi-view factorization autoencoder (mae) |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923820/ https://www.ncbi.nlm.nih.gov/pubmed/31856727 http://dx.doi.org/10.1186/s12864-019-6285-x |