
Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer



Bibliographic Details
Main Authors:	Okimoto, Gordon, Zeinalzadeh, Ashkan, Wenska, Tom, Loomis, Michael, Nation, James B., Fabre, Tiphaine, Tiirikainen, Maarit, Hernandez, Brenda, Chan, Owen, Wong, Linda, Kwee, Sandi
Format:	Online Article Text
Language:	English
Published:	BioMed Central, 2016
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4966782/ https://www.ncbi.nlm.nih.gov/pubmed/27478503 http://dx.doi.org/10.1186/s13040-016-0103-7

collection	PubMed
description	BACKGROUND: Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple high-dimensional data types obtained from a common set of bio-samples. The joint analysis of the data matrices associated with the different data types of an MMDS should provide a more focused view of the biology underlying complex diseases such as cancer, revealing features that would not be apparent from the analysis of a single data type alone. As multi-modal data rapidly accumulate in research laboratories and public databases such as The Cancer Genome Atlas (TCGA), the translation of such data into clinically actionable knowledge has been slowed by the lack of computational tools capable of analyzing MMDSs. Here, we describe the Joint Analysis of Many Matrices by ITeration (JAMMIT) algorithm, which jointly analyzes the data matrices of an MMDS using sparse matrix approximations of rank-1.
METHODS: The JAMMIT algorithm jointly approximates an arbitrary number of data matrices by rank-1 outer products composed of “sparse” left-singular vectors (eigen-arrays) that are unique to each matrix and a right-singular vector (eigen-signal) that is common to all the matrices. The non-zero coefficients of the eigen-arrays identify small subsets of variables for each data type (i.e., signatures) that, in aggregate or individually, best explain a dominant eigen-signal defined on the columns of the data matrices. The approximation is specified by a single “sparsity” parameter that is selected based on the false discovery rate estimated by permutation testing. Multiple signals of interest in a given MMDS are detected and modeled sequentially by iterating JAMMIT on the “residual” data matrices that result from a given sparse approximation.
RESULTS: We show that JAMMIT outperforms other joint analysis algorithms in detecting multiple signatures embedded in simulated MMDS. On real multi-modal data for ovarian and liver cancer, we show that JAMMIT identified multi-modal signatures that were clinically informative and enriched for cancer-related biology.
CONCLUSIONS: Sparse matrix approximations of rank-1 provide a simple yet effective means of jointly reducing multiple big data types to a small subset of variables that characterize important clinical and/or biological attributes of the bio-samples from which the data were acquired.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-016-0103-7) contains supplementary material, which is available to authorized users.
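The METHODS summary describes JAMMIT only at a high level. What follows is a minimal sketch of the core idea as we read it, not the authors' implementation: several matrices that share the same sample columns are jointly approximated by d·u_k·vᵀ, with a sparse left vector u_k per matrix and one shared right vector v, and the next signal is then sought in the residual matrices. Assumptions, all ours: sparsity is imposed by L1 soft-thresholding at a fixed level `lam` (a hypothetical stand-in for the paper's sparsity parameter), the fit uses a power-iteration-style alternation, and every function name is hypothetical. The permutation-based FDR selection of the sparsity parameter is not shown.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = variables, columns = shared bio-samples
Vector = List[float]


def soft_threshold(x: Vector, lam: float) -> Vector:
    """L1 shrinkage: move each entry toward 0 by lam, zeroing |x_i| <= lam."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in x]


def mat_vec(X: Matrix, v: Vector) -> Vector:
    """Compute X v."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]


def mat_t_vec(X: Matrix, u: Vector) -> Vector:
    """Compute X^T u."""
    out = [0.0] * len(X[0])
    for ui, row in zip(u, X):
        for j, a in enumerate(row):
            out[j] += ui * a
    return out


def norm(v: Vector) -> float:
    return math.sqrt(sum(a * a for a in v))


def update_loadings(mats: List[Matrix], v: Vector, lam: float) -> List[Vector]:
    """Sparse left vectors u_k = S(X_k v, lam), scaled to unit joint L2 norm."""
    us = [soft_threshold(mat_vec(X, v), lam) for X in mats]
    total = math.sqrt(sum(norm(u) ** 2 for u in us))
    if total == 0.0:
        raise ValueError("lam too large: every loading was thresholded to zero")
    return [[a / total for a in u] for u in us]


def joint_sparse_rank1(mats: List[Matrix], lam: float, n_iter: int = 200,
                       tol: float = 1e-10) -> Tuple[List[Vector], Vector, float]:
    """Approximate each X_k by d * u_k v^T with sparse u_k and a shared v."""
    # Start v at the highest-norm row across all matrices; an all-ones start
    # would be orthogonal to row-centered data (X 1 = 0).
    start = max((row for X in mats for row in X), key=norm)
    if norm(start) == 0.0:
        raise ValueError("all input matrices are zero")
    v = [a / norm(start) for a in start]
    for _ in range(n_iter):
        us = update_loadings(mats, v, lam)
        # Shared signal: v proportional to sum_k X_k^T u_k.
        v_new = [0.0] * len(v)
        for X, u in zip(mats, us):
            v_new = [a + b for a, b in zip(v_new, mat_t_vec(X, u))]
        scale = norm(v_new)
        if scale == 0.0:
            raise ValueError("degenerate update: shared signal vanished")
        v_new = [a / scale for a in v_new]
        done = max(abs(a - b) for a, b in zip(v_new, v)) < tol
        v = v_new
        if done:
            break
    us = update_loadings(mats, v, lam)  # loadings consistent with the final v
    d = sum(sum(a * b for a, b in zip(u, mat_vec(X, v))) for X, u in zip(mats, us))
    return us, v, d


def deflate(mats: List[Matrix], us: List[Vector], v: Vector, d: float) -> List[Matrix]:
    """Residual matrices X_k - d * u_k v^T, searched for the next signal."""
    return [[[x - d * ui * vj for x, vj in zip(row, v)] for row, ui in zip(X, u)]
            for X, u in zip(mats, us)]


if __name__ == "__main__":
    # Two toy "data types" on 6 shared samples, each with one active variable.
    signal = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
    X1 = [[3.0 * a for a in signal]] + [[0.0] * 6 for _ in range(3)]
    X2 = [[0.0] * 6, [2.0 * a for a in signal], [0.0] * 6]
    us, v, d = joint_sparse_rank1([X1, X2], lam=0.5)
    for k, u in enumerate(us, start=1):
        print(f"data type {k}: signature rows", [i for i, a in enumerate(u) if a != 0.0])
    residuals = deflate([X1, X2], us, v, d)  # next iteration would start here
```

Calling `deflate` and then `joint_sparse_rank1` again on the residuals mirrors, in spirit, the iteration on residual matrices described in the abstract; the non-zero entries of each `u_k` play the role of a per-data-type signature.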
id	pubmed-4966782
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BioData Min (BioData Mining); published by BioMed Central, 2016-07-29
rights	© The Author(s). 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

