Random Projection for Fast and Efficient Multivariate Correlation Analysis of High-Dimensional Data: A New Approach

Bibliographic Details
Main authors: Grellmann, Claudia; Neumann, Jane; Bitzer, Sebastian; Kovacs, Peter; Tönjes, Anke; Westlye, Lars T.; Andreassen, Ole A.; Stumvoll, Michael; Villringer, Arno; Horstmann, Annette
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4894907/
https://www.ncbi.nlm.nih.gov/pubmed/27375677
http://dx.doi.org/10.3389/fgene.2016.00102
author Grellmann, Claudia
Neumann, Jane
Bitzer, Sebastian
Kovacs, Peter
Tönjes, Anke
Westlye, Lars T.
Andreassen, Ole A.
Stumvoll, Michael
Villringer, Arno
Horstmann, Annette
collection PubMed
description In recent years, technological advances have produced a wealth of very high-dimensional data, and combining high-dimensional information from multiple sources is becoming increasingly important in an expanding range of scientific disciplines. Partial Least Squares Correlation (PLSC) is a frequently used method for multivariate multimodal data integration. It is, however, computationally expensive in applications involving large numbers of variables, as required, for example, in genetic neuroimaging. To handle high-dimensional problems, dimension reduction might be implemented as a pre-processing step. We propose a new approach that incorporates Random Projection (RP) for dimensionality reduction into PLSC to efficiently solve high-dimensional multimodal problems like genotype-phenotype associations. We name our new method PLSC-RP. Using simulated and experimental data sets containing whole genome SNP measures as genotypes and whole brain neuroimaging measures as phenotypes, we demonstrate that PLSC-RP is drastically faster than traditional PLSC while providing statistically equivalent results. We also provide evidence that dimensionality reduction using RP is data type independent. Therefore, PLSC-RP opens up a wide range of possible applications. It can be used for any integrative analysis that combines information from multiple sources.
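As a rough illustration of the idea summarized in the description, the sketch below runs PLSC as a singular value decomposition of the cross-product of two column-centered data blocks, once on the full data and once after a random projection of the high-dimensional block. It is not the authors' implementation: the function names `plsc` and `plsc_rp`, the Gaussian projection matrix, the target dimension `k`, and the `seed` argument are all assumptions made for this example, and the record does not state how PLSC-RP constructs its projection.

```python
import numpy as np


def plsc(X, Y):
    """Plain PLSC sketch: SVD of the cross-product of column-centered X and Y.

    X has shape (n, p), Y has shape (n, q). Returns the Y-side singular
    vectors, the singular values, and the X-side singular vectors.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Cross-product matrix of shape (q, p); expensive when p is very large.
    U, s, Vt = np.linalg.svd(Yc.T @ Xc, full_matrices=False)
    return U, s, Vt.T


def plsc_rp(X, Y, k, seed=0):
    """PLSC after a Gaussian random projection of X from p to k columns.

    Assumption for this sketch: the projection matrix R has i.i.d. standard
    normal entries scaled by 1/sqrt(k), which approximately preserves the
    geometry of the rows of X.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    X_low = Xc @ R  # shape (n, k), with k much smaller than p
    U, s, V_low = plsc(X_low, Y)
    # V_low lives in the reduced k-dimensional space; R is returned so the
    # caller can relate it back to the original variables.
    return U, s, V_low, R
```

With, say, 40 samples, 2000 variables in `X` and 5 variables in `Y`, calling `plsc_rp(X, Y, k=1000)` yields singular values that are close to, but not identical with, those from `plsc(X, Y)`; how close depends on `k` and on the rank of the cross-product matrix.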
format Online
Article
Text
id pubmed-4894907
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4894907
2016-07-01
Front Genet
Frontiers Media S.A.
2016-06-07
/pmc/articles/PMC4894907/
/pubmed/27375677
http://dx.doi.org/10.3389/fgene.2016.00102
Text en
Copyright © 2016 Grellmann, Neumann, Bitzer, Kovacs, Tönjes, Westlye, Andreassen, Stumvoll, Villringer and Horstmann.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Random Projection for Fast and Efficient Multivariate Correlation Analysis of High-Dimensional Data: A New Approach
topic Genetics