Discovering and deciphering relationships across disparate data modalities

Bibliographic details
Main authors: Vogelstein, Joshua T, Bridgeford, Eric W, Wang, Qing, Priebe, Carey E, Maggioni, Mauro, Shen, Cencheng
Format: Online, Article, Text
Language: English
Published: eLife Sciences Publications, Ltd, 2019
Subjects: Computational and Systems Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6386524/
https://www.ncbi.nlm.nih.gov/pubmed/30644820
http://dx.doi.org/10.7554/eLife.41690
collection PubMed
description Understanding the relationships between different properties of data, such as whether a genome or connectome has information about disease status, is increasingly important. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes and often are not interpretable. Our approach, ‘Multiscale Graph Correlation’ (MGC), is a dependence test that juxtaposes disparate data science techniques, including k-nearest neighbors, kernel methods, and multiscale analysis. Other methods may require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely characterizes the latent geometry underlying the relationship, while maintaining computational efficiency. In real data, including brain imaging and cancer genetics, MGC detects the presence of a dependency and provides guidance for the next experiments to conduct.
id pubmed-6386524
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: eLife (Computational and Systems Biology), 2019-01-15
Rights: © 2019, Vogelstein et al. This article is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited.