Discovering and deciphering relationships across disparate data modalities
Understanding the relationships between different properties of data, such as whether a genome or connectome has information about disease status, is increasingly important. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes and ofte...
Main authors: | Vogelstein, Joshua T; Bridgeford, Eric W; Wang, Qing; Priebe, Carey E; Maggioni, Mauro; Shen, Cencheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | eLife Sciences Publications, Ltd, 2019 |
Subjects: | Computational and Systems Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6386524/ https://www.ncbi.nlm.nih.gov/pubmed/30644820 http://dx.doi.org/10.7554/eLife.41690 |
_version_ | 1783397397073756160 |
---|---|
author | Vogelstein, Joshua T; Bridgeford, Eric W; Wang, Qing; Priebe, Carey E; Maggioni, Mauro; Shen, Cencheng |
author_facet | Vogelstein, Joshua T; Bridgeford, Eric W; Wang, Qing; Priebe, Carey E; Maggioni, Mauro; Shen, Cencheng |
author_sort | Vogelstein, Joshua T |
collection | PubMed |
description | Understanding the relationships between different properties of data, such as whether a genome or connectome has information about disease status, is increasingly important. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes and often are not interpretable. Our approach, ‘Multiscale Graph Correlation’ (MGC), is a dependence test that juxtaposes disparate data science techniques, including k-nearest neighbors, kernel methods, and multiscale analysis. Other methods may require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely characterizes the latent geometry underlying the relationship, while maintaining computational efficiency. In real data, including brain imaging and cancer genetics, MGC detects the presence of a dependency and provides guidance for the next experiments to conduct. |
format | Online Article Text |
id | pubmed-6386524 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | eLife Sciences Publications, Ltd |
record_format | MEDLINE/PubMed |
spelling | pubmed-6386524 2019-02-25 Discovering and deciphering relationships across disparate data modalities Vogelstein, Joshua T; Bridgeford, Eric W; Wang, Qing; Priebe, Carey E; Maggioni, Mauro; Shen, Cencheng eLife Computational and Systems Biology Understanding the relationships between different properties of data, such as whether a genome or connectome has information about disease status, is increasingly important. While existing approaches can test whether two properties are related, they may require unfeasibly large sample sizes and often are not interpretable. Our approach, ‘Multiscale Graph Correlation’ (MGC), is a dependence test that juxtaposes disparate data science techniques, including k-nearest neighbors, kernel methods, and multiscale analysis. Other methods may require double or triple the number of samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely characterizes the latent geometry underlying the relationship, while maintaining computational efficiency. In real data, including brain imaging and cancer genetics, MGC detects the presence of a dependency and provides guidance for the next experiments to conduct. eLife Sciences Publications, Ltd 2019-01-15 /pmc/articles/PMC6386524/ /pubmed/30644820 http://dx.doi.org/10.7554/eLife.41690 Text en © 2019, Vogelstein et al http://creativecommons.org/licenses/by/4.0/ This article is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited. |
spellingShingle | Computational and Systems Biology Vogelstein, Joshua T Bridgeford, Eric W Wang, Qing Priebe, Carey E Maggioni, Mauro Shen, Cencheng Discovering and deciphering relationships across disparate data modalities |
title | Discovering and deciphering relationships across disparate data modalities |
topic | Computational and Systems Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6386524/ https://www.ncbi.nlm.nih.gov/pubmed/30644820 http://dx.doi.org/10.7554/eLife.41690 |
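
The description above characterizes MGC as a dependence test that can detect nonlinear relationships. As a rough illustration of what a dependence test does, the sketch below implements a simpler, related statistic (the biased sample distance correlation for one-dimensional data) together with a permutation p-value. This is not the MGC algorithm and is not taken from the article; the function names `distance_correlation` and `permutation_pvalue`, the sample size, and the number of permutations are all assumptions chosen for illustration.

```python
import math
import random


def _centered_distances(values):
    """Double-centered pairwise distance matrix for 1-D samples."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Biased sample distance correlation between two 1-D samples."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j]
                   for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)      # squared distance covariance
    dvar2_x = mean_product(a, a)    # squared distance variance of x
    dvar2_y = mean_product(b, b)    # squared distance variance of y
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


def permutation_pvalue(x, y, reps=200, seed=0):
    """Permutation test: shuffle y to simulate the independence null."""
    rng = random.Random(seed)
    observed = distance_correlation(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(reps):
        rng.shuffle(y_perm)
        if distance_correlation(x, y_perm) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (reps + 1)


# Hypothetical example: a noiseless quadratic relationship, which a
# linear correlation coefficient tends to miss.
_rng = random.Random(1)
x_demo = [_rng.uniform(-1.0, 1.0) for _ in range(40)]
y_demo = [v * v for v in x_demo]
stat_demo, p_demo = permutation_pvalue(x_demo, y_demo)
print(f"statistic={stat_demo:.3f}, p-value={p_demo:.3f}")
```

According to the description, MGC goes further than a single global statistic of this kind by combining k-nearest neighbors, kernel methods, and multiscale analysis; the sketch only illustrates the shared permutation-testing idea.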