High-sensitivity pattern discovery in large, paired multiomic datasets
MOTIVATION: Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features are essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to iden...
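Main authors: | Ghazi, Andrew R; Sucipto, Kathleen; Rahnavard, Ali; Franzosa, Eric A; McIver, Lauren J; Lloyd-Price, Jason; Schwager, Emma; Weingart, George; Moon, Yo Sup; Morgan, Xochitl C; Waldron, Levi; Huttenhower, Curtis |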
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | ISCB/Ismb 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235493/ https://www.ncbi.nlm.nih.gov/pubmed/35758795 http://dx.doi.org/10.1093/bioinformatics/btac232 |
_version_ | 1784736323207168000 |
---|---|
author | Ghazi, Andrew R Sucipto, Kathleen Rahnavard, Ali Franzosa, Eric A McIver, Lauren J Lloyd-Price, Jason Schwager, Emma Weingart, George Moon, Yo Sup Morgan, Xochitl C Waldron, Levi Huttenhower, Curtis |
author_sort | Ghazi, Andrew R |
collection | PubMed |
description | MOTIVATION: Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features are essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides high statistical power and false discovery rate (FDR) control. RESULTS: Here, we present a novel hierarchical framework, HAllA (Hierarchical All-against-All association testing), for structured association discovery between paired high-dimensional datasets. HAllA efficiently integrates hierarchical hypothesis testing with FDR correction to reveal significant linear and non-linear block-wise relationships among continuous and/or categorical data. We optimized and evaluated HAllA using heterogeneous synthetic datasets of known association structure, where HAllA outperformed all-against-all and other block-testing approaches across a range of common similarity measures. We then applied HAllA to a series of real-world multiomics datasets, revealing new associations between gene expression and host immune activity, the microbiome and host transcriptome, metabolomic profiling and human health phenotypes. AVAILABILITY AND IMPLEMENTATION: An open-source implementation of HAllA is freely available at http://huttenhower.sph.harvard.edu/halla along with documentation, demo datasets and a user group. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9235493 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-92354932022-06-29 High-sensitivity pattern discovery in large, paired multiomic datasets Ghazi, Andrew R Sucipto, Kathleen Rahnavard, Ali Franzosa, Eric A McIver, Lauren J Lloyd-Price, Jason Schwager, Emma Weingart, George Moon, Yo Sup Morgan, Xochitl C Waldron, Levi Huttenhower, Curtis Bioinformatics ISCB/Ismb 2022 MOTIVATION: Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features are essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides high statistical power and false discovery rate (FDR) control. RESULTS: Here, we present a novel hierarchical framework, HAllA (Hierarchical All-against-All association testing), for structured association discovery between paired high-dimensional datasets. HAllA efficiently integrates hierarchical hypothesis testing with FDR correction to reveal significant linear and non-linear block-wise relationships among continuous and/or categorical data. We optimized and evaluated HAllA using heterogeneous synthetic datasets of known association structure, where HAllA outperformed all-against-all and other block-testing approaches across a range of common similarity measures. We then applied HAllA to a series of real-world multiomics datasets, revealing new associations between gene expression and host immune activity, the microbiome and host transcriptome, metabolomic profiling and human health phenotypes. AVAILABILITY AND IMPLEMENTATION: An open-source implementation of HAllA is freely available at http://huttenhower.sph.harvard.edu/halla along with documentation, demo datasets and a user group. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. 
Oxford University Press 2022-06-27 /pmc/articles/PMC9235493/ /pubmed/35758795 http://dx.doi.org/10.1093/bioinformatics/btac232 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | High-sensitivity pattern discovery in large, paired multiomic datasets |
topic | ISCB/Ismb 2022 |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235493/ https://www.ncbi.nlm.nih.gov/pubmed/35758795 http://dx.doi.org/10.1093/bioinformatics/btac232 |