
Efficient Test and Visualization of Multi-Set Intersections

Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. However, so far no method has been developed to assess statistical significance of...


Bibliographic Details
Main Authors: Wang, Minghui; Zhao, Yongzhong; Zhang, Bin
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4658477/
https://www.ncbi.nlm.nih.gov/pubmed/26603754
http://dx.doi.org/10.1038/srep16923
author Wang, Minghui
Zhao, Yongzhong
Zhang, Bin
collection PubMed
description Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. However, so far no method has been developed to assess statistical significance of intersections among three or more sets. Moreover, the state-of-the-art approaches for visualization of multi-set intersections are not scalable. Here, we first developed a theoretical framework for computing the statistical distributions of multi-set intersections based upon combinatorial theory, and then accordingly designed a procedure to efficiently calculate the exact probabilities of multi-set intersections. We further developed multiple efficient and scalable techniques to visualize multi-set intersections and the corresponding intersection statistics. We implemented both the theoretical framework and the visualization techniques in a unified R software package, SuperExactTest. We demonstrated the utility of SuperExactTest through an intensive simulation study and a comprehensive analysis of seven independently curated cancer gene sets as well as six disease or trait associated gene sets identified by genome-wide association studies. We expect SuperExactTest developed by this study will have a broad range of applications in scientific data analysis in many disciplines.
format Online
Article
Text
id pubmed-4658477
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4658477 2015-11-30 Efficient Test and Visualization of Multi-Set Intersections Wang, Minghui Zhao, Yongzhong Zhang, Bin Sci Rep Article Nature Publishing Group 2015-11-25 /pmc/articles/PMC4658477/ /pubmed/26603754 http://dx.doi.org/10.1038/srep16923 Text en Copyright © 2015, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Efficient Test and Visualization of Multi-Set Intersections
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4658477/
https://www.ncbi.nlm.nih.gov/pubmed/26603754
http://dx.doi.org/10.1038/srep16923