Efficient Test and Visualization of Multi-Set Intersections
Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. However, so far no method has been developed to assess statistical significance of...
Main Authors: | Wang, Minghui; Zhao, Yongzhong; Zhang, Bin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4658477/ https://www.ncbi.nlm.nih.gov/pubmed/26603754 http://dx.doi.org/10.1038/srep16923 |
_version_ | 1782402520455315456 |
---|---|
author | Wang, Minghui Zhao, Yongzhong Zhang, Bin |
author_facet | Wang, Minghui Zhao, Yongzhong Zhang, Bin |
author_sort | Wang, Minghui |
collection | PubMed |
description | Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. However, so far no method has been developed to assess statistical significance of intersections among three or more sets. Moreover, the state-of-the-art approaches for visualization of multi-set intersections are not scalable. Here, we first developed a theoretical framework for computing the statistical distributions of multi-set intersections based upon combinatorial theory, and then accordingly designed a procedure to efficiently calculate the exact probabilities of multi-set intersections. We further developed multiple efficient and scalable techniques to visualize multi-set intersections and the corresponding intersection statistics. We implemented both the theoretical framework and the visualization techniques in a unified R software package, SuperExactTest. We demonstrated the utility of SuperExactTest through an intensive simulation study and a comprehensive analysis of seven independently curated cancer gene sets as well as six disease or trait associated gene sets identified by genome-wide association studies. We expect SuperExactTest developed by this study will have a broad range of applications in scientific data analysis in many disciplines. |
format | Online Article Text |
id | pubmed-4658477 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-46584772015-11-30 Efficient Test and Visualization of Multi-Set Intersections Wang, Minghui Zhao, Yongzhong Zhang, Bin Sci Rep Article Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. However, so far no method has been developed to assess statistical significance of intersections among three or more sets. Moreover, the state-of-the-art approaches for visualization of multi-set intersections are not scalable. Here, we first developed a theoretical framework for computing the statistical distributions of multi-set intersections based upon combinatorial theory, and then accordingly designed a procedure to efficiently calculate the exact probabilities of multi-set intersections. We further developed multiple efficient and scalable techniques to visualize multi-set intersections and the corresponding intersection statistics. We implemented both the theoretical framework and the visualization techniques in a unified R software package, SuperExactTest. We demonstrated the utility of SuperExactTest through an intensive simulation study and a comprehensive analysis of seven independently curated cancer gene sets as well as six disease or trait associated gene sets identified by genome-wide association studies. We expect SuperExactTest developed by this study will have a broad range of applications in scientific data analysis in many disciplines. Nature Publishing Group 2015-11-25 /pmc/articles/PMC4658477/ /pubmed/26603754 http://dx.doi.org/10.1038/srep16923 Text en Copyright © 2015, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Wang, Minghui Zhao, Yongzhong Zhang, Bin Efficient Test and Visualization of Multi-Set Intersections |
title | Efficient Test and Visualization of Multi-Set Intersections |
title_full | Efficient Test and Visualization of Multi-Set Intersections |
title_fullStr | Efficient Test and Visualization of Multi-Set Intersections |
title_full_unstemmed | Efficient Test and Visualization of Multi-Set Intersections |
title_short | Efficient Test and Visualization of Multi-Set Intersections |
title_sort | efficient test and visualization of multi-set intersections |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4658477/ https://www.ncbi.nlm.nih.gov/pubmed/26603754 http://dx.doi.org/10.1038/srep16923 |
work_keys_str_mv | AT wangminghui efficienttestandvisualizationofmultisetintersections AT zhaoyongzhong efficienttestandvisualizationofmultisetintersections AT zhangbin efficienttestandvisualizationofmultisetintersections |