Using set theory to reduce redundancy in pathway sets
BACKGROUND: The consolidation of pathway databases, such as KEGG, Reactome and ConsensusPathDB, has generated widespread biological interest; however, the issue of pathway redundancy impedes the use of these consolidated datasets. Attempts to reduce this redundancy have focused on visualizing pathway...
Main Authors: | Stoney, Ruth Alexandra; Schwartz, Jean-Marc; Robertson, David L; Nenadic, Goran |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6194563/ https://www.ncbi.nlm.nih.gov/pubmed/30340461 http://dx.doi.org/10.1186/s12859-018-2355-3 |
_version_ | 1783364248295964672 |
---|---|
author | Stoney, Ruth Alexandra Schwartz, Jean-Marc Robertson, David L Nenadic, Goran |
author_facet | Stoney, Ruth Alexandra Schwartz, Jean-Marc Robertson, David L Nenadic, Goran |
author_sort | Stoney, Ruth Alexandra |
collection | PubMed |
description | BACKGROUND: The consolidation of pathway databases, such as KEGG, Reactome and ConsensusPathDB, has generated widespread biological interest; however, the issue of pathway redundancy impedes the use of these consolidated datasets. Attempts to reduce this redundancy have focused on visualizing pathway overlap or merging pathways, but the resulting pathways may be of heterogeneous sizes and cover multiple biological functions. Efforts have also been made to deal with redundancy in pathway data by consolidating enriched pathways into a number of clusters or concepts. We present an alternative approach, which generates pathway subsets capable of covering all of the genes presented within either pathway databases or enrichment results, generating substantial reductions in redundancy. RESULTS: We propose a method that uses set cover to reduce pathway redundancy, without merging pathways. The proposed approach considers three objectives: removal of pathway redundancy, controlling pathway size and coverage of the gene set. By applying set cover to the ConsensusPathDB dataset we were able to produce a reduced set of pathways, representing 100% of the genes in the original data set with 74% less redundancy, or 95% of the genes with 88% less redundancy. We also developed an algorithm to simplify enrichment data and applied it to a set of enriched osteoarthritis pathways, revealing that within the top ten pathways, five were redundant subsets of more enriched pathways. Applying set cover to the enrichment results removed these redundant pathways, allowing more informative pathways to take their place. CONCLUSION: Our method provides an alternative approach for handling pathway redundancy, while ensuring that the pathways are of homogeneous size and gene coverage is maximised. Pathways are not altered from their original form, allowing biological knowledge regarding the data set to be directly applicable. We demonstrate the ability of the algorithms to prioritise redundancy reduction, pathway size control or gene set coverage. The application of set cover to pathway enrichment results produces an optimised summary of the pathways that best represent the differentially regulated gene set. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2355-3) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6194563 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6194563 2018-10-25 Using set theory to reduce redundancy in pathway sets Stoney, Ruth Alexandra Schwartz, Jean-Marc Robertson, David L Nenadic, Goran BMC Bioinformatics Research Article BACKGROUND: The consolidation of pathway databases, such as KEGG, Reactome and ConsensusPathDB, has generated widespread biological interest; however, the issue of pathway redundancy impedes the use of these consolidated datasets. Attempts to reduce this redundancy have focused on visualizing pathway overlap or merging pathways, but the resulting pathways may be of heterogeneous sizes and cover multiple biological functions. Efforts have also been made to deal with redundancy in pathway data by consolidating enriched pathways into a number of clusters or concepts. We present an alternative approach, which generates pathway subsets capable of covering all of the genes presented within either pathway databases or enrichment results, generating substantial reductions in redundancy. RESULTS: We propose a method that uses set cover to reduce pathway redundancy, without merging pathways. The proposed approach considers three objectives: removal of pathway redundancy, controlling pathway size and coverage of the gene set. By applying set cover to the ConsensusPathDB dataset we were able to produce a reduced set of pathways, representing 100% of the genes in the original data set with 74% less redundancy, or 95% of the genes with 88% less redundancy. We also developed an algorithm to simplify enrichment data and applied it to a set of enriched osteoarthritis pathways, revealing that within the top ten pathways, five were redundant subsets of more enriched pathways. Applying set cover to the enrichment results removed these redundant pathways, allowing more informative pathways to take their place. CONCLUSION: Our method provides an alternative approach for handling pathway redundancy, while ensuring that the pathways are of homogeneous size and gene coverage is maximised. Pathways are not altered from their original form, allowing biological knowledge regarding the data set to be directly applicable. We demonstrate the ability of the algorithms to prioritise redundancy reduction, pathway size control or gene set coverage. The application of set cover to pathway enrichment results produces an optimised summary of the pathways that best represent the differentially regulated gene set. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2355-3) contains supplementary material, which is available to authorized users. BioMed Central 2018-10-19 /pmc/articles/PMC6194563/ /pubmed/30340461 http://dx.doi.org/10.1186/s12859-018-2355-3 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Stoney, Ruth Alexandra Schwartz, Jean-Marc Robertson, David L Nenadic, Goran Using set theory to reduce redundancy in pathway sets |
title | Using set theory to reduce redundancy in pathway sets |
title_full | Using set theory to reduce redundancy in pathway sets |
title_fullStr | Using set theory to reduce redundancy in pathway sets |
title_full_unstemmed | Using set theory to reduce redundancy in pathway sets |
title_short | Using set theory to reduce redundancy in pathway sets |
title_sort | using set theory to reduce redundancy in pathway sets |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6194563/ https://www.ncbi.nlm.nih.gov/pubmed/30340461 http://dx.doi.org/10.1186/s12859-018-2355-3 |
work_keys_str_mv | AT stoneyruthalexandra usingsettheorytoreduceredundancyinpathwaysets AT schwartzjeanmarc usingsettheorytoreduceredundancyinpathwaysets AT robertsondavidl usingsettheorytoreduceredundancyinpathwaysets AT nenadicgoran usingsettheorytoreduceredundancyinpathwaysets |
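
The description above says the method applies set cover to pick a subset of unmodified pathways that still covers all (or a chosen fraction) of the genes. The record does not state which set cover algorithm the authors use, so the following is only a minimal sketch assuming the standard greedy heuristic. The function names `greedy_set_cover` and `redundancy`, the toy pathway and gene identifiers, and the redundancy measure (mean extra pathway memberships per covered gene) are all illustrative assumptions, not taken from the article.

```python
from typing import Dict, List, Set


def greedy_set_cover(pathways: Dict[str, Set[str]], coverage: float = 1.0) -> List[str]:
    """Greedy heuristic (an assumption, not the article's stated algorithm):
    repeatedly pick the pathway that adds the most uncovered genes until
    the requested fraction of all genes is covered."""
    universe: Set[str] = set().union(*pathways.values())
    target = coverage * len(universe)
    covered: Set[str] = set()
    chosen: List[str] = []
    while len(covered) < target:
        # Iterate names in sorted order so ties resolve alphabetically.
        best = max(sorted(pathways), key=lambda name: len(pathways[name] - covered))
        gain = pathways[best] - covered
        if not gain:  # nothing left to add; avoid looping forever
            break
        chosen.append(best)
        covered |= gain
    return chosen


def redundancy(pathways: Dict[str, Set[str]], selected: List[str]) -> float:
    """Illustrative measure (assumed): average number of extra pathway
    memberships per covered gene. 0.0 means no gene appears twice."""
    total = sum(len(pathways[name]) for name in selected)
    distinct = len(set().union(*(pathways[name] for name in selected)))
    return total / distinct - 1 if distinct else 0.0


# Hypothetical toy data: pathway name -> member genes.
toy = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2"},
    "C": {"g3", "g4"},
    "D": {"g5", "g6"},
    "E": {"g4", "g5"},
}

full_cover = greedy_set_cover(toy)                   # covers every gene
partial_cover = greedy_set_cover(toy, coverage=0.6)  # stops once 60% is covered
```

Lowering `coverage` trades gene coverage for a smaller, less redundant selection, which mirrors the 100% versus 95% coverage trade-off reported in the description. Controlling pathway size, the third objective mentioned there, is not modelled in this sketch.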