
Using set theory to reduce redundancy in pathway sets

BACKGROUND: The consolidation of pathway databases, such as KEGG, Reactome and ConsensusPathDB, has generated widespread biological interest; however, pathway redundancy impedes the use of these consolidated datasets. Attempts to reduce this redundancy have focused on visualizing pathway...


Bibliographic Details
Main authors: Stoney, Ruth Alexandra, Schwartz, Jean-Marc, Robertson, David L, Nenadic, Goran
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6194563/
https://www.ncbi.nlm.nih.gov/pubmed/30340461
http://dx.doi.org/10.1186/s12859-018-2355-3
author Stoney, Ruth Alexandra
Schwartz, Jean-Marc
Robertson, David L
Nenadic, Goran
collection PubMed
description BACKGROUND: The consolidation of pathway databases, such as KEGG, Reactome and ConsensusPathDB, has generated widespread biological interest; however, pathway redundancy impedes the use of these consolidated datasets. Attempts to reduce this redundancy have focused on visualizing pathway overlap or merging pathways, but the resulting pathways may be of heterogeneous sizes and cover multiple biological functions. Efforts have also been made to deal with redundancy in pathway data by consolidating enriched pathways into a number of clusters or concepts. We present an alternative approach, which generates pathway subsets capable of covering all of the genes presented within either pathway databases or enrichment results, yielding substantial reductions in redundancy. RESULTS: We propose a method that uses set cover to reduce pathway redundancy without merging pathways. The proposed approach considers three objectives: removal of pathway redundancy, control of pathway size and coverage of the gene set. By applying set cover to the ConsensusPathDB dataset, we were able to produce a reduced set of pathways representing 100% of the genes in the original data set with 74% less redundancy, or 95% of the genes with 88% less redundancy. We also developed an algorithm to simplify enrichment data and applied it to a set of enriched osteoarthritis pathways, revealing that, within the top ten pathways, five were redundant subsets of more enriched pathways. Applying set cover to the enrichment results removed these redundant pathways, allowing more informative pathways to take their place. CONCLUSION: Our method provides an alternative approach for handling pathway redundancy, while ensuring that the pathways are of homogeneous size and gene coverage is maximised. Pathways are not altered from their original form, allowing biological knowledge regarding the data set to be directly applicable. We demonstrate the ability of the algorithms to prioritise redundancy reduction, pathway size control or gene set coverage. The application of set cover to pathway enrichment results produces an optimised summary of the pathways that best represent the differentially regulated gene set. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2355-3) contains supplementary material, which is available to authorized users.
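The abstract names set cover as the core technique but does not spell out the authors' algorithms. As a rough illustration only, the sketch below uses the standard greedy set-cover heuristic, which is an assumption and not necessarily the published method. The function name `greedy_set_cover`, the `coverage` parameter, and the toy pathway and gene names are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_set_cover(pathways: Dict[str, Set[str]], coverage: float = 1.0) -> List[str]:
    """Greedily select pathways until `coverage` (a fraction) of all genes is covered.

    Illustrative heuristic only; the abstract does not specify the authors' algorithm.
    """
    universe: Set[str] = set()
    for genes in pathways.values():
        universe |= genes
    target = coverage * len(universe)

    covered: Set[str] = set()
    chosen: List[str] = []
    while len(covered) < target:
        # Pick the pathway adding the most uncovered genes.
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(pathways), key=lambda name: len(pathways[name] - covered))
        gain = pathways[best] - covered
        if not gain:
            break  # nothing left to gain; stop rather than loop forever
        chosen.append(best)
        covered |= gain
    return chosen


# Hypothetical toy data: P2 is a redundant subset of P1.
toy_pathways = {
    "P1": {"g1", "g2", "g3", "g4"},
    "P2": {"g1", "g2"},
    "P3": {"g4", "g5"},
    "P4": {"g5", "g6"},
}

print(greedy_set_cover(toy_pathways))                # full coverage
print(greedy_set_cover(toy_pathways, coverage=0.6))  # partial coverage
```

On the toy data, full coverage selects P1 and P4 and leaves out the subset pathway P2, while lowering `coverage` trades gene coverage for fewer pathways, loosely mirroring the 100% versus 95% trade-off reported in the abstract. Selected pathways are returned unmodified, in line with the abstract's point that pathways are not merged or altered.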
format Online
Article
Text
id pubmed-6194563
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6194563 2018-10-25 Using set theory to reduce redundancy in pathway sets Stoney, Ruth Alexandra Schwartz, Jean-Marc Robertson, David L Nenadic, Goran BMC Bioinformatics Research Article BioMed Central 2018-10-19 /pmc/articles/PMC6194563/ /pubmed/30340461 http://dx.doi.org/10.1186/s12859-018-2355-3 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Using set theory to reduce redundancy in pathway sets
topic Research Article