GOMCL: a toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions

BACKGROUND: Functional enrichment of genes and pathways based on Gene Ontology (GO) has been widely used to describe the results of various -omics analyses. GO terms statistically overrepresented within a set of a large number of genes are typically used to describe the main functional attributes of...

Bibliographic Details
Main authors: Wang, Guannan; Oh, Dong-Ha; Dassanayake, Maheshi
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7146957/
https://www.ncbi.nlm.nih.gov/pubmed/32272889
http://dx.doi.org/10.1186/s12859-020-3447-4
author Wang, Guannan
Oh, Dong-Ha
Dassanayake, Maheshi
collection PubMed
description BACKGROUND: Functional enrichment of genes and pathways based on Gene Ontology (GO) has been widely used to describe the results of various -omics analyses. GO terms statistically overrepresented within a large set of genes are typically used to describe the main functional attributes of the gene set. However, these lists of overrepresented GO terms are often too large and contain redundant, overlapping GO terms, hindering informative functional interpretation. RESULTS: We developed GOMCL to reduce redundancy and summarize lists of GO terms effectively and informatively. This lightweight Python toolkit efficiently identifies clusters within a list of GO terms using the Markov Clustering (MCL) algorithm, based on the overlap of gene members between GO terms. GOMCL facilitates biological interpretation of a large number of GO terms by condensing them into GO clusters representing non-overlapping functional themes. It enables visualizing GO clusters as a heatmap, networks based on either overlap of members or hierarchy among GO terms, and tables with depth and cluster information for each GO term. Each GO cluster generated by GOMCL can be evaluated and further divided into non-overlapping sub-clusters using the GOMCL-sub module. The outputs from both GOMCL and GOMCL-sub can be imported into Cytoscape for additional visualization effects. CONCLUSIONS: GOMCL is a convenient toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions. GOMCL helps researchers reduce the time spent on manual curation of large lists of GO terms, minimize biases introduced by redundant GO terms in data interpretation, and batch-process multiple GO enrichment datasets. A user guide, a test dataset, and the source code of GOMCL are available at https://github.com/Guannan-Wang/GOMCL and www.lsugenomics.org.
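The clustering idea described in the abstract (score GO terms by the overlap of their gene members, then run Markov Clustering on the resulting network) can be illustrated with a minimal pure-Python sketch. This is not GOMCL's code or interface: the function names, the use of the Jaccard index as the overlap measure, the 0.3 similarity cutoff, the inflation value of 2.0, and the toy term and gene labels are all assumptions chosen for illustration.

```python
from itertools import combinations

# Toy input (hypothetical labels): GO term -> set of member genes.
TOY_TERMS = {
    "termA": {"g1", "g2", "g3"},
    "termB": {"g2", "g3", "g4"},
    "termC": {"g1", "g2", "g3", "g4"},
    "termD": {"g7", "g8", "g9"},
    "termE": {"g8", "g9", "g10"},
}

def jaccard(a, b):
    """Jaccard index of two gene sets (an assumed choice of overlap measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_matrix(term_genes, cutoff=0.3):
    """Build a symmetric term-by-term matrix; scores below cutoff become 0."""
    terms = sorted(term_genes)
    n = len(terms)
    # Self-loops (diagonal = 1.0) keep every column non-empty for normalization.
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        score = jaccard(term_genes[terms[i]], term_genes[terms[j]])
        if score >= cutoff:
            sim[i][j] = sim[j][i] = score
    return terms, sim

def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def mcl(sim, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    n = len(sim)
    m = normalize_columns(sim)
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power, then renormalize the columns.
        nxt = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    return m

def cluster_terms(term_genes, cutoff=0.3, inflation=2.0, eps=1e-6):
    """Read clusters from the converged matrix: each non-empty row is a cluster."""
    terms, sim = similarity_matrix(term_genes, cutoff)
    m = mcl(sim, inflation)
    found = {frozenset(terms[j] for j, x in enumerate(row) if x > eps) for row in m}
    return sorted(sorted(c) for c in found if c)

print(cluster_terms(TOY_TERMS))
```

In the toy input, the first three terms share no genes with the last two, so no similarity edge connects the two groups and they cannot end up in the same cluster. Raising the inflation value generally yields smaller, tighter clusters, which is the usual granularity knob in MCL.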
format Online
Article
Text
id pubmed-7146957
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7146957 2020-04-18 GOMCL: a toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions Wang, Guannan Oh, Dong-Ha Dassanayake, Maheshi BMC Bioinformatics Software BACKGROUND: Functional enrichment of genes and pathways based on Gene Ontology (GO) has been widely used to describe the results of various -omics analyses. GO terms statistically overrepresented within a large set of genes are typically used to describe the main functional attributes of the gene set. However, these lists of overrepresented GO terms are often too large and contain redundant, overlapping GO terms, hindering informative functional interpretation. RESULTS: We developed GOMCL to reduce redundancy and summarize lists of GO terms effectively and informatively. This lightweight Python toolkit efficiently identifies clusters within a list of GO terms using the Markov Clustering (MCL) algorithm, based on the overlap of gene members between GO terms. GOMCL facilitates biological interpretation of a large number of GO terms by condensing them into GO clusters representing non-overlapping functional themes. It enables visualizing GO clusters as a heatmap, networks based on either overlap of members or hierarchy among GO terms, and tables with depth and cluster information for each GO term. Each GO cluster generated by GOMCL can be evaluated and further divided into non-overlapping sub-clusters using the GOMCL-sub module. The outputs from both GOMCL and GOMCL-sub can be imported into Cytoscape for additional visualization effects. CONCLUSIONS: GOMCL is a convenient toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions. GOMCL helps researchers reduce the time spent on manual curation of large lists of GO terms, minimize biases introduced by redundant GO terms in data interpretation, and batch-process multiple GO enrichment datasets. 
A user guide, a test dataset, and the source code of GOMCL are available at https://github.com/Guannan-Wang/GOMCL and www.lsugenomics.org. BioMed Central 2020-04-10 /pmc/articles/PMC7146957/ /pubmed/32272889 http://dx.doi.org/10.1186/s12859-020-3447-4 Text en © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title GOMCL: a toolkit to cluster, evaluate, and extract non-redundant associations of Gene Ontology-based functions
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7146957/
https://www.ncbi.nlm.nih.gov/pubmed/32272889
http://dx.doi.org/10.1186/s12859-020-3447-4