RUBic: rapid unsupervised biclustering

Biclustering of biologically meaningful binary information is essential in many applications related to drug discovery, like protein–protein interactions and gene expressions. However, for robust performance in recently emerging large health datasets, it is important for new biclustering algorithms...

Bibliographic Details
Main Authors:	Sriwastava, Brijesh K., Halder, Anup Kumar, Basu, Subhadip, Chakraborti, Tapabrata
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10655409/ https://www.ncbi.nlm.nih.gov/pubmed/37974081 http://dx.doi.org/10.1186/s12859-023-05534-3

_version_	1785147941279760384
author	Sriwastava, Brijesh K. Halder, Anup Kumar Basu, Subhadip Chakraborti, Tapabrata
author_facet	Sriwastava, Brijesh K. Halder, Anup Kumar Basu, Subhadip Chakraborti, Tapabrata
author_sort	Sriwastava, Brijesh K.
collection	PubMed
description	Biclustering of biologically meaningful binary information is essential in many applications related to drug discovery, such as protein–protein interactions and gene expression. However, for robust performance on recently emerging large health datasets, new biclustering algorithms must be scalable and fast. We present a rapid unsupervised biclustering (RUBic) algorithm that achieves this objective with a novel encoding and search strategy. On both synthetic and experimental datasets, RUBic significantly reduces computational overhead relative to several state-of-the-art biclustering algorithms. In 100 synthetic binary datasets, our method took [Formula: see text] s to extract 494,872 biclusters. In the human PPI database of size [Formula: see text], our method generates 1840 biclusters in [Formula: see text] s. On a central nervous system embryonic tumor gene expression dataset of size 712,940, our algorithm takes 101 min to produce 747,069 biclusters, while recent competing algorithms take significantly more time to produce the same result. RUBic is also evaluated on five different gene expression datasets and shows a significant speed-up in execution time over existing approaches when extracting significant KEGG-enriched biclusters. RUBic can operate in two modes, base and flex: base mode generates maximal biclusters, while flex mode generates fewer clusters, faster, based on their biological significance with respect to KEGG pathways. The code is available at https://github.com/CMATERJU-BIOINFO/RUBic for academic use only.
format	Online Article Text
id	pubmed-10655409
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10655409 2023-11-16 RUBic: rapid unsupervised biclustering Sriwastava, Brijesh K. Halder, Anup Kumar Basu, Subhadip Chakraborti, Tapabrata BMC Bioinformatics Research
BioMed Central 2023-11-16 /pmc/articles/PMC10655409/ /pubmed/37974081 http://dx.doi.org/10.1186/s12859-023-05534-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Sriwastava, Brijesh K. Halder, Anup Kumar Basu, Subhadip Chakraborti, Tapabrata RUBic: rapid unsupervised biclustering
title	RUBic: rapid unsupervised biclustering
title_full	RUBic: rapid unsupervised biclustering
title_fullStr	RUBic: rapid unsupervised biclustering
title_full_unstemmed	RUBic: rapid unsupervised biclustering
title_short	RUBic: rapid unsupervised biclustering
title_sort	rubic: rapid unsupervised biclustering
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10655409/ https://www.ncbi.nlm.nih.gov/pubmed/37974081 http://dx.doi.org/10.1186/s12859-023-05534-3
work_keys_str_mv	AT sriwastavabrijeshk rubicrapidunsupervisedbiclustering AT halderanupkumar rubicrapidunsupervisedbiclustering AT basusubhadip rubicrapidunsupervisedbiclustering AT chakrabortitapabrata rubicrapidunsupervisedbiclustering
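
The abstract refers to maximal biclusters in binary data. As a rough illustration of that notion only, the sketch below enumerates maximal all-ones submatrices of a tiny binary matrix by brute force. It is not RUBic's encoding or search strategy, which this record does not describe; the function name `maximal_biclusters`, its parameters, and the toy matrix are hypothetical. The enumeration is exponential in the number of rows, which is the kind of cost a scalable algorithm has to avoid.

```python
from itertools import combinations


def maximal_biclusters(matrix, min_rows=2, min_cols=2):
    """Brute-force enumeration of maximal all-ones biclusters.

    Illustrative only: hypothetical helper, not the RUBic algorithm.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    # Represent each row by the set of columns in which it holds a 1.
    row_cols = [
        frozenset(j for j in range(n_cols) if matrix[i][j])
        for i in range(n_rows)
    ]
    found = set()
    for size in range(max(1, min_rows), n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns shared by every row in this candidate row set.
            shared = frozenset.intersection(*(row_cols[i] for i in rows))
            if len(shared) < min_cols:
                continue
            # Close the row set: include every row that has all shared columns.
            # The resulting (rows, columns) pair cannot be extended further.
            closed = frozenset(i for i in range(n_rows) if shared <= row_cols[i])
            found.add((closed, shared))
    return sorted((sorted(r), sorted(c)) for r, c in found)


# Toy binary matrix (made-up data, for illustration only).
data = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]

for rows, cols in maximal_biclusters(data):
    print("rows", rows, "cols", cols)
```

Each reported pair is a set of rows and a set of columns whose intersection contains only ones and that cannot be enlarged by adding another row or column.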
