
ARBic: an all-round biclustering algorithm for analyzing gene expression data


Bibliographic details
Main authors: Liu, Xiangyu; Yu, Ting; Zhao, Xiaoyu; Long, Chaoyi; Han, Renmin; Su, Zhengchang; Li, Guojun
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2023
Subjects: Methods Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9887595/
https://www.ncbi.nlm.nih.gov/pubmed/36733402
http://dx.doi.org/10.1093/nargab/lqad009
collection PubMed
description Identifying significant biclusters of genes with specific expression patterns is an effective approach to revealing functionally correlated genes in gene expression data. However, none of the existing algorithms can identify both broader and narrower biclusters simultaneously, because they fail to balance effectiveness and efficiency. We introduce ARBic, an algorithm capable of accurately identifying significant biclusters of any shape, including broader, narrower and square ones, in any large-scale gene expression dataset. ARBic integrates column-based and row-based strategies into a single biclustering procedure. The column-based strategy, borrowed from RecBic, a recently published biclustering tool, extracts narrower biclusters, while the row-based strategy, which iteratively finds the longest path in a specific directed graph, extracts broader ones. When tested on simulated datasets and compared with seven other salient biclustering algorithms, ARBic achieves, on average, recovery, relevance and [Formula: see text] scores at least 29% higher than those of the best existing tool. In addition, ARBic substantially outperforms all the compared tools on real datasets and is more robust to noise and to variation in bicluster shape and dataset type.
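The abstract states that ARBic's row-based strategy iteratively finds the longest path in a specific directed graph, but this record does not describe how that graph is built or what its nodes and edges represent. As a generic illustration only, the Python sketch below finds one longest path (counted in edges) in a directed acyclic graph, using Kahn's topological sort followed by dynamic programming. The function name `longest_path_dag` and the toy graph with nodes labelled `c1` to `c5` are hypothetical and are not taken from ARBic. Restricting the input to acyclic graphs is also an assumption, made because the longest-path problem is NP-hard on general directed graphs but solvable in linear time on DAGs.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List, Optional, Tuple


def longest_path_dag(edges: List[Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Return one longest path, counted in edges, of a directed acyclic graph.

    Hypothetical helper: Kahn's topological sort, then dynamic programming
    over the topological order. Raises ValueError if the graph has a cycle.
    """
    succ: Dict[Hashable, List[Hashable]] = defaultdict(list)
    indeg: Dict[Hashable, int] = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    if not nodes:
        return []

    # Kahn's algorithm: repeatedly emit nodes with no remaining incoming edges.
    queue = deque(sorted((n for n in nodes if indeg[n] == 0), key=str))
    order: List[Hashable] = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # dist[n]: edges on the longest path ending at n; prev[n]: its predecessor there.
    dist = {n: 0 for n in nodes}
    prev: Dict[Hashable, Optional[Hashable]] = {n: None for n in nodes}
    for u in order:
        for v in succ[u]:
            if dist[u] + 1 > dist[v]:
                dist[v] = dist[u] + 1
                prev[v] = u

    # Walk back from the node where the longest path ends.
    end: Optional[Hashable] = max(order, key=lambda n: dist[n])
    path: List[Hashable] = []
    while end is not None:
        path.append(end)
        end = prev[end]
    return path[::-1]


if __name__ == "__main__":
    # Toy graph; node labels are hypothetical and not ARBic's graph.
    toy = [("c1", "c2"), ("c2", "c3"), ("c3", "c5"), ("c1", "c4"), ("c4", "c5")]
    print(longest_path_dag(toy))  # ['c1', 'c2', 'c3', 'c5']
```

In the toy graph, the route through `c3` is one edge longer than the route through `c4`, so it is the one returned. Whether ARBic's graph is guaranteed to be acyclic is not stated in this record; a graph with cycles would need a different formulation.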
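The recovery and relevance scores reported in the abstract are not defined in this record. A commonly used formulation, assumed here, compares a set of expected (planted) biclusters with a set of found biclusters: each bicluster in one set is matched to its most similar bicluster in the other, similarity is the Jaccard index of the gene-by-condition cells the two biclusters cover, and the best-match values are averaged. Recovery averages over the expected biclusters (how well each planted bicluster is found), while relevance averages over the found biclusters (how well each reported bicluster corresponds to a planted one). The names `cell_jaccard`, `match_score`, `recovery` and `relevance` are hypothetical, and the paper's exact variant (for example, gene-only rather than cell-based Jaccard) may differ.

```python
from typing import FrozenSet, Hashable, List, Tuple

# Assumed representation: a bicluster is a (genes, conditions) pair of frozensets.
Bicluster = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]


def cell_jaccard(a: Bicluster, b: Bicluster) -> float:
    """Jaccard index of the gene-by-condition cells covered by two biclusters."""
    (genes_a, conds_a), (genes_b, conds_b) = a, b
    # Shared cells are exactly (shared genes) x (shared conditions).
    inter = len(genes_a & genes_b) * len(conds_a & conds_b)
    union = len(genes_a) * len(conds_a) + len(genes_b) * len(conds_b) - inter
    return inter / union if union else 0.0


def match_score(m1: List[Bicluster], m2: List[Bicluster]) -> float:
    """Mean over biclusters in m1 of the best match in m2 (0.0 if either is empty)."""
    if not m1 or not m2:
        return 0.0
    return sum(max(cell_jaccard(b1, b2) for b2 in m2) for b1 in m1) / len(m1)


def recovery(expected: List[Bicluster], found: List[Bicluster]) -> float:
    # How well each planted bicluster is matched by some found bicluster.
    return match_score(expected, found)


def relevance(expected: List[Bicluster], found: List[Bicluster]) -> float:
    # How well each found bicluster is matched by some planted bicluster.
    return match_score(found, expected)


if __name__ == "__main__":
    planted = [(frozenset({"g1", "g2"}), frozenset({"c1", "c2"}))]
    reported = [
        (frozenset({"g1", "g2"}), frozenset({"c1", "c2"})),  # exact match
        (frozenset({"g5"}), frozenset({"c9"})),              # spurious bicluster
    ]
    print(recovery(planted, reported), relevance(planted, reported))  # 1.0 0.5
```

In the toy example the planted bicluster is recovered exactly (recovery 1.0), but only one of the two reported biclusters matches it, so relevance drops to 0.5. Using both scores penalizes a method that reports many spurious biclusters as well as one that misses planted ones.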
id pubmed-9887595
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal NAR Genom Bioinform (NAR Genomics and Bioinformatics), Methods Article, published 2023-01-31 (record pubmed-9887595, 2023-02-01)
rights © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
license This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methods Article