
Identifying gene-specific subgroups: an alternative to biclustering

BACKGROUND: Transcriptome analysis aims at gaining insight into cellular processes by discovering gene expression patterns across various experimental conditions. Biclustering is a standard approach for discovering gene subsets with similar expression across subgroups of samples to be identified....


Bibliographic Details
Main Authors:	Branders, Vincent; Schaus, Pierre; Dupont, Pierre
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6888937/ https://www.ncbi.nlm.nih.gov/pubmed/31795929 http://dx.doi.org/10.1186/s12859-019-3289-0

_version_	1783475329853030400
author	Branders, Vincent Schaus, Pierre Dupont, Pierre
author_facet	Branders, Vincent Schaus, Pierre Dupont, Pierre
author_sort	Branders, Vincent
collection	PubMed
description	BACKGROUND: Transcriptome analysis aims at gaining insight into cellular processes by discovering gene expression patterns across various experimental conditions. Biclustering is a standard approach for discovering gene subsets with similar expression across subgroups of samples to be identified. The result is a set of biclusters, each forming a specific submatrix of rows (e.g. genes) and columns (e.g. samples). Relevant biclusters can, however, be missed when, due to the presence of a few outliers, they lack the assumed homogeneity of expression values among a few gene/sample combinations. The Max-Sum SubMatrix problem addresses this issue by looking at highly expressed subsets of genes and of samples, without enforcing such homogeneity. RESULTS: We present here the K-CPGC algorithm to identify K relevant submatrices. Our main contribution is to show that this approach outperforms biclustering algorithms at identifying several gene subsets representative of specific subgroups of samples. Experiments are conducted on 35 gene expression datasets from human tissues and yeast samples. We report comparative results with those obtained by several biclustering algorithms, including CCA, xMOTIFs, ISA, QUBIC, Plaid and Spectral. Gene enrichment analysis demonstrates the benefits of the proposed approach in identifying more statistically significant gene subsets. The most significant Gene Ontology terms identified with K-CPGC are shown to be consistent with the controlled conditions of each dataset. This analysis supports the biological relevance of the identified gene subsets. An additional contribution is the statistical validation protocol proposed here to assess the relative performance of biclustering algorithms and of the proposed method. It relies on a Friedman test and Hochberg's sequential procedure to report critical differences of ranks among all algorithms.
CONCLUSIONS: We propose here the K-CPGC method, a computationally efficient algorithm to identify K max-sum submatrices in a large gene expression matrix. Comparisons show that it identifies more significantly enriched subsets of genes and specific subgroups of samples, which are easily interpretable by biologists. Experiments also show its ability to identify more reliable GO terms. These results illustrate the benefits of the proposed approach in terms of interpretability and of biological enrichment quality. An open implementation of this algorithm is available as an R package.
format	Online Article Text
id	pubmed-6888937
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6888937 2019-12-11 Identifying gene-specific subgroups: an alternative to biclustering Branders, Vincent Schaus, Pierre Dupont, Pierre BMC Bioinformatics Research Article BioMed Central 2019-12-03 /pmc/articles/PMC6888937/ /pubmed/31795929 http://dx.doi.org/10.1186/s12859-019-3289-0 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Branders, Vincent Schaus, Pierre Dupont, Pierre Identifying gene-specific subgroups: an alternative to biclustering
title	Identifying gene-specific subgroups: an alternative to biclustering
title_full	Identifying gene-specific subgroups: an alternative to biclustering
title_fullStr	Identifying gene-specific subgroups: an alternative to biclustering
title_full_unstemmed	Identifying gene-specific subgroups: an alternative to biclustering
title_short	Identifying gene-specific subgroups: an alternative to biclustering
title_sort	identifying gene-specific subgroups: an alternative to biclustering
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6888937/ https://www.ncbi.nlm.nih.gov/pubmed/31795929 http://dx.doi.org/10.1186/s12859-019-3289-0
work_keys_str_mv	AT brandersvincent identifyinggenespecificsubgroupsanalternativetobiclustering AT schauspierre identifyinggenespecificsubgroupsanalternativetobiclustering AT dupontpierre identifyinggenespecificsubgroupsanalternativetobiclustering
