Pairwise gene GO-based measures for biclustering of high-dimensional expression data

BACKGROUND: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms to find biclusters composed of groups of genes func...

Bibliographic Details
Main Authors:	Nepomuceno, Juan A., Troncoso, Alicia, Nepomuceno-Chamorro, Isabel A., Aguilar-Ruiz, Jesús S.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5872503/ https://www.ncbi.nlm.nih.gov/pubmed/29610579 http://dx.doi.org/10.1186/s13040-018-0165-9

author	Nepomuceno, Juan A. Troncoso, Alicia Nepomuceno-Chamorro, Isabel A. Aguilar-Ruiz, Jesús S.
author_sort	Nepomuceno, Juan A.
collection	PubMed
description	BACKGROUND: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms toward biclusters composed of functionally coherent groups of genes. In addition, a distance between genes can be defined from the information stored about them in Gene Ontology (GO). Pairwise gene GO semantic similarity measures report a value for each pair of genes that establishes their functional similarity. This paper studies a scatter search-based algorithm that optimizes a merit function integrating GO information. This merit function includes a term that incorporates the information through a GO measure. RESULTS: The effect of two different pairwise gene GO measures on the performance of the algorithm is analyzed. Firstly, three well-known yeast datasets with approximately one thousand genes are studied. Secondly, a group of human datasets related to clinical cancer data is also explored by the algorithm. Most of these are high-dimensional datasets composed of a huge number of genes. The resulting biclusters reveal groups of genes linked by the same functionality when the search procedure is driven by one of the proposed GO measures. Furthermore, a qualitative biological study of a group of biclusters shows their relevance from a cancer disease perspective. CONCLUSIONS: It can be concluded that the integration of biological information improves the performance of the biclustering process. The two GO measures studied both improve the results obtained for the yeast datasets. However, when datasets are composed of a huge number of genes, only one of them really improves the algorithm performance. This second case constitutes a clear option to explore interesting datasets from a clinical point of view.
format	Online Article Text
id	pubmed-5872503
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5872503 2018-04-02 BioData Min Research BioMed Central 2018-03-27 /pmc/articles/PMC5872503/ /pubmed/29610579 http://dx.doi.org/10.1186/s13040-018-0165-9 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Pairwise gene GO-based measures for biclustering of high-dimensional expression data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5872503/ https://www.ncbi.nlm.nih.gov/pubmed/29610579 http://dx.doi.org/10.1186/s13040-018-0165-9
