
Biclustering Methods: Biological Relevance and Application in Gene Expression Analysis

DNA microarray technologies are used extensively to profile the expression levels of thousands of genes under various conditions, yielding extremely large data-matrices. Thus, analyzing this information and extracting biologically relevant knowledge becomes a considerable challenge. A classical appr...


Bibliographic Details
Main authors: Oghabian, Ali; Kilpinen, Sami; Hautaniemi, Sampsa; Czeizler, Elena
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3961251/
https://www.ncbi.nlm.nih.gov/pubmed/24651574
http://dx.doi.org/10.1371/journal.pone.0090801
author Oghabian, Ali
Kilpinen, Sami
Hautaniemi, Sampsa
Czeizler, Elena
collection PubMed
description DNA microarray technologies are used extensively to profile the expression levels of thousands of genes under various conditions, yielding extremely large data-matrices. Thus, analyzing this information and extracting biologically relevant knowledge becomes a considerable challenge. A classical approach for tackling this challenge is to use clustering (also known as one-way clustering) methods where genes (or respectively samples) are grouped together based on the similarity of their expression profiles across the set of all samples (or respectively genes). An alternative approach is to develop biclustering methods to identify local patterns in the data. These methods extract subgroups of genes that are co-expressed across only a subset of samples and may feature important biological or medical implications. In this study we evaluate 13 biclustering and 2 clustering (k-means and hierarchical) methods. We use several approaches to compare their performance on two real gene expression data sets. For this purpose we apply four evaluation measures in our analysis: (1) we examine how well the considered (bi)clustering methods differentiate various sample types; (2) we evaluate how well the groups of genes discovered by the (bi)clustering methods are annotated with similar Gene Ontology categories; (3) we evaluate the capability of the methods to differentiate genes that are known to be specific to the particular sample types we study and (4) we compare the running time of the algorithms. In the end, we conclude that as long as the samples are well defined and annotated, the contamination of the samples is limited, and the samples are well replicated, biclustering methods such as Plaid and SAMBA are useful for discovering relevant subsets of genes and samples.
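The contrast the abstract draws between one-way clustering and biclustering can be illustrated with a minimal sketch. It is not one of the 13 biclustering methods the study evaluates (Plaid, SAMBA and the others are not implemented here); it only shows why similarity measured across all samples can miss genes that are co-expressed on a subset of samples. The expression values, the names `gene_a` and `gene_b`, and the chosen sample subset are all hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical expression profiles of two genes over 8 samples.
# The genes rise together on samples 0-3 and diverge on samples 4-7.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 4.0, 2.0]
gene_b = [1.1, 2.1, 2.9, 4.2, 1.0, 5.0, 2.0, 4.0]

# One-way clustering compares genes across the set of all samples.
r_all = pearson(gene_a, gene_b)

# A bicluster pairs a gene subset with a sample subset; here the sample
# subset is assumed known, whereas a real method would have to find it.
subset = [0, 1, 2, 3]
r_sub = pearson([gene_a[i] for i in subset], [gene_b[i] for i in subset])

print(f"correlation over all samples:   {r_all:.2f}")
print(f"correlation over sample subset: {r_sub:.2f}")
```

In this toy case the two genes are strongly correlated on the subset but not across all samples, so a method that groups genes by their full profiles would be unlikely to place them together. Real biclustering methods must also search for the sample subset, which this sketch takes as given.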
format Online
Article
Text
id pubmed-3961251
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3961251 2014-03-27 Biclustering Methods: Biological Relevance and Application in Gene Expression Analysis Oghabian, Ali Kilpinen, Sami Hautaniemi, Sampsa Czeizler, Elena PLoS One Research Article
Public Library of Science 2014-03-20 /pmc/articles/PMC3961251/ /pubmed/24651574 http://dx.doi.org/10.1371/journal.pone.0090801 Text en © 2014 Oghabian et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Biclustering Methods: Biological Relevance and Application in Gene Expression Analysis
topic Research Article