Utility and Limitations of Using Gene Expression Data to Identify Functional Associations

Gene co-expression has been widely used to hypothesize gene function through guilt-by-association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about...

Bibliographic Details
Main authors: Uygun, Sahra; Peng, Cheng; Lehti-Shiu, Melissa D.; Last, Robert L.; Shiu, Shin-Han
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5147789/
https://www.ncbi.nlm.nih.gov/pubmed/27935950
http://dx.doi.org/10.1371/journal.pcbi.1005244
author Uygun, Sahra
Peng, Cheng
Lehti-Shiu, Melissa D.
Last, Robert L.
Shiu, Shin-Han
collection PubMed
description Gene co-expression has been widely used to hypothesize gene function through guilt-by-association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about gene functions. Here, our goal is to assess the utility and limitations of using co-expression as a criterion to recover functional associations between genes. Using Arabidopsis thaliana as an example, and by determining the percentage of gene pairs in a metabolic pathway with significant expression correlation, we found that many genes in the same pathway do not have similar transcript profiles, and that the choice of dataset, annotation quality, gene function, expression similarity measure, and clustering approach all significantly impact the ability to recover functional associations between genes. Some datasets are more informative in capturing coordinated expression profiles, and larger datasets are not always better. In addition, to recover the maximum number of known pathways and identify candidate genes with similar functions, it is important to explore multiple dataset combinations, similarity measures, clustering algorithms, and parameters rather exhaustively. Finally, we validated the biological relevance of co-expression cluster memberships with an independent phenomics dataset and found that genes that consistently cluster with leucine degradation genes tend to have similar leucine levels in mutants. This study provides a framework for obtaining gene functional associations by maximizing the information that can be obtained from gene expression datasets.
format Online
Article
Text
id pubmed-5147789
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5147789 2016-12-28 Utility and Limitations of Using Gene Expression Data to Identify Functional Associations Uygun, Sahra Peng, Cheng Lehti-Shiu, Melissa D. Last, Robert L. Shiu, Shin-Han PLoS Comput Biol Research Article
Public Library of Science 2016-12-09 /pmc/articles/PMC5147789/ /pubmed/27935950 http://dx.doi.org/10.1371/journal.pcbi.1005244 Text en © 2016 Uygun et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Utility and Limitations of Using Gene Expression Data to Identify Functional Associations
topic Research Article