Utility and Limitations of Using Gene Expression Data to Identify Functional Associations
Gene co-expression has been widely used to hypothesize gene function through guilt-by-association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about...
Main authors: | Uygun, Sahra; Peng, Cheng; Lehti-Shiu, Melissa D.; Last, Robert L.; Shiu, Shin-Han |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2016 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5147789/ https://www.ncbi.nlm.nih.gov/pubmed/27935950 http://dx.doi.org/10.1371/journal.pcbi.1005244 |
_version_ | 1782473730688024576 |
---|---|
author | Uygun, Sahra Peng, Cheng Lehti-Shiu, Melissa D. Last, Robert L. Shiu, Shin-Han |
author_facet | Uygun, Sahra Peng, Cheng Lehti-Shiu, Melissa D. Last, Robert L. Shiu, Shin-Han |
author_sort | Uygun, Sahra |
collection | PubMed |
description | Gene co-expression has been widely used to hypothesize gene function through guilt-by-association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about gene functions. Here our goal is to assess the utility and limitations of using co-expression as a criterion to recover functional associations between genes. By determining the percentage of gene pairs in a metabolic pathway with significant expression correlation, using Arabidopsis thaliana as an example, we found that many genes in the same pathway do not have similar transcript profiles and that the choice of dataset, annotation quality, gene function, expression similarity measure, and clustering approach significantly impact the ability to recover functional associations between genes. Some datasets are more informative in capturing coordinated expression profiles, and larger datasets are not always better. In addition, to recover the maximum number of known pathways and identify candidate genes with similar functions, it is important to explore rather exhaustively multiple dataset combinations, similarity measures, clustering algorithms and parameters. Finally, we validated the biological relevance of co-expression cluster memberships with an independent phenomics dataset and found that genes that consistently cluster with leucine degradation genes tend to have similar leucine levels in mutants. This study provides a framework for obtaining gene functional associations by maximizing the information that can be obtained from gene expression datasets. |
format | Online Article Text |
id | pubmed-5147789 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5147789 2016-12-28 Utility and Limitations of Using Gene Expression Data to Identify Functional Associations Uygun, Sahra Peng, Cheng Lehti-Shiu, Melissa D. Last, Robert L. Shiu, Shin-Han PLoS Comput Biol Research Article Gene co-expression has been widely used to hypothesize gene function through guilt-by-association. However, it is not clear to what degree co-expression is informative, whether it can be applied to genes involved in different biological processes, and how the type of dataset impacts inferences about gene functions. Here our goal is to assess the utility and limitations of using co-expression as a criterion to recover functional associations between genes. By determining the percentage of gene pairs in a metabolic pathway with significant expression correlation, using Arabidopsis thaliana as an example, we found that many genes in the same pathway do not have similar transcript profiles and that the choice of dataset, annotation quality, gene function, expression similarity measure, and clustering approach significantly impact the ability to recover functional associations between genes. Some datasets are more informative in capturing coordinated expression profiles, and larger datasets are not always better. In addition, to recover the maximum number of known pathways and identify candidate genes with similar functions, it is important to explore rather exhaustively multiple dataset combinations, similarity measures, clustering algorithms and parameters. Finally, we validated the biological relevance of co-expression cluster memberships with an independent phenomics dataset and found that genes that consistently cluster with leucine degradation genes tend to have similar leucine levels in mutants. This study provides a framework for obtaining gene functional associations by maximizing the information that can be obtained from gene expression datasets.
Public Library of Science 2016-12-09 /pmc/articles/PMC5147789/ /pubmed/27935950 http://dx.doi.org/10.1371/journal.pcbi.1005244 Text en © 2016 Uygun et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Uygun, Sahra Peng, Cheng Lehti-Shiu, Melissa D. Last, Robert L. Shiu, Shin-Han Utility and Limitations of Using Gene Expression Data to Identify Functional Associations |
title | Utility and Limitations of Using Gene Expression Data to Identify Functional Associations |
title_full | Utility and Limitations of Using Gene Expression Data to Identify Functional Associations |
title_fullStr | Utility and Limitations of Using Gene Expression Data to Identify Functional Associations |
title_full_unstemmed | Utility and Limitations of Using Gene Expression Data to Identify Functional Associations |
title_short | Utility and Limitations of Using Gene Expression Data to Identify Functional Associations |
title_sort | utility and limitations of using gene expression data to identify functional associations |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5147789/ https://www.ncbi.nlm.nih.gov/pubmed/27935950 http://dx.doi.org/10.1371/journal.pcbi.1005244 |
work_keys_str_mv | AT uygunsahra utilityandlimitationsofusinggeneexpressiondatatoidentifyfunctionalassociations AT pengcheng utilityandlimitationsofusinggeneexpressiondatatoidentifyfunctionalassociations AT lehtishiumelissad utilityandlimitationsofusinggeneexpressiondatatoidentifyfunctionalassociations AT lastrobertl utilityandlimitationsofusinggeneexpressiondatatoidentifyfunctionalassociations AT shiushinhan utilityandlimitationsofusinggeneexpressiondatatoidentifyfunctionalassociations |
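
The description above mentions determining the percentage of gene pairs in a metabolic pathway with significant expression correlation. The following is a minimal sketch of that kind of calculation, not the authors' implementation. The function name, the toy gene IDs, the expression values, and the fixed Pearson cutoff of 0.7 are all illustrative assumptions; this record does not state how the study defined significance.

```python
import itertools

import numpy as np


def fraction_coexpressed_pairs(expr, pathway_genes, threshold):
    """Fraction of gene pairs in a pathway with Pearson r >= threshold.

    expr: dict mapping gene ID -> sequence of expression values, with the
          same samples in the same order for every gene.
    pathway_genes: iterable of gene IDs annotated to one pathway.
    threshold: correlation cutoff (an assumed stand-in for a significance test).
    """
    # Keep only pathway genes that have expression data.
    genes = [g for g in pathway_genes if g in expr]
    pairs = list(itertools.combinations(genes, 2))
    if not pairs:
        return float("nan")

    hits = 0
    for a, b in pairs:
        # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(a, b).
        r = np.corrcoef(expr[a], expr[b])[0, 1]
        if r >= threshold:
            hits += 1
    return hits / len(pairs)


# Hypothetical toy data: four genes measured in five samples.
toy_expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],  # rises with geneA
    "geneC": [5, 4, 3, 2, 1],   # falls as geneA rises
    "geneD": [1, 3, 2, 5, 4],   # loosely follows geneA
}

frac = fraction_coexpressed_pairs(
    toy_expr, ["geneA", "geneB", "geneC", "geneD"], threshold=0.7
)
print(f"{frac:.0%} of pathway gene pairs pass the cutoff")
```

In this toy case only the pairs among geneA, geneB, and geneD pass the cutoff, which illustrates the description's point that genes annotated to the same pathway need not share similar transcript profiles.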