A skellam model to identify differential patterns of gene expression induced by environmental signals
BACKGROUND: RNA-seq, based on deep-sequencing techniques, has been widely employed to precisely measure levels of transcripts and their isoforms expressed under different conditions. However, robust statistical tools used to analyze these complex datasets are lacking. By grouping genes with similar...
Main Authors: | Jiang, Libo; Mao, Ke; Wu, Rongling |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4167515/ https://www.ncbi.nlm.nih.gov/pubmed/25199446 http://dx.doi.org/10.1186/1471-2164-15-772 |
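
The summary names the skellam distribution without defining it. As a minimal sketch for readers of this record: the Skellam distribution describes the difference between two independent Poisson counts, which is one way a model could relate read counts from two conditions. The function names, the `terms` cutoff, and the choice to sum the Poisson convolution directly are illustrative assumptions and are not taken from the article; the record does not state exactly which quantity the authors model.

```python
import math


def poisson_logpmf(n, mu):
    """Log of the Poisson probability of count n with mean mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def skellam_pmf(k, mu1, mu2, terms=500):
    """P(X1 - X2 = k) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2).

    The sum runs over every value j of X2 for which X1 = j + k is a valid
    count; it is truncated after `terms` values (hypothetical cutoff that is
    ample for small means, not for very large ones).
    """
    start = max(0, -k)
    total = 0.0
    for j in range(start, start + terms):
        total += math.exp(poisson_logpmf(j + k, mu1) + poisson_logpmf(j, mu2))
    return total


# Made-up means for illustration: 3 reads on average in one condition, 2 in the other.
for diff in (-2, 0, 2):
    print(diff, skellam_pmf(diff, 3.0, 2.0))
```

The distribution has mean `mu1 - mu2` and variance `mu1 + mu2`, so a shift in expression between conditions shows up as a non-zero mean difference.
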
_version_ | 1782335432920399872 |
---|---|
author | Jiang, Libo Mao, Ke Wu, Rongling |
author_facet | Jiang, Libo Mao, Ke Wu, Rongling |
author_sort | Jiang, Libo |
collection | PubMed |
description | BACKGROUND: RNA-seq, based on deep-sequencing techniques, has been widely employed to precisely measure levels of transcripts and their isoforms expressed under different conditions. However, robust statistical tools used to analyze these complex datasets are lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks that have become increasingly important. RESULTS: We proposed and verified a cluster algorithm based on a skellam model for grouping genes into distinct groups based on the pattern of gene expression in response to changing conditions or in different tissues. This algorithm capitalizes on the skellam distribution to capture the count property of RNA-seq data and clusters genes in different environments. A two-stage hierarchical expectation-maximization (EM) algorithm was implemented to estimate the optimal number of groups and mean expression levels of each group across two environments. A procedure was formulated to test whether and how a given group shows a plastic response to environmental changes. The model was used to analyze an RNA-seq dataset measured from reciprocal crosses of early Arabidopsis thaliana embryos that respond differently based on the extent of maternal and paternal genome contributions, from which genes associated with maternal and paternal contributions were identified. Simulation studies were also performed to validate the statistical behavior of the model. CONCLUSIONS: This model is a useful tool for clustering gene expression data by RNA-seq, thus facilitating our understanding of gene functions and networks. |
format | Online Article Text |
id | pubmed-4167515 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4167515 2014-09-19 A skellam model to identify differential patterns of gene expression induced by environmental signals Jiang, Libo Mao, Ke Wu, Rongling BMC Genomics Methodology Article BACKGROUND: RNA-seq, based on deep-sequencing techniques, has been widely employed to precisely measure levels of transcripts and their isoforms expressed under different conditions. However, robust statistical tools used to analyze these complex datasets are lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks that have become increasingly important. RESULTS: We proposed and verified a cluster algorithm based on a skellam model for grouping genes into distinct groups based on the pattern of gene expression in response to changing conditions or in different tissues. This algorithm capitalizes on the skellam distribution to capture the count property of RNA-seq data and clusters genes in different environments. A two-stage hierarchical expectation-maximization (EM) algorithm was implemented to estimate the optimal number of groups and mean expression levels of each group across two environments. A procedure was formulated to test whether and how a given group shows a plastic response to environmental changes. The model was used to analyze an RNA-seq dataset measured from reciprocal crosses of early Arabidopsis thaliana embryos that respond differently based on the extent of maternal and paternal genome contributions, from which genes associated with maternal and paternal contributions were identified. Simulation studies were also performed to validate the statistical behavior of the model. CONCLUSIONS: This model is a useful tool for clustering gene expression data by RNA-seq, thus facilitating our understanding of gene functions and networks. BioMed Central 2014-09-08 /pmc/articles/PMC4167515/ /pubmed/25199446 http://dx.doi.org/10.1186/1471-2164-15-772 Text en © Jiang et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Jiang, Libo Mao, Ke Wu, Rongling A skellam model to identify differential patterns of gene expression induced by environmental signals |
title | A skellam model to identify differential patterns of gene expression induced by environmental signals |
title_full | A skellam model to identify differential patterns of gene expression induced by environmental signals |
title_fullStr | A skellam model to identify differential patterns of gene expression induced by environmental signals |
title_full_unstemmed | A skellam model to identify differential patterns of gene expression induced by environmental signals |
title_short | A skellam model to identify differential patterns of gene expression induced by environmental signals |
title_sort | skellam model to identify differential patterns of gene expression induced by environmental signals |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4167515/ https://www.ncbi.nlm.nih.gov/pubmed/25199446 http://dx.doi.org/10.1186/1471-2164-15-772 |
work_keys_str_mv | AT jianglibo askellammodeltoidentifydifferentialpatternsofgeneexpressioninducedbyenvironmentalsignals AT maoke askellammodeltoidentifydifferentialpatternsofgeneexpressioninducedbyenvironmentalsignals AT wurongling askellammodeltoidentifydifferentialpatternsofgeneexpressioninducedbyenvironmentalsignals AT jianglibo skellammodeltoidentifydifferentialpatternsofgeneexpressioninducedbyenvironmentalsignals AT maoke skellammodeltoidentifydifferentialpatternsofgeneexpressioninducedbyenvironmentalsignals AT wurongling skellammodeltoidentifydifferentialpatternsofgeneexpressioninducedbyenvironmentalsignals |
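
The description field mentions an EM algorithm that estimates group mean expression levels across two environments. The sketch below is not the authors' two-stage hierarchical procedure and does not choose the number of groups; it is a plain single-stage EM for a mixture of paired Poisson counts with a fixed number of groups, under the assumption that each gene contributes one count per environment. Within a fitted group, the difference of the two counts then follows a Skellam distribution with that group's two means. All names (`em_paired_poisson`, `hard_labels`, `EPS`) and the toy counts are hypothetical.

```python
import math

EPS = 1e-12  # floor that keeps logarithms and divisions well defined


def poisson_logpmf(n, mu):
    """Log of the Poisson probability of count n with mean mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def em_paired_poisson(counts, init_means, n_iter=50):
    """Fit a mixture of paired Poisson counts by EM (fixed number of groups).

    counts: list of (x1, x2) read counts per gene in two environments.
    init_means: list of (mu1, mu2) starting values, one pair per group.
    Returns (weights, means, responsibilities).
    """
    k = len(init_means)
    means = [(max(m1, EPS), max(m2, EPS)) for m1, m2 in init_means]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each group for each gene,
        # computed in log space and normalised against the largest term.
        resp = []
        for x1, x2 in counts:
            logs = [
                math.log(weights[j])
                + poisson_logpmf(x1, means[j][0])
                + poisson_logpmf(x2, means[j][1])
                for j in range(k)
            ]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: responsibility-weighted proportions and mean counts.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), EPS)
            weights[j] = max(nj / len(counts), EPS)
            m1 = sum(r[j] * x1 for r, (x1, _) in zip(resp, counts)) / nj
            m2 = sum(r[j] * x2 for r, (_, x2) in zip(resp, counts)) / nj
            means[j] = (max(m1, EPS), max(m2, EPS))
    return weights, means, resp


def hard_labels(resp):
    """Index of the most probable group for each gene."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


# Made-up counts: four genes higher in environment 1, four higher in environment 2.
toy = [(20, 5), (22, 4), (18, 6), (19, 5), (5, 21), (4, 19), (6, 20), (5, 22)]
weights, means, resp = em_paired_poisson(toy, [toy[0], toy[-1]])
print(hard_labels(resp))
for mu1, mu2 in means:
    # Mean of the Skellam-distributed difference within the group.
    print(mu1 - mu2)
```

A group whose two fitted means differ has a non-zero expected difference between environments; how the article formally tests for such a plastic response is not described in this record.
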