A skellam model to identify differential patterns of gene expression induced by environmental signals

BACKGROUND: RNA-seq, based on deep-sequencing techniques, has been widely employed to precisely measure levels of transcripts and their isoforms expressed under different conditions. However, robust statistical tools used to analyze these complex datasets are lacking. By grouping genes with similar...

Bibliographic Details
Main Authors: Jiang, Libo, Mao, Ke, Wu, Rongling
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4167515/
https://www.ncbi.nlm.nih.gov/pubmed/25199446
http://dx.doi.org/10.1186/1471-2164-15-772
_version_ 1782335432920399872
author Jiang, Libo
Mao, Ke
Wu, Rongling
author_facet Jiang, Libo
Mao, Ke
Wu, Rongling
author_sort Jiang, Libo
collection PubMed
description BACKGROUND: RNA-seq, based on deep-sequencing techniques, has been widely employed to precisely measure levels of transcripts and their isoforms expressed under different conditions. However, robust statistical tools used to analyze these complex datasets are lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks that have become increasingly important. RESULTS: We proposed and verified a cluster algorithm based on a skellam model for grouping genes into distinct groups based on the pattern of gene expression in response to changing conditions or in different tissues. This algorithm capitalizes on the skellam distribution to capture the count property of RNA-seq data and clusters genes in different environments. A two-stage hierarchical expectation-maximization (EM) algorithm was implemented to estimate the optimal number of groups and mean expression levels of each group across two environments. A procedure was formulated to test whether and how a given group shows a plastic response to environmental changes. The model was used to analyze an RNA-seq dataset measured from reciprocal crosses of early Arabidopsis thaliana embryos that respond differently based on the extent of maternal and paternal genome contributions, from which genes associated with maternal and paternal contributions were identified. Simulation studies were also performed to validate the statistical behavior of the model. CONCLUSIONS: This model is a useful tool for clustering gene expression data by RNA-seq, thus facilitating our understanding of gene functions and networks.
format Online
Article
Text
id pubmed-4167515
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-41675152014-09-19 A skellam model to identify differential patterns of gene expression induced by environmental signals Jiang, Libo Mao, Ke Wu, Rongling BMC Genomics Methodology Article BACKGROUND: RNA-seq, based on deep-sequencing techniques, has been widely employed to precisely measure levels of transcripts and their isoforms expressed under different conditions. However, robust statistical tools used to analyze these complex datasets are lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks that have become increasingly important. RESULTS: We proposed and verified a cluster algorithm based on a skellam model for grouping genes into distinct groups based on the pattern of gene expression in response to changing conditions or in different tissues. This algorithm capitalizes on the skellam distribution to capture the count property of RNA-seq data and clusters genes in different environments. A two-stage hierarchical expectation-maximization (EM) algorithm was implemented to estimate the optimal number of groups and mean expression levels of each group across two environments. A procedure was formulated to test whether and how a given group shows a plastic response to environmental changes. The model was used to analyze an RNA-seq dataset measured from reciprocal crosses of early Arabidopsis thaliana embryos that respond differently based on the extent of maternal and paternal genome contributions, from which genes associated with maternal and paternal contributions were identified. Simulation studies were also performed to validate the statistical behavior of the model. CONCLUSIONS: This model is a useful tool for clustering gene expression data by RNA-seq, thus facilitating our understanding of gene functions and networks. 
BioMed Central 2014-09-08 /pmc/articles/PMC4167515/ /pubmed/25199446 http://dx.doi.org/10.1186/1471-2164-15-772 Text en © Jiang et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Jiang, Libo
Mao, Ke
Wu, Rongling
A skellam model to identify differential patterns of gene expression induced by environmental signals
title A skellam model to identify differential patterns of gene expression induced by environmental signals
title_sort skellam model to identify differential patterns of gene expression induced by environmental signals
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4167515/
https://www.ncbi.nlm.nih.gov/pubmed/25199446
http://dx.doi.org/10.1186/1471-2164-15-772