Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains
BACKGROUND: Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data anal...
Main authors: | Xia, Li C.; Ai, Dongmei; Cram, Jacob A.; Liang, Xiaoyi; Fuhrman, Jed A.; Sun, Fengzhu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4578688/ https://www.ncbi.nlm.nih.gov/pubmed/26390921 http://dx.doi.org/10.1186/s12859-015-0732-8 |
_version_ | 1782391153731043328 |
---|---|
author | Xia, Li C.; Ai, Dongmei; Cram, Jacob A.; Liang, Xiaoyi; Fuhrman, Jed A.; Sun, Fengzhu |
author_sort | Xia, Li C. |
collection | PubMed |
description | BACKGROUND: Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in the dynamics of biological systems. However, slow permutation procedures for evaluating the statistical significance of local trend scores have limited its application to high-throughput time series data analysis, e.g., data from studies based on next-generation sequencing technology. RESULTS: By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps), in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large-scale all-versus-all comparisons possible. We also propose a hybrid approach that integrates the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and find interesting organism co-occurrence dynamic patterns. AVAILABILITY: The software tool is integrated into the eLSA software package, which now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0732-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4578688 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4578688 2015-09-23 Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains Xia, Li C.; Ai, Dongmei; Cram, Jacob A.; Liang, Xiaoyi; Fuhrman, Jed A.; Sun, Fengzhu BMC Bioinformatics Methodology Article BioMed Central 2015-09-21 /pmc/articles/PMC4578688/ /pubmed/26390921 http://dx.doi.org/10.1186/s12859-015-0732-8 Text en © Xia et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4578688/ https://www.ncbi.nlm.nih.gov/pubmed/26390921 http://dx.doi.org/10.1186/s12859-015-0732-8 |