
Statistical significance approximation for local similarity analysis of dependent time series data

BACKGROUND: Local similarity analysis (LSA) of time series data has been extensively used to investigate the dynamics of biological systems in a wide range of environments. Recently, a theoretical method was proposed to approximately calculate the statistical significance of local similarity (LS) sc...


Bibliographic Details
Main authors:	Zhang, Fang; Sun, Fengzhu; Luan, Yihui
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6348690/ https://www.ncbi.nlm.nih.gov/pubmed/30691412 http://dx.doi.org/10.1186/s12859-019-2595-x

author	Zhang, Fang Sun, Fengzhu Luan, Yihui
collection	PubMed
description	BACKGROUND: Local similarity analysis (LSA) of time series data has been extensively used to investigate the dynamics of biological systems in a wide range of environments. Recently, a theoretical method was proposed to approximately calculate the statistical significance of local similarity (LS) scores. However, the method assumes that the time series data are independent and identically distributed, an assumption that can be violated in many problems. RESULTS: In this paper, we develop a novel approach to accurately approximate the statistical significance of LSA for dependent time series data using a nonparametric kernel estimate of the long-run variance. We also investigate an alternative method for LSA statistical significance approximation by computing the local similarity score of the residuals based on a predefined statistical model. We show by simulations that both methods have controllable type I errors for dependent time series, while other approaches for statistical significance can be grossly oversized. We apply both methods to human and marine microbial datasets, where most of the possible significant associations are captured and false positives are efficiently controlled. CONCLUSIONS: Our methods provide fast and effective approaches for evaluating statistical significance of dependent time series data with controllable type I error. They can be applied to a variety of time series data to reveal inherent relationships among the different factors. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2595-x) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6348690
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6348690 2019-01-31 BMC Bioinformatics Methodology Article BioMed Central 2019-01-28 /pmc/articles/PMC6348690/ /pubmed/30691412 http://dx.doi.org/10.1186/s12859-019-2595-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Statistical significance approximation for local similarity analysis of dependent time series data
topic	Methodology Article
