
Expanding the boundaries of local similarity analysis

BACKGROUND: Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies the existence of local and lagged relationships, but determining sign...


Bibliographic Details
Main Authors:	Durno, W Evan, Hanson, Niels W, Konwar, Kishori M, Hallam, Steven J
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549818/ https://www.ncbi.nlm.nih.gov/pubmed/23368516 http://dx.doi.org/10.1186/1471-2164-14-S1-S3

_version_	1782256477115777024
author	Durno, W Evan Hanson, Niels W Konwar, Kishori M Hallam, Steven J
author_facet	Durno, W Evan Hanson, Niels W Konwar, Kishori M Hallam, Steven J
author_sort	Durno, W Evan
collection	PubMed
description	BACKGROUND: Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies the existence of local and lagged relationships, but determining significance through a p-value has been algorithmically cumbersome due to an intensive permutation test, shuffling rows and columns and repeatedly calculating the statistic. Furthermore, this p-value is calculated with the assumption of normality, a statistical luxury dissociated from most real-world datasets. RESULTS: To improve the performance of LSA on big datasets, an asymptotic upper bound on the p-value calculation was derived without the assumption of normality. This change in the bound calculation markedly improved computational speed from O(pm²n) to O(m²n), where p is the number of permutations in a permutation test, m is the number of time series, and n is the length of each time series. The bounding process is implemented as a computationally efficient software package, FASTLSA, written in C and optimized for threading on multi-core computers, improving its practical computation time. We computationally compare our approach to previous implementations of LSA, demonstrate broad applicability by analyzing time series data from public health, microbial ecology, and social media, and visualize resulting networks using the Cytoscape software. CONCLUSIONS: The FASTLSA software package expands the boundaries of LSA, allowing analysis on datasets with millions of co-varying time series. Mapping metadata onto force-directed graphs derived from FASTLSA allows investigators to view correlated cliques and explore previously unrecognized network relationships. The software is freely available for download at: http://www.cmde.science.ubc.ca/hallam/fastLSA/.
format	Online Article Text
id	pubmed-3549818
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3549818 2013-01-23 Expanding the boundaries of local similarity analysis Durno, W Evan Hanson, Niels W Konwar, Kishori M Hallam, Steven J BMC Genomics Proceedings BioMed Central 2013-01-21 /pmc/articles/PMC3549818/ /pubmed/23368516 http://dx.doi.org/10.1186/1471-2164-14-S1-S3 Text en Copyright ©2013 Durno et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Proceedings Durno, W Evan Hanson, Niels W Konwar, Kishori M Hallam, Steven J Expanding the boundaries of local similarity analysis
title	Expanding the boundaries of local similarity analysis
title_full	Expanding the boundaries of local similarity analysis
title_fullStr	Expanding the boundaries of local similarity analysis
title_full_unstemmed	Expanding the boundaries of local similarity analysis
title_short	Expanding the boundaries of local similarity analysis
title_sort	expanding the boundaries of local similarity analysis
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549818/ https://www.ncbi.nlm.nih.gov/pubmed/23368516 http://dx.doi.org/10.1186/1471-2164-14-S1-S3
work_keys_str_mv	AT durnowevan expandingtheboundariesoflocalsimilarityanalysis AT hansonnielsw expandingtheboundariesoflocalsimilarityanalysis AT konwarkishorim expandingtheboundariesoflocalsimilarityanalysis AT hallamstevenj expandingtheboundariesoflocalsimilarityanalysis
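
Illustrative note on the abstract's complexity claim. The description field contrasts a permutation test, which recomputes the statistic p times for each of the roughly m² series pairs, with a bound that needs the statistic only once per pair. The sketch below shows why the permutation route multiplies the cost by p. It is not the FASTLSA code, and it does not reproduce the paper's asymptotic bound, which this record does not state. The recurrence used for the local similarity score is the commonly described one (a banded, diagonal-only dynamic program over products of the two series) and is an assumption here, as are the function names `local_similarity` and `permutation_p_value`, the `max_delay` parameter, and the omission of any data normalization.

```python
import random


def local_similarity(x, y, max_delay):
    """Local similarity score of two equal-length series (assumed formulation).

    Tracks the best positively and negatively associated contiguous run
    along each diagonal with |i - j| <= max_delay, then divides by n.
    Inputs are assumed to be normalized already; no normalization is done here.
    """
    n = len(x)
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - max_delay)
        hi = min(n, i + max_delay)
        for j in range(lo, hi + 1):
            prod = x[i - 1] * y[j - 1]
            # Extend the run on the same diagonal, or restart from zero.
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best = max(best, pos[i][j], neg[i][j])
    return best / n


def permutation_p_value(x, y, max_delay, permutations, seed=0):
    """Naive permutation p-value: one full score computation per shuffle."""
    rng = random.Random(seed)
    observed = local_similarity(x, y, max_delay)
    shuffled = list(x)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if local_similarity(shuffled, y, max_delay) >= observed:
            hits += 1
    return hits / permutations


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 0.0]
    b = [1.0, 2.0, 0.0, 0.0]  # same shape as a, shifted one step earlier
    print(local_similarity(a, b, 0))  # no lag allowed
    print(local_similarity(a, b, 1))  # one step of lag allowed
    print(permutation_p_value(a, b, 1, permutations=200))
```

With a fixed `max_delay`, each call to `local_similarity` does work proportional to n, so scoring all pairs costs on the order of m²n, and wrapping every pair in `permutation_p_value` multiplies that by p. Replacing the loop over shuffles with a closed-form bound on the p-value is what removes the factor p in the abstract's comparison.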
