Lag penalized weighted correlation for time series clustering

BACKGROUND: The similarity or distance measure used for clustering can generate intuitive and interpretable clusters when it is tailored to the unique characteristics of the data. In time series datasets generated with high-throughput biological assays, measurements such as gene expression levels or...

Bibliographic Details
Main authors: Chandereng, Thevaa; Gitter, Anthony
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6966853/
https://www.ncbi.nlm.nih.gov/pubmed/31948388
http://dx.doi.org/10.1186/s12859-019-3324-1
_version_ 1783488830945361920
author Chandereng, Thevaa
Gitter, Anthony
author_facet Chandereng, Thevaa
Gitter, Anthony
author_sort Chandereng, Thevaa
collection PubMed
description BACKGROUND: The similarity or distance measure used for clustering can generate intuitive and interpretable clusters when it is tailored to the unique characteristics of the data. In time series datasets generated with high-throughput biological assays, measurements such as gene expression levels or protein phosphorylation intensities are collected sequentially over time, and the similarity score should capture this special temporal structure. RESULTS: We propose a clustering similarity measure called Lag Penalized Weighted Correlation (LPWC) to group pairs of time series that exhibit closely-related behaviors over time, even if the timing is not perfectly synchronized. LPWC aligns time series profiles to identify common temporal patterns. It down-weights aligned profiles based on the length of the temporal lags that are introduced. We demonstrate the advantages of LPWC versus existing time series and general clustering algorithms. In a simulated dataset based on the biologically-motivated impulse model, LPWC is the only method to recover the true clusters for almost all simulated genes. LPWC also identifies clusters with distinct temporal patterns in our yeast osmotic stress response and axolotl limb regeneration case studies. CONCLUSIONS: LPWC achieves both of its time series clustering goals. It groups time series with correlated changes over time, even if those patterns occur earlier or later in some of the time series. In addition, it refrains from introducing large shifts in time when searching for temporal patterns by applying a lag penalty. The LPWC R package is available at https://github.com/gitter-lab/LPWC and CRAN under an MIT license.
format Online
Article
Text
id pubmed-6966853
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6966853 2020-01-27 Lag penalized weighted correlation for time series clustering Chandereng, Thevaa Gitter, Anthony BMC Bioinformatics Methodology Article
BioMed Central 2020-07-17 /pmc/articles/PMC6966853/ /pubmed/31948388 http://dx.doi.org/10.1186/s12859-019-3324-1 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Chandereng, Thevaa
Gitter, Anthony
Lag penalized weighted correlation for time series clustering
title Lag penalized weighted correlation for time series clustering
title_full Lag penalized weighted correlation for time series clustering
title_fullStr Lag penalized weighted correlation for time series clustering
title_full_unstemmed Lag penalized weighted correlation for time series clustering
title_short Lag penalized weighted correlation for time series clustering
title_sort lag penalized weighted correlation for time series clustering
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6966853/
https://www.ncbi.nlm.nih.gov/pubmed/31948388
http://dx.doi.org/10.1186/s12859-019-3324-1
work_keys_str_mv AT chanderengthevaa lagpenalizedweightedcorrelationfortimeseriesclustering
AT gitteranthony lagpenalizedweightedcorrelationfortimeseriesclustering
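
Illustrative sketch (not part of the catalog record). The abstract in the description field says that LPWC aligns time series by introducing lags and down-weights alignments according to lag length. The Python below is a minimal sketch of that general idea only, not the LPWC package's implementation: it uses a plain (unweighted) Pearson correlation, and the exponential penalty form, the function names `pearson` and `lag_penalized_score`, and the parameters `max_lag` and `penalty` are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat segment carries no pattern to correlate
    return cov / math.sqrt(var_x * var_y)


def lag_penalized_score(x, y, max_lag=2, penalty=0.5):
    """Return (best_score, best_lag) over lags in [-max_lag, max_lag].

    Hypothetical scoring (an assumption, not the published formula):
    correlation of the overlapping points after shifting y by `lag`,
    multiplied by exp(-penalty * lag**2) so longer lags count less.
    Assumes x and y have the same length.
    """
    best_score, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # pair x[t] with y[t + lag]
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            # pair x[t + |lag|] with y[t]
            xs, ys = x[-lag:], y[:len(y) + lag]
        if len(xs) < 3:
            continue  # too few overlapping points to correlate
        score = math.exp(-penalty * lag ** 2) * pearson(xs, ys)
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag


# Two profiles with the same pulse, the second delayed by one time point.
early = [0, 0, 1, 3, 1, 0, 0, 0]
late = [0, 0, 0, 1, 3, 1, 0, 0]
score, lag = lag_penalized_score(early, late)
```

In this sketch a mild penalty lets the one-step shift win because the shifted profiles match exactly, while a large `penalty` value makes the unshifted comparison score higher, which mirrors the abstract's point about refraining from large shifts in time.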