Similarity-Based Segmentation of Multi-Dimensional Signals

Bibliographic Details
Main authors: Machné, Rainer; Murray, Douglas B.; Stadler, Peter F.
Format: Online article (text)
Language: English
Published: Nature Publishing Group UK, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5617875/
https://www.ncbi.nlm.nih.gov/pubmed/28955039
http://dx.doi.org/10.1038/s41598-017-12401-8
Collection: PubMed
Description: The segmentation of time series and genomic data is a common problem in computational biology. With increasingly complex measurement procedures, individual data points are often not just numbers or simple vectors in which all components are of the same kind. Analysis methods that capitalize on slopes in a single real-valued data track, or that make explicit use of the vectorial nature of the data, are not applicable in such scenarios. We develop here a framework for segmentation in arbitrary data domains that only requires a minimal notion of similarity. Using unsupervised clustering of (a sample of) the input yields an approximate segmentation algorithm that is efficient enough for genome-wide applications. As a showcase application we segment a time series of transcriptome sequencing data from budding yeast, in high temporal resolution over ca. 2.5 cycles of the short-period respiratory oscillation. The algorithm is used with a similarity measure focussing on periodic expression profiles across the metabolic cycle rather than coverage per time point.
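To make the core idea of the description concrete, here is a minimal sketch of segmentation that needs nothing but a pairwise similarity function: exact dynamic programming that picks segment boundaries to maximize within-segment similarity minus a fixed per-segment penalty. This is an illustration under assumptions, not the authors' algorithm. It scores segments with raw pairwise similarities rather than the clustering-based approximation described above. The names `fourier_profile`, `similarity`, `segment`, and `penalty` are hypothetical, as is the use of unit-normalized low-order DFT components as a stand-in for a periodicity-focused similarity measure.

```python
import cmath
import math
from typing import Callable, List, Sequence, Tuple

Profile = List[complex]


def fourier_profile(series: Sequence[float], n_components: int = 3) -> Profile:
    """Hypothetical periodicity feature: DFT components k = 1..n_components
    (the constant k = 0 term is skipped), scaled to unit length so that the
    overall coverage level of a position does not affect similarity."""
    n = len(series)
    comps = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        for k in range(1, n_components + 1)
    ]
    norm = math.sqrt(sum(abs(c) ** 2 for c in comps))
    return [c / norm for c in comps] if norm > 0 else comps


def similarity(a: Profile, b: Profile) -> float:
    """Real part of the Hermitian inner product; lies in [-1, 1] for unit vectors."""
    return sum((x * y.conjugate()).real for x, y in zip(a, b))


def segment(items: Sequence[Profile],
            sim: Callable[[Profile, Profile], float],
            penalty: float) -> List[Tuple[int, int]]:
    """Exact DP: choose boundaries maximizing the total over all segments of
    (sum of pairwise similarities inside the segment) - penalty.
    Returns inclusive (start, end) index pairs; O(n^3) similarity calls."""
    n = len(items)
    best = [0.0] + [-math.inf] * n  # best[j]: optimal score of items[:j]
    back = [0] * (n + 1)            # back[j]: start index of the last segment
    for j in range(1, n + 1):
        within = 0.0  # pairwise similarity sum of items[i:j], grown leftwards
        for i in range(j - 1, -1, -1):
            within += sum(sim(items[i], items[m]) for m in range(i + 1, j))
            score = best[i] + within - penalty
            if score > best[j]:
                best[j], back[j] = score, i
    segments, j = [], n
    while j > 0:  # trace back from the end of the sequence
        segments.append((back[j], j - 1))
        j = back[j]
    return segments[::-1]


# Toy data: 8 positions, each an 8-point time series. Positions 0-3 oscillate
# in phase, positions 4-7 in antiphase; amplitudes (coverage) differ freely.
wave = [math.cos(2 * math.pi * t / 8) for t in range(8)]
profiles = [fourier_profile([a * w for w in wave]) for a in (1, 5, 10, 2)]
profiles += [fourier_profile([-a * w for w in wave]) for a in (3, 1, 7, 4)]
print(segment(profiles, similarity, penalty=1.0))  # [(0, 3), (4, 7)]
```

Because each profile is scaled to unit length, positions with the same oscillation phase score as similar even when their read coverage differs tenfold, which mirrors the description's emphasis on periodic profiles over per-time-point coverage. The cubic cost of the exact recursion is one reason an approximation, such as clustering a sample of the input, becomes attractive at genome scale.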
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sci Rep (Scientific Reports)
Publication date: 2017-09-27
Rights: © The Author(s) 2017. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.