Similarity-Based Segmentation of Multi-Dimensional Signals
The segmentation of time series and genomic data is a common problem in computational biology. With increasingly complex measurement procedures individual data points are often not just numbers or simple vectors in which all components are of the same kind. Analysis methods that capitalize on slopes...
Main authors: | Machné, Rainer; Murray, Douglas B.; Stadler, Peter F. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2017 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5617875/ https://www.ncbi.nlm.nih.gov/pubmed/28955039 http://dx.doi.org/10.1038/s41598-017-12401-8 |
_version_ | 1783267059112607744 |
---|---|
author | Machné, Rainer Murray, Douglas B. Stadler, Peter F. |
author_facet | Machné, Rainer Murray, Douglas B. Stadler, Peter F. |
author_sort | Machné, Rainer |
collection | PubMed |
description | The segmentation of time series and genomic data is a common problem in computational biology. With increasingly complex measurement procedures individual data points are often not just numbers or simple vectors in which all components are of the same kind. Analysis methods that capitalize on slopes in a single real-valued data track or that make explicit use of the vectorial nature of the data are not applicable in such scenaria. We develop here a framework for segmentation in arbitrary data domains that only requires a minimal notion of similarity. Using unsupervised clustering of (a sample of) the input yields an approximate segmentation algorithm that is efficient enough for genome-wide applications. As a showcase application we segment a time-series of transcriptome sequencing data from budding yeast, in high temporal resolution over ca. 2.5 cycles of the short-period respiratory oscillation. The algorithm is used with a similarity measure focussing on periodic expression profiles across the metabolic cycle rather than coverage per time point. |
format | Online Article Text |
id | pubmed-5617875 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-5617875 2017-10-11 Similarity-Based Segmentation of Multi-Dimensional Signals Machné, Rainer Murray, Douglas B. Stadler, Peter F. Sci Rep Article The segmentation of time series and genomic data is a common problem in computational biology. With increasingly complex measurement procedures individual data points are often not just numbers or simple vectors in which all components are of the same kind. Analysis methods that capitalize on slopes in a single real-valued data track or that make explicit use of the vectorial nature of the data are not applicable in such scenaria. We develop here a framework for segmentation in arbitrary data domains that only requires a minimal notion of similarity. Using unsupervised clustering of (a sample of) the input yields an approximate segmentation algorithm that is efficient enough for genome-wide applications. As a showcase application we segment a time-series of transcriptome sequencing data from budding yeast, in high temporal resolution over ca. 2.5 cycles of the short-period respiratory oscillation. The algorithm is used with a similarity measure focussing on periodic expression profiles across the metabolic cycle rather than coverage per time point. Nature Publishing Group UK 2017-09-27 /pmc/articles/PMC5617875/ /pubmed/28955039 http://dx.doi.org/10.1038/s41598-017-12401-8 Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Machné, Rainer Murray, Douglas B. Stadler, Peter F. Similarity-Based Segmentation of Multi-Dimensional Signals |
title | Similarity-Based Segmentation of Multi-Dimensional Signals |
title_sort | similarity-based segmentation of multi-dimensional signals |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5617875/ https://www.ncbi.nlm.nih.gov/pubmed/28955039 http://dx.doi.org/10.1038/s41598-017-12401-8 |