CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data

Motivation: Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-...

Bibliographic Details
Main Authors: Fidaner, Işık Barış, Cankorur-Cetinkaya, Ayca, Dikicioglu, Duygu, Kirdar, Betul, Cemgil, Ali Taylan, Oliver, Stephen G.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4734040/
https://www.ncbi.nlm.nih.gov/pubmed/26411869
http://dx.doi.org/10.1093/bioinformatics/btv532
author Fidaner, Işık Barış
Cankorur-Cetinkaya, Ayca
Dikicioglu, Duygu
Kirdar, Betul
Cemgil, Ali Taylan
Oliver, Stephen G.
author_sort Fidaner, Işık Barış
collection PubMed
description Motivation: Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of their ability to deal with transient phenomena, limiting the meaningful information that may be extracted from them. This situation requires the development and exploitation of tailor-made, easy-to-use and flexible tools designed specifically for the analysis of time-series datasets. Results: We present a novel statistical application called CLUSTERnGO, which uses a model-based clustering algorithm that fulfils this need. This algorithm involves two components of operation. Component 1 constructs a Bayesian non-parametric model (Infinite Mixture of Piecewise Linear Sequences) and Component 2 applies a novel clustering methodology (Two-Stage Clustering). The software can also assign biological meaning to the identified clusters using an appropriate ontology. It applies multiple hypothesis testing to report the significance of these enrichments. The algorithm has a four-phase pipeline. The application can be executed using either command-line tools or a user-friendly Graphical User Interface. The latter has been developed to address the needs of both specialist and non-specialist users. We use three diverse test cases to demonstrate the flexibility of the proposed strategy. In all cases, CLUSTERnGO not only outperformed existing algorithms in assigning unique GO term enrichments to the identified clusters, but also revealed novel insights regarding the biological systems examined, which were not uncovered in the original publications. Availability and implementation: The C++ and Qt source code, the GUI applications for Windows, OS X and Linux operating systems and the user manual are freely available for download under the GNU GPL v3 license at http://www.cmpe.boun.edu.tr/content/CnG. Contact: sgo24@cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4734040
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4734040 2016-02-02 Bioinformatics Original Papers Oxford University Press 2016-02-01 2015-09-26 /pmc/articles/PMC4734040/ /pubmed/26411869 http://dx.doi.org/10.1093/bioinformatics/btv532 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CLUSTERnGO: a user-defined modelling platform for two-stage clustering of time-series data
topic Original Papers