ClusTrack: Feature Extraction and Similarity Measures for Clustering of Genome-Wide Data Sets

Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-leve...

Bibliographic Details
Main Authors:	Rydbeck, Halfdan; Sandve, Geir Kjetil; Ferkingstad, Egil; Simovski, Boris; Rye, Morten; Hovig, Eivind
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4400084/ https://www.ncbi.nlm.nih.gov/pubmed/25879845 http://dx.doi.org/10.1371/journal.pone.0123261

author	Rydbeck, Halfdan Sandve, Geir Kjetil Ferkingstad, Egil Simovski, Boris Rye, Morten Hovig, Eivind
author_sort	Rydbeck, Halfdan
collection	PubMed
description	Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data. We here introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering to be performed for genomic tracks using standard clustering algorithms. An implementation of the methodology is provided through a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to the clustering of occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at http://hyperbrowser.uio.no/hb/u/hb-superuser/p/clustrack. The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks", and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server: http://hyperbrowser.uio.no/hb/.
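The description above outlines a pipeline of feature extraction, a similarity measure, and a standard clustering algorithm applied to genomic tracks. The following is a minimal sketch of that general idea under stated assumptions; it is not ClusTrack's implementation. Each track is assumed to be a list of half-open (start, end) intervals on a single toy chromosome, the extracted feature is the set of covered base pairs, similarity is the Jaccard index, and clustering is plain average-linkage agglomeration. The names `coverage`, `jaccard`, `average_linkage`, `GENOME_LENGTH` and the sample tracks are hypothetical and used for illustration only.

```python
from itertools import combinations

# Illustrative assumption: every track lies on one toy chromosome of this length.
GENOME_LENGTH = 100


def coverage(intervals, genome_length=GENOME_LENGTH):
    """Feature extraction (assumed): the set of base-pair positions a track covers.

    `intervals` is a list of half-open (start, end) pairs, BED-style.
    """
    covered = set()
    for start, end in intervals:
        covered.update(range(max(0, start), min(end, genome_length)))
    return covered


def jaccard(a, b):
    """Similarity (assumed): bases covered by both tracks over bases covered by either."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_linkage(tracks, n_clusters):
    """Agglomerative clustering with distance 1 - Jaccard and average linkage."""
    n_clusters = max(1, n_clusters)
    features = {name: coverage(iv) for name, iv in tracks.items()}
    # Pairwise distances between individual tracks, keyed by unordered pair.
    dist = {
        frozenset(pair): 1.0 - jaccard(features[pair[0]], features[pair[1]])
        for pair in combinations(features, 2)
    }
    clusters = [[name] for name in features]
    while len(clusters) > n_clusters:
        best = None
        # Find the two clusters with the smallest mean pairwise distance.
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(dist[frozenset((x, y))] for x in clusters[i] for y in clusters[j])
            mean = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or mean < best[0]:
                best = (mean, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations() guarantees i < j
    return clusters


if __name__ == "__main__":
    # Hypothetical toy tracks: A and B overlap heavily, as do C and D.
    tracks = {
        "sampleA": [(10, 30), (60, 70)],
        "sampleB": [(12, 32), (58, 70)],
        "sampleC": [(80, 95)],
        "sampleD": [(78, 96)],
    }
    print(sorted(sorted(c) for c in average_linkage(tracks, 2)))
    # → [['sampleA', 'sampleB'], ['sampleC', 'sampleD']]
```

Because feature extraction, similarity, and the clustering step are separate functions, another similarity measure could be swapped in by replacing `jaccard` alone; the specific choices shown here are assumptions, not those of the authors.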
format	Online Article Text
id	pubmed-4400084
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4400084 2015-04-21 ClusTrack: Feature Extraction and Similarity Measures for Clustering of Genome-Wide Data Sets Rydbeck, Halfdan Sandve, Geir Kjetil Ferkingstad, Egil Simovski, Boris Rye, Morten Hovig, Eivind PLoS One Research Article
Public Library of Science 2015-04-16 /pmc/articles/PMC4400084/ /pubmed/25879845 http://dx.doi.org/10.1371/journal.pone.0123261 Text en © 2015 Rydbeck et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	ClusTrack: Feature Extraction and Similarity Measures for Clustering of Genome-Wide Data Sets
topic	Research Article