CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design

A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their simil...

Bibliographic Details
Main Authors: Zhang, Shaoqiang; Chen, Yong
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4972426/
https://www.ncbi.nlm.nih.gov/pubmed/27487245
http://dx.doi.org/10.1371/journal.pone.0160435
_version_ 1782446242638331904
author Zhang, Shaoqiang
Chen, Yong
author_facet Zhang, Shaoqiang
Chen, Yong
author_sort Zhang, Shaoqiang
collection PubMed
description A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph, where these motifs are connected to one another. However, an efficient clustering algorithm is desired for clustering the motifs that belong to the same groups and separating the motifs that belong to different groups, or even deleting an amount of spurious ones. In this work, a new motif clustering algorithm, CLIMP, is proposed by using maximal cliques and sped up by parallelizing its program. When a synthetic motif dataset from the database JASPAR, a set of putative motifs from a phylogenetic foot-printing dataset, and a set of putative motifs from a ChIP dataset are used to compare the performances of CLIMP and two other high-performance algorithms, the results demonstrate that CLIMP mostly outperforms the two algorithms on the three datasets for motif clustering, so that it can be a useful complement of the clustering procedures in some genome-wide motif prediction pipelines. CLIMP is available at http://sqzhang.cn/climp.html.
format Online
Article
Text
id pubmed-4972426
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-49724262016-08-18 CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design Zhang, Shaoqiang Chen, Yong PLoS One Research Article A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph, where these motifs are connected to one another. However, an efficient clustering algorithm is desired for clustering the motifs that belong to the same groups and separating the motifs that belong to different groups, or even deleting an amount of spurious ones. In this work, a new motif clustering algorithm, CLIMP, is proposed by using maximal cliques and sped up by parallelizing its program. When a synthetic motif dataset from the database JASPAR, a set of putative motifs from a phylogenetic foot-printing dataset, and a set of putative motifs from a ChIP dataset are used to compare the performances of CLIMP and two other high-performance algorithms, the results demonstrate that CLIMP mostly outperforms the two algorithms on the three datasets for motif clustering, so that it can be a useful complement of the clustering procedures in some genome-wide motif prediction pipelines. CLIMP is available at http://sqzhang.cn/climp.html. Public Library of Science 2016-08-03 /pmc/articles/PMC4972426/ /pubmed/27487245 http://dx.doi.org/10.1371/journal.pone.0160435 Text en © 2016 Zhang, Chen http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Zhang, Shaoqiang
Chen, Yong
CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design
title CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design
title_full CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design
title_fullStr CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design
title_full_unstemmed CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design
title_short CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design
title_sort climp: clustering motifs via maximal cliques with parallel computing design
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4972426/
https://www.ncbi.nlm.nih.gov/pubmed/27487245
http://dx.doi.org/10.1371/journal.pone.0160435
work_keys_str_mv AT zhangshaoqiang climpclusteringmotifsviamaximalcliqueswithparallelcomputingdesign
AT chenyong climpclusteringmotifsviamaximalcliqueswithparallelcomputingdesign
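
The description above says that motif similarities can be represented as a graph and that clusters are derived from maximal cliques. The following is only a minimal illustrative sketch of that general idea, not the CLIMP implementation: the similarity scores, the 0.7 threshold, the greedy "largest clique first" rule, and all function names (`build_graph`, `maximal_cliques`, `greedy_clusters`) are assumptions made for the example. It uses the standard Bron-Kerbosch algorithm with pivoting to enumerate maximal cliques and omits the parallelization the record mentions.

```python
# Hypothetical pairwise motif similarity scores (made up for illustration).
SIMILARITY = {
    ("M1", "M2"): 0.90,
    ("M1", "M3"): 0.85,
    ("M2", "M3"): 0.80,
    ("M4", "M5"): 0.90,
    ("M3", "M4"): 0.30,
    ("M1", "M6"): 0.10,
}


def build_graph(similarity, threshold):
    """Connect two motifs when their similarity score meets the threshold."""
    graph = {}
    for a, b in similarity:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
    for (a, b), score in similarity.items():
        if a != b and score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def greedy_clusters(graph, min_size=2):
    """Repeatedly take the largest remaining clique as one cluster.

    Motifs left over once no clique of at least min_size remains are
    returned separately as unclustered (candidate spurious) motifs.
    """
    remaining = {n: set(nb) for n, nb in graph.items()}
    clusters = []
    while remaining:
        cliques = maximal_cliques(remaining)
        # Largest clique first; ties broken by sorted member names.
        best = min(cliques, key=lambda c: (-len(c), sorted(c)))
        if len(best) < min_size:
            break
        clusters.append(sorted(best))
        for n in best:
            del remaining[n]
        for nb in remaining.values():
            nb -= best
    return clusters, sorted(remaining)


if __name__ == "__main__":
    clusters, unclustered = greedy_clusters(build_graph(SIMILARITY, 0.7))
    print(clusters)
    print(unclustered)
```

In this toy input, motifs whose pairwise scores all meet the threshold end up in the same cluster, and a motif with no sufficiently similar partner is reported as unclustered. A real pipeline would derive the scores from a motif comparison measure and would need a more careful rule for overlapping cliques.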