
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets

ChIP-seq, a new high-throughput technique that couples chromatin immunoprecipitation with high-throughput sequencing, has extended the identification of transcription factor binding locations to genome-wide regions. However, most existing motif discovery algorithms are...


Bibliographic Details
Main Authors:	Zhang, Yipu, Wang, Ping
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4509496/ https://www.ncbi.nlm.nih.gov/pubmed/26236718 http://dx.doi.org/10.1155/2015/218068

_version_	1782382040673419264
author	Zhang, Yipu Wang, Ping
author_facet	Zhang, Yipu Wang, Ping
author_sort	Zhang, Yipu
collection	PubMed
description	ChIP-seq, a new high-throughput technique that couples chromatin immunoprecipitation with high-throughput sequencing, has extended the identification of transcription factor binding locations to genome-wide regions. However, most existing motif discovery algorithms are time-consuming and limited in their ability to identify binding motifs in ChIP-seq data, which are typically large in scale. To improve efficiency, we propose a fast cluster motif finding algorithm, named FCmotif, to identify (l, d) motifs in large-scale ChIP-seq data sets. It is inspired by the emerging substrings mining strategy: it first finds the enriched substrings and then searches their neighborhood instances to construct PWMs and cluster motifs of different lengths. FCmotif is not bound by the OOPS model constraint and can find long motifs. The effectiveness of the proposed algorithm has been demonstrated by experiments on ChIP-seq data sets from mouse ES cells. Detection of the real binding motifs and processing of the full-size data of several megabytes finished in a few minutes. The experimental results show that FCmotif is well suited to (l, d) motif finding in ChIP-seq data; it also demonstrates better performance than other widely used algorithms such as MEME, Weeder, ChIPMunk, and DREME.
format	Online Article Text
id	pubmed-4509496
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-4509496 2015-08-02 A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets Zhang, Yipu Wang, Ping Biomed Res Int Research Article Hindawi Publishing Corporation 2015 2015-07-05 /pmc/articles/PMC4509496/ /pubmed/26236718 http://dx.doi.org/10.1155/2015/218068 Text en Copyright © 2015 Y. Zhang and P. Wang. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Zhang, Yipu Wang, Ping A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
title	A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
title_full	A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
title_fullStr	A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
title_full_unstemmed	A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
title_short	A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets
title_sort	fast cluster motif finding algorithm for chip-seq data sets
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4509496/ https://www.ncbi.nlm.nih.gov/pubmed/26236718 http://dx.doi.org/10.1155/2015/218068
work_keys_str_mv	AT zhangyipu afastclustermotiffindingalgorithmforchipseqdatasets AT wangping afastclustermotiffindingalgorithmforchipseqdatasets AT zhangyipu fastclustermotiffindingalgorithmforchipseqdatasets AT wangping fastclustermotiffindingalgorithmforchipseqdatasets
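
The abstract outlines a two-stage idea: find substrings enriched in the ChIP-seq sequences, then collect their neighborhood instances to build a PWM. The Python sketch below illustrates that general idea only; it is not the authors' FCmotif implementation. The function names, the add-one smoothing, the `min_ratio` threshold, the use of a separate background set, and the use of Hamming distance at most d as the neighborhood are all assumptions made for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"


def count_kmers(seqs, l):
    """Count, for each l-mer, the number of sequences that contain it."""
    counts = Counter()
    for s in seqs:
        # A set ensures each l-mer is counted at most once per sequence.
        counts.update({s[i:i + l] for i in range(len(s) - l + 1)})
    return counts


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def enriched_seeds(fg, bg, l, min_ratio=2.0):
    """Return (l-mer, ratio) pairs whose support in fg is at least
    min_ratio times their smoothed support in bg (assumed criterion)."""
    fg_counts, bg_counts = count_kmers(fg, l), count_kmers(bg, l)
    seeds = []
    for kmer, c in fg_counts.items():
        fg_support = c / len(fg)
        # Add-one smoothing avoids division by zero for unseen l-mers.
        bg_support = (bg_counts[kmer] + 1) / (len(bg) + 1)
        ratio = fg_support / bg_support
        if ratio >= min_ratio:
            seeds.append((kmer, ratio))
    # Highest ratio first; ties broken alphabetically.
    return sorted(seeds, key=lambda t: (-t[1], t[0]))


def neighborhood_instances(seed, seqs, d):
    """Collect every l-mer in seqs within Hamming distance d of the seed."""
    l = len(seed)
    return [
        s[i:i + l]
        for s in seqs
        for i in range(len(s) - l + 1)
        if hamming(seed, s[i:i + l]) <= d
    ]


def build_pwm(instances, pseudocount=1.0):
    """Build a position weight matrix (one dict of base frequencies per column)."""
    total = len(instances) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(len(instances[0])):
        col = Counter(inst[pos] for inst in instances)
        pwm.append({b: (col[b] + pseudocount) / total for b in ALPHABET})
    return pwm


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    fg = ["TTACGTAA", "GGACGTCC", "CCACGAGG", "AAACGTTT"]
    bg = ["TTTTTTTT", "GGGGCCCC", "AAAACCCC"]
    for seed, ratio in enriched_seeds(fg, bg, l=4):
        instances = neighborhood_instances(seed, fg, d=1)
        print(seed, ratio, instances)
        print(build_pwm(instances))
```

Because instances are gathered from every position in every sequence, a sequence may contribute zero, one, or several instances, which loosely mirrors the abstract's point that the method is not bound by the OOPS constraint. Clustering motifs of different lengths is omitted here.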


Similar Items