
PairMotifChIP: A Fast Algorithm for Discovery of Patterns Conserved in Large ChIP-seq Data Sets

Identifying conserved patterns in DNA sequences, namely, motif discovery, is an important and challenging computational task. With hundreds or more sequences contained, the high-throughput sequencing data set is helpful to improve the identification accuracy of motif discovery but requires an even h...


Bibliographic Details
Main Authors:	Yu, Qiang, Huo, Hongwei, Feng, Dazheng
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5098105/ https://www.ncbi.nlm.nih.gov/pubmed/27843946 http://dx.doi.org/10.1155/2016/4986707

author	Yu, Qiang Huo, Hongwei Feng, Dazheng
collection	PubMed
description	Identifying conserved patterns in DNA sequences, namely, motif discovery, is an important and challenging computational task. With hundreds or more sequences contained, the high-throughput sequencing data set is helpful to improve the identification accuracy of motif discovery but requires an even higher computing performance. To efficiently identify motifs in large DNA data sets, a new algorithm called PairMotifChIP is proposed by extracting and combining pairs of l-mers in the input with relatively small Hamming distance. In particular, a method for rapidly extracting pairs of l-mers is designed, which can be used not only for PairMotifChIP, but also for other DNA data mining tasks with the same demand. Experimental results on the simulated data show that the proposed algorithm can find motifs successfully and runs faster than the state-of-the-art motif discovery algorithms. Furthermore, the validity of the proposed algorithm has been verified on real data.
format	Online Article Text
id	pubmed-5098105
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-5098105 2016-11-14 Biomed Res Int Research Article Hindawi Publishing Corporation 2016 2016-10-24 /pmc/articles/PMC5098105/ /pubmed/27843946 http://dx.doi.org/10.1155/2016/4986707 Text en Copyright © 2016 Qiang Yu et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	PairMotifChIP: A Fast Algorithm for Discovery of Patterns Conserved in Large ChIP-seq Data Sets
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5098105/ https://www.ncbi.nlm.nih.gov/pubmed/27843946 http://dx.doi.org/10.1155/2016/4986707
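
Illustrative sketch (not part of the record): the description above says PairMotifChIP works by extracting pairs of l-mers with relatively small Hamming distance. The Python below is a naive brute-force illustration of that idea only. It is not the authors' rapid extraction method, and the function names and the `max_dist` threshold are hypothetical choices made for this example.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("l-mers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def lmers(seq: str, l: int):
    """Yield (start, l-mer) for every window of length l in seq."""
    for i in range(len(seq) - l + 1):
        yield i, seq[i:i + l]


def close_pairs(seqs, l, max_dist):
    """Brute force: collect l-mer pairs taken from two different
    sequences whose Hamming distance is at most max_dist.

    Each pair is ((seq_index, start, l-mer), (seq_index, start, l-mer)).
    Cost grows quadratically with the total input length, so this is
    only suitable for small toy inputs.
    """
    pairs = []
    for (i, s), (j, t) in combinations(enumerate(seqs), 2):
        for p, x in lmers(s, l):
            for q, y in lmers(t, l):
                if hamming(x, y) <= max_dist:
                    pairs.append(((i, p, x), (j, q, y)))
    return pairs


if __name__ == "__main__":
    toy = ["ACGTAC", "TTCGTA"]
    for pair in close_pairs(toy, l=4, max_dist=1):
        print(pair)
```

Comparing every l-mer against every other one is exactly the cost that a fast pair-extraction method would need to avoid on data sets with hundreds of sequences; the sketch only makes the notion of a "close pair" concrete.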
