Large-scale motif discovery using DNA Gray code and equiprobable oligomers

Motivation: How to find motifs from genome-scale functional sequences, such as all the promoters in a genome, is a challenging problem. Word-based methods count the occurrences of oligomers to detect excessively represented ones. This approach is known to be fast and accurate compared with other met...

Bibliographic Details
Main authors: Ichinose, Natsuhiro; Yada, Tetsushi; Gotoh, Osamu
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3244767/
https://www.ncbi.nlm.nih.gov/pubmed/22057160
http://dx.doi.org/10.1093/bioinformatics/btr606
author Ichinose, Natsuhiro
Yada, Tetsushi
Gotoh, Osamu
collection PubMed
description Motivation: How to find motifs from genome-scale functional sequences, such as all the promoters in a genome, is a challenging problem. Word-based methods count the occurrences of oligomers to detect excessively represented ones. This approach is known to be fast and accurate compared with other methods. However, two problems have hampered the application of such methods to large-scale data. One is the computational cost necessary for clustering similar oligomers, and the other is the bias in the frequency of fixed-length oligomers, which complicates the detection of significant words. Results: We introduce a method that uses a DNA Gray code and equiprobable oligomers, which solve the clustering problem and the oligomer bias, respectively. Our method can analyze 18 000 sequences of ~1 kbp long in 30 s. We also show that the accuracy of our method is superior to that of a leading method, especially for large-scale data and small fractions of motif-containing sequences. Availability: The online and stand-alone versions of the application, named Hegma, are available at our website: http://www.genome.ist.i.kyoto-u.ac.jp/~ichinose/hegma/ Contact: ichinose@i.kyoto-u.ac.jp; o.gotoh@i.kyoto-u.ac.jp
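The two ideas named in the description, counting oligomer occurrences and ordering oligomers with a Gray code, can be illustrated with a minimal sketch. This is not Hegma's implementation: the abstract does not specify how the DNA Gray code is constructed, so a standard reflected 4-ary Gray code over A, C, G, T is assumed here, and the function names `count_oligomers` and `dna_gray_code` are hypothetical. The equiprobable-oligomer step is not covered.

```python
from collections import Counter

ALPHABET = "ACGT"


def count_oligomers(sequences, k):
    """Count every overlapping k-mer (word) across all input sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip words containing ambiguous symbols such as N.
            if set(word) <= set(ALPHABET):
                counts[word] += 1
    return counts


def dna_gray_code(k):
    """Assumed construction: reflected 4-ary Gray code over A, C, G, T.

    Consecutive words in the returned list differ at exactly one position.
    """
    if k == 0:
        return [""]
    shorter = dna_gray_code(k - 1)
    words = []
    for j, base in enumerate(ALPHABET):
        # Reverse every second block so the block boundaries also
        # differ in a single position (the "reflection" step).
        block = shorter if j % 2 == 0 else shorter[::-1]
        words.extend(base + w for w in block)
    return words


if __name__ == "__main__":
    counts = count_oligomers(["ACGTACGT", "TTACGTAA"], k=4)
    # List the counted words in Gray-code order, so that words differing
    # by one nucleotide tend to sit next to each other.
    for word in dna_gray_code(4):
        if counts[word]:
            print(word, counts[word])
```

In such an ordering, neighbouring words differ by a single nucleotide, which suggests why a Gray code could help group similar oligomers without an all-against-all comparison; how the authors actually exploit this is described in the article itself.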
format Online
Article
Text
id pubmed-3244767
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3244767 2011-12-22 Large-scale motif discovery using DNA Gray code and equiprobable oligomers. Ichinose, Natsuhiro; Yada, Tetsushi; Gotoh, Osamu. Bioinformatics, Original Papers. Oxford University Press 2012-01-01 2011-11-03 /pmc/articles/PMC3244767/ /pubmed/22057160 http://dx.doi.org/10.1093/bioinformatics/btr606 Text en © The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Large-scale motif discovery using DNA Gray code and equiprobable oligomers
topic Original Papers