Limitations and potentials of current motif discovery algorithms

Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved to be useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not su...

Bibliographic Details
Main authors: Hu, Jianjun, Li, Bin, Kihara, Daisuke
Format: Text
Language: English
Published: Oxford University Press 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1199555/
https://www.ncbi.nlm.nih.gov/pubmed/16284194
http://dx.doi.org/10.1093/nar/gki791
author Hu, Jianjun
Li, Bin
Kihara, Daisuke
collection PubMed
description Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved to be useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms using large datasets generated from Escherichia coli RegulonDB. Factors that affect the prediction accuracy, scalability and reliability are characterized. It is revealed that the nucleotide and the binding site level accuracy are very low, while the motif level accuracy is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit diverse predictions from multiple runs of one or more algorithms, a consensus ensemble algorithm has been developed, which achieved 6–45% improvement over the base algorithms by increasing both the sensitivity and specificity. Our study illustrates limitations and potentials of existing sequence-based motif discovery algorithms. Taking advantage of the revealed potentials, several promising directions for further improvements are discussed. Since the sequence-based algorithms are the baseline of most of the modern motif discovery algorithms, this paper suggests substantial improvements would be possible for them.
format Text
id pubmed-1199555
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-1199555 2005-09-19 Limitations and potentials of current motif discovery algorithms Hu, Jianjun Li, Bin Kihara, Daisuke Nucleic Acids Res Article Oxford University Press 2005 2005-09-02 /pmc/articles/PMC1199555/ /pubmed/16284194 http://dx.doi.org/10.1093/nar/gki791 Text en © The Author 2005. Published by Oxford University Press. All rights reserved
title Limitations and potentials of current motif discovery algorithms
topic Article