EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences
BACKGROUND: Understanding gene regulatory networks has become one of the central research problems in bioinformatics. More than thirty algorithms have been proposed to identify DNA regulatory sites during the past thirty years. However, the prediction accuracy of these algorithms is still quite low....
Main Authors: | Hu, Jianjun; Yang, Yifeng D; Kihara, Daisuke |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2006 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1539026/ https://www.ncbi.nlm.nih.gov/pubmed/16839417 http://dx.doi.org/10.1186/1471-2105-7-342 |
_version_ | 1782129161717940224 |
---|---|
author | Hu, Jianjun Yang, Yifeng D Kihara, Daisuke |
author_facet | Hu, Jianjun Yang, Yifeng D Kihara, Daisuke |
author_sort | Hu, Jianjun |
collection | PubMed |
description | BACKGROUND: Understanding gene regulatory networks has become one of the central research problems in bioinformatics. More than thirty algorithms have been proposed to identify DNA regulatory sites during the past thirty years. However, the prediction accuracy of these algorithms is still quite low. Ensemble algorithms have emerged as an effective strategy in bioinformatics for improving the prediction accuracy by exploiting the synergetic prediction capability of multiple algorithms. RESULTS: We proposed a novel clustering-based ensemble algorithm named EMD for de novo motif discovery by combining multiple predictions from multiple runs of one or more base component algorithms. The ensemble approach is applied to the motif discovery problem for the first time. The algorithm is tested on a benchmark dataset generated from E. coli RegulonDB. The EMD algorithm has achieved 22.4% improvement in terms of the nucleotide level prediction accuracy over the best stand-alone component algorithm. The advantage of the EMD algorithm is more significant for shorter input sequences, but most importantly, it always outperforms or at least stays at the same performance level of the stand-alone component algorithms even for longer sequences. CONCLUSION: We proposed an ensemble approach for the motif discovery problem by taking advantage of the availability of a large number of motif discovery programs. We have shown that the ensemble approach is an effective strategy for improving both sensitivity and specificity, thus the accuracy of the prediction. The advantage of the EMD algorithm is its flexibility in the sense that a new powerful algorithm can be easily added to the system. |
format | Text |
id | pubmed-1539026 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-15390262006-08-14 EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences Hu, Jianjun Yang, Yifeng D Kihara, Daisuke BMC Bioinformatics Research Article BACKGROUND: Understanding gene regulatory networks has become one of the central research problems in bioinformatics. More than thirty algorithms have been proposed to identify DNA regulatory sites during the past thirty years. However, the prediction accuracy of these algorithms is still quite low. Ensemble algorithms have emerged as an effective strategy in bioinformatics for improving the prediction accuracy by exploiting the synergetic prediction capability of multiple algorithms. RESULTS: We proposed a novel clustering-based ensemble algorithm named EMD for de novo motif discovery by combining multiple predictions from multiple runs of one or more base component algorithms. The ensemble approach is applied to the motif discovery problem for the first time. The algorithm is tested on a benchmark dataset generated from E. coli RegulonDB. The EMD algorithm has achieved 22.4% improvement in terms of the nucleotide level prediction accuracy over the best stand-alone component algorithm. The advantage of the EMD algorithm is more significant for shorter input sequences, but most importantly, it always outperforms or at least stays at the same performance level of the stand-alone component algorithms even for longer sequences. CONCLUSION: We proposed an ensemble approach for the motif discovery problem by taking advantage of the availability of a large number of motif discovery programs. We have shown that the ensemble approach is an effective strategy for improving both sensitivity and specificity, thus the accuracy of the prediction. The advantage of the EMD algorithm is its flexibility in the sense that a new powerful algorithm can be easily added to the system. 
BioMed Central 2006-07-13 /pmc/articles/PMC1539026/ /pubmed/16839417 http://dx.doi.org/10.1186/1471-2105-7-342 Text en Copyright © 2006 Hu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Hu, Jianjun Yang, Yifeng D Kihara, Daisuke EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences |
title | EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences |
title_full | EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences |
title_fullStr | EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences |
title_full_unstemmed | EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences |
title_short | EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences |
title_sort | emd: an ensemble algorithm for discovering regulatory motifs in dna sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1539026/ https://www.ncbi.nlm.nih.gov/pubmed/16839417 http://dx.doi.org/10.1186/1471-2105-7-342 |
work_keys_str_mv | AT hujianjun emdanensemblealgorithmfordiscoveringregulatorymotifsindnasequences AT yangyifengd emdanensemblealgorithmfordiscoveringregulatorymotifsindnasequences AT kiharadaisuke emdanensemblealgorithmfordiscoveringregulatorymotifsindnasequences |