Rule-Based Knowledge Acquisition Method for Promoter Prediction in Human and Drosophila Species

Bibliographic Details
Main authors: Huang, Wen-Lin; Tung, Chun-Wei; Liaw, Chyn; Huang, Hui-Ling; Ho, Shinn-Ying
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3927563/
https://www.ncbi.nlm.nih.gov/pubmed/24955394
http://dx.doi.org/10.1155/2014/327306
author Huang, Wen-Lin
Tung, Chun-Wei
Liaw, Chyn
Huang, Hui-Ling
Ho, Shinn-Ying
collection PubMed
description Rapid and reliable identification of promoter regions is increasingly important as the number of genomes being sequenced grows quickly. Various methods have been developed, but few investigate the effectiveness of sequence-based features in promoter prediction. This study proposes a knowledge acquisition method (named PromHD) based on if-then rules for promoter prediction in human and Drosophila species. PromHD uses a feature-mining algorithm and a reference feature set of 167 DNA sequence descriptors (DNASDs), comprising three descriptors of physicochemical properties (absorption maxima, molecular weight, and molar absorption coefficient), 128 top-ranked descriptors of 4-mer motifs, and 36 global sequence descriptors. PromHD identifies two feature subsets with 99 and 74 DNASDs and yields test accuracies of 96.4% and 97.5% in human and Drosophila species, respectively. Based on the 99- and 74-dimensional feature vectors, PromHD generates several if-then rules for promoter prediction by using a decision tree. The top-ranked informative rules with high certainty grades reveal that a global sequence descriptor (the length of nucleotide A at the first position of the sequence) and two physicochemical properties (absorption maxima and molecular weight) are effective in distinguishing promoters from non-promoters in human and Drosophila species, respectively.
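As an illustration of the 4-mer motif descriptors mentioned in the description, the following is a minimal sketch, not the authors' implementation. It assumes each descriptor is the relative frequency of one 4-mer in the sequence, which the abstract does not state; the function name `kmer_frequencies` is hypothetical. It produces all 256 possible 4-mers and does not reproduce the ranking step that selects the 128 top-ranked descriptors.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return the relative frequency of every DNA k-mer in seq.

    Hypothetical helper for illustration; assumes descriptors are
    relative frequencies over overlapping windows.
    """
    seq = seq.upper()
    # Slide a window of length k over the sequence, one base at a time.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Ignore windows that contain ambiguous bases such as N.
    valid = [w for w in windows if set(w) <= set("ACGT")]
    counts = Counter(valid)
    total = len(valid)
    # Emit one entry per possible k-mer so every sequence yields
    # a feature vector of the same length (4**k entries).
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGT", repeat=k)
    }


features = kmer_frequencies("ACGTACGT")
```

A fixed-length vector of this kind could then be passed to any decision tree learner to derive if-then rules; that step is omitted here because the abstract does not specify which learner or settings were used.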
format Online
Article
Text
id pubmed-3927563
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3927563 2014-06-22 ScientificWorldJournal Research Article Hindawi Publishing Corporation 2014 2014-01-29 /pmc/articles/PMC3927563/ /pubmed/24955394 http://dx.doi.org/10.1155/2014/327306 Text en Copyright © 2014 Wen-Lin Huang et al.
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Rule-Based Knowledge Acquisition Method for Promoter Prediction in Human and Drosophila Species
topic Research Article