
A Composite Method Based on Formal Grammar and DNA Structural Features in Detecting Human Polymerase II Promoter Region

An important step in understanding gene regulation is identifying the promoter regions where transcription factor binding takes place. De novo prediction of promoter regions has long been a theoretical goal for many researchers. A number of in silico methods exist to predict the...


Bibliographic Details
Main authors: Datta, Sutapa; Mukhopadhyay, Subhasis
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3577817/
https://www.ncbi.nlm.nih.gov/pubmed/23437045
http://dx.doi.org/10.1371/journal.pone.0054843
author Datta, Sutapa
Mukhopadhyay, Subhasis
collection PubMed
description An important step in understanding gene regulation is identifying the promoter regions where transcription factor binding takes place. De novo prediction of promoter regions has long been a theoretical goal for many researchers. A number of in silico methods exist for predicting promoter regions de novo, but most still suffer from various shortcomings, a major one being the selection of appropriate features that distinguish promoter regions from non-promoters. In this communication, we propose a new composite method that predicts promoter sequences based on the interrelationship between structural profiles of DNA and primary sequence elements of the promoter regions. We show that a Context Free Grammar (CFG) can formalize the relationships between different primary sequence features and, using the CFG, demonstrate that an efficient parser can be constructed to extract these relationships from DNA sequences and distinguish true promoter sequences from non-promoter sequences. Along with the CFG, we extract structural features of the promoter region to improve the efficiency of our prediction system. Extensive experiments on different datasets reveal that our method is effective in predicting promoter sequences on a genome-wide scale and performs satisfactorily compared with other promoter prediction techniques.
format Online
Article
Text
id pubmed-3577817
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3577817 2013-02-22 A Composite Method Based on Formal Grammar and DNA Structural Features in Detecting Human Polymerase II Promoter Region Datta, Sutapa Mukhopadhyay, Subhasis PLoS One Research Article
Public Library of Science 2013-02-20 /pmc/articles/PMC3577817/ /pubmed/23437045 http://dx.doi.org/10.1371/journal.pone.0054843 Text en © 2013 Datta, Mukhopadhyay http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Composite Method Based on Formal Grammar and DNA Structural Features in Detecting Human Polymerase II Promoter Region
topic Research Article