Prediction-based approaches to characterize bidirectional promoters in the mammalian genome

BACKGROUND: Machine learning approaches are emerging as a way to discriminate various classes of functional elements. Previous attempts to create Regulatory Potential (RP) scores to discriminate functional DNA from nonfunctional DNA included using Markov models trained to identify sequences from pro...

Bibliographic Details
Main authors: Yang, Mary Qu; Elnitski, Laura L
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2386062/
https://www.ncbi.nlm.nih.gov/pubmed/18366609
http://dx.doi.org/10.1186/1471-2164-9-S1-S2
author Yang, Mary Qu
Elnitski, Laura L
collection PubMed
description BACKGROUND: Machine learning approaches are emerging as a way to discriminate various classes of functional elements. Previous attempts to create Regulatory Potential (RP) scores to discriminate functional DNA from nonfunctional DNA included using Markov models trained to identify sequences from promoters and enhancers from ancestral repeats. We proposed that knowledge gleaned from those methods could be further refined using a multiple class predictor to separate classes of promoter elements from enhancers or nonfunctional DNA.
RESULTS: We extended our previous work, which identified over 5,000 candidate bidirectional promoters in the human genome, to map the orthologous promoter regions in the mouse genome. Our algorithm measured the robustness of evidence provided by the spliced EST annotations and incorporated evidence from annotations of UCSC Known Genes and GenBank mRNA. In preparation for de novo prediction of this promoter type, we examined characteristic features of the dataset as a whole. For instance, bidirectional promoters score very highly among all functional elements for Regulatory Potential Scores. This result was unexpected due to the limited sequence conservation found in these noncoding regions. We demonstrate that bidirectional promoters can be classified apart from other genomic features including non-bidirectional promoters, i.e., those promoters having no nearby upstream genes. Furthermore, bidirectional promoters consistently score at the level of very highly conserved functional elements in the genome: developmental enhancers. The high scores are due to sequence-based characteristics within the promoters, not the surrounding exons. These results indicate that high-scoring RP regions can be deconvoluted into various functional classes of genomic elements. Using a multiple class predictor, we are able to discriminate bidirectional promoters from enhancers, non-bidirectional promoters, and non-promoter regions on the basis of RP scores and CpG islands.
CONCLUSIONS: We examine orthology at bidirectional promoters, use discriminatory machine learning approaches to differentiate multiple types of promoters from other functional and nonfunctional features in the genome, and begin the process of deconvoluting classes of functional regions that score well with RP scores. These types of approaches precede supervised learning techniques to discover unannotated promoter regions.
format Text
id pubmed-2386062
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2386062 2008-05-15 Prediction-based approaches to characterize bidirectional promoters in the mammalian genome Yang, Mary Qu; Elnitski, Laura L BMC Genomics Research BioMed Central 2008-03-20 /pmc/articles/PMC2386062/ /pubmed/18366609 http://dx.doi.org/10.1186/1471-2164-9-S1-S2 Text en Copyright © 2008 Yang and Elnitski; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prediction-based approaches to characterize bidirectional promoters in the mammalian genome
topic Research