
Predicting Gene Expression from Sequence: A Reexamination

Although much of the information regarding gene expression is encoded in the genome, deciphering it has been very challenging. We reexamined Beer and Tavazoie's (BT) approach to predict mRNA expression patterns of 2,587 genes in Saccharomyces cerevisiae from the informatio...


Bibliographic Details
Main Authors: Yuan, Yuan; Guo, Lei; Shen, Lei; Liu, Jun S
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2098866/
https://www.ncbi.nlm.nih.gov/pubmed/18052544
http://dx.doi.org/10.1371/journal.pcbi.0030243
_version_ 1782138283651760128
author Yuan, Yuan
Guo, Lei
Shen, Lei
Liu, Jun S
collection PubMed
description Although much of the information regarding gene expression is encoded in the genome, deciphering it has been very challenging. We reexamined Beer and Tavazoie's (BT) approach to predicting mRNA expression patterns of 2,587 genes in Saccharomyces cerevisiae from the information in their respective promoter sequences. Instead of fitting complex Bayesian network models, we trained naïve Bayes classifiers using only the sequence-motif matching scores provided by BT. Our simple models correctly predict expression patterns for 79% of the genes, based on the same criterion and the same cross-validation (CV) procedure as BT, which compares favorably with BT's 73% accuracy. The fact that our approach achieved a higher prediction accuracy without using the position and orientation information of the predicted binding sites motivated us to investigate a few biological predictions made by BT. We found that some of their predictions, especially those related to motif orientations and positions, are at best circumstantial. For example, the combinatorial rules suggested by BT for the PAC and RRPE motifs are not unique to the cluster of genes from which the predictive model was inferred, and there are simpler rules that are statistically more significant than BT's. We also show that the CV procedure used by BT to estimate their method's prediction accuracy is inappropriate and may have overestimated the prediction accuracy by about 10%.
format Text
id pubmed-2098866
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2098866 2007-11-29 Predicting Gene Expression from Sequence: A Reexamination Yuan, Yuan Guo, Lei Shen, Lei Liu, Jun S PLoS Comput Biol Research Article Public Library of Science 2007-11 2007-11-30 /pmc/articles/PMC2098866/ /pubmed/18052544 http://dx.doi.org/10.1371/journal.pcbi.0030243 Text en © 2007 Yuan et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Predicting Gene Expression from Sequence: A Reexamination
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2098866/
https://www.ncbi.nlm.nih.gov/pubmed/18052544
http://dx.doi.org/10.1371/journal.pcbi.0030243