Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment
BACKGROUND: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptua...
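Main authors: | Bajic, Vladimir B; Brent, Michael R; Brown, Randall H; Frankish, Adam; Harrow, Jennifer; Ohler, Uwe; Solovyev, Victor V; Tan, Sin Lam |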
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2006 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810552/ https://www.ncbi.nlm.nih.gov/pubmed/16925837 http://dx.doi.org/10.1186/gb-2006-7-s1-s3 |
_version_ | 1782132600223039488 |
---|---|
author | Bajic, Vladimir B; Brent, Michael R; Brown, Randall H; Frankish, Adam; Harrow, Jennifer; Ohler, Uwe; Solovyev, Victor V; Tan, Sin Lam |
author_facet | Bajic, Vladimir B; Brent, Michael R; Brown, Randall H; Frankish, Adam; Harrow, Jennifer; Ohler, Uwe; Solovyev, Victor V; Tan, Sin Lam |
author_sort | Bajic, Vladimir B |
collection | PubMed |
description | BACKGROUND: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends. RESULTS: The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions. CONCLUSION: The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment. |
format | Text |
id | pubmed-1810552 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1810552 2007-03-07 Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment Bajic, Vladimir B Brent, Michael R Brown, Randall H Frankish, Adam Harrow, Jennifer Ohler, Uwe Solovyev, Victor V Tan, Sin Lam Genome Biol Review BACKGROUND: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends. RESULTS: The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions. CONCLUSION: The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment. BioMed Central 2006 2006-08-07 /pmc/articles/PMC1810552/ /pubmed/16925837 http://dx.doi.org/10.1186/gb-2006-7-s1-s3 Text en Copyright © 2006 BioMed Central Ltd. |
spellingShingle | Review Bajic, Vladimir B Brent, Michael R Brown, Randall H Frankish, Adam Harrow, Jennifer Ohler, Uwe Solovyev, Victor V Tan, Sin Lam Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment |
title | Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment |
title_full | Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment |
title_fullStr | Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment |
title_full_unstemmed | Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment |
title_short | Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment |
title_sort | performance assessment of promoter predictions on encode regions in the egasp experiment |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810552/ https://www.ncbi.nlm.nih.gov/pubmed/16925837 http://dx.doi.org/10.1186/gb-2006-7-s1-s3 |