Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment

BACKGROUND: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptua...

Bibliographic Details
Main authors: Bajic, Vladimir B, Brent, Michael R, Brown, Randall H, Frankish, Adam, Harrow, Jennifer, Ohler, Uwe, Solovyev, Victor V, Tan, Sin Lam
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810552/
https://www.ncbi.nlm.nih.gov/pubmed/16925837
http://dx.doi.org/10.1186/gb-2006-7-s1-s3
_version_ 1782132600223039488
author Bajic, Vladimir B
Brent, Michael R
Brown, Randall H
Frankish, Adam
Harrow, Jennifer
Ohler, Uwe
Solovyev, Victor V
Tan, Sin Lam
author_sort Bajic, Vladimir B
collection PubMed
description BACKGROUND: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends. RESULTS: The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions. CONCLUSION: The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment.
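The RESULTS paragraph of the description reports sensitivity, positive predictive value, and average distance mismatch under a maximum allowed distance of 1,000 nucleotides from the reference transcription start sites. The following is a minimal sketch of how such figures could be computed, not the procedure used in the study: it assumes a single sequence, ignores strand, counts a prediction as a true positive if any reference site lies within the window, and counts a reference site as recovered if any prediction lies within the window. The function names and the example coordinates are hypothetical.

```python
from bisect import bisect_left


def nearest_distance(sorted_positions, pos):
    """Distance from pos to the closest value in sorted_positions, or None if empty."""
    i = bisect_left(sorted_positions, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    return min(abs(pos - c) for c in candidates)


def evaluate(predictions, reference_tss, max_dist=1000):
    """Sensitivity, PPV, and mean distance of true-positive predictions.

    Assumption (not from the source): matching is many-to-many within
    max_dist, on one sequence, without strand information.
    """
    preds = sorted(predictions)
    refs = sorted(reference_tss)

    # Distances of predictions that fall within max_dist of some reference site.
    tp_distances = []
    for p in preds:
        d = nearest_distance(refs, p)
        if d is not None and d <= max_dist:
            tp_distances.append(d)

    # Reference sites that have at least one prediction within max_dist.
    recovered = 0
    for r in refs:
        d = nearest_distance(preds, r)
        if d is not None and d <= max_dist:
            recovered += 1

    return {
        "sensitivity": recovered / len(refs) if refs else 0.0,
        "ppv": len(tp_distances) / len(preds) if preds else 0.0,
        "mean_distance": sum(tp_distances) / len(tp_distances) if tp_distances else None,
    }


# Hypothetical coordinates, for illustration only.
example_refs = [1000, 5000, 20000]
example_preds = [1200, 5900, 9000, 30000]
result = evaluate(example_preds, example_refs)
print(result)
```

In this toy input, two of the three reference sites are recovered and two of the four predictions are true positives. A predictor restricted to the neighbourhood of predicted genes would emit fewer predictions far from any reference site, which raises the positive predictive value in this kind of calculation.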
format Text
id pubmed-1810552
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1810552 2007-03-07 Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment Bajic, Vladimir B Brent, Michael R Brown, Randall H Frankish, Adam Harrow, Jennifer Ohler, Uwe Solovyev, Victor V Tan, Sin Lam Genome Biol Review BACKGROUND: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends. RESULTS: The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions. CONCLUSION: The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment. BioMed Central 2006 2006-08-07 /pmc/articles/PMC1810552/ /pubmed/16925837 http://dx.doi.org/10.1186/gb-2006-7-s1-s3 Text en Copyright © 2006 BioMed Central Ltd.
spellingShingle Review
Bajic, Vladimir B
Brent, Michael R
Brown, Randall H
Frankish, Adam
Harrow, Jennifer
Ohler, Uwe
Solovyev, Victor V
Tan, Sin Lam
Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment
title Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810552/
https://www.ncbi.nlm.nih.gov/pubmed/16925837
http://dx.doi.org/10.1186/gb-2006-7-s1-s3