Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment

BACKGROUND: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptua...

Bibliographic Details
Main Authors:	Bajic, Vladimir B, Brent, Michael R, Brown, Randall H, Frankish, Adam, Harrow, Jennifer, Ohler, Uwe, Solovyev, Victor V, Tan, Sin Lam
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Review
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810552/ https://www.ncbi.nlm.nih.gov/pubmed/16925837 http://dx.doi.org/10.1186/gb-2006-7-s1-s3

_version_	1782132600223039488
author	Bajic, Vladimir B Brent, Michael R Brown, Randall H Frankish, Adam Harrow, Jennifer Ohler, Uwe Solovyev, Victor V Tan, Sin Lam
author_sort	Bajic, Vladimir B
collection	PubMed
description	BACKGROUND: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles and we assessed the effectiveness of different conceptual strategies used to correlate produced promoter predictions with the manually annotated 5' gene ends. RESULTS: The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends. These 5' gene ends were used as the estimated reference transcription start sites. With the maximum allowed distance for predictions of 1,000 nucleotides from the reference transcription start sites, the sensitivity of predictors was in the range 32% to 56%, while the positive predictive value was in the range 79% to 93%. The average distance mismatch of predictions from the reference transcription start sites was in the range 259 to 305 nucleotides. At the same time, using transcription start site estimates from DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best performing promoter predictors were those that combined promoter prediction with gene prediction. The main reason for this is the reduced promoter search space that resulted in smaller numbers of false positive predictions. CONCLUSION: The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation purposes can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for the preparation of the next similar promoter prediction assessment.
format	Text
id	pubmed-1810552
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1810552 2007-03-07 Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment Bajic, Vladimir B Brent, Michael R Brown, Randall H Frankish, Adam Harrow, Jennifer Ohler, Uwe Solovyev, Victor V Tan, Sin Lam Genome Biol Review BioMed Central 2006 2006-08-07 /pmc/articles/PMC1810552/ /pubmed/16925837 http://dx.doi.org/10.1186/gb-2006-7-s1-s3 Text en Copyright © 2006 BioMed Central Ltd.
spellingShingle	Review Bajic, Vladimir B Brent, Michael R Brown, Randall H Frankish, Adam Harrow, Jennifer Ohler, Uwe Solovyev, Victor V Tan, Sin Lam Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment
title	Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment
topic	Review
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810552/ https://www.ncbi.nlm.nih.gov/pubmed/16925837 http://dx.doi.org/10.1186/gb-2006-7-s1-s3
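
The description above reports sensitivity, positive predictive value, and average distance mismatch for predictions lying within 1,000 nucleotides of a reference transcription start site (TSS). The Python sketch below illustrates how such figures could be computed on a single sequence. It is not the EGASP evaluation protocol: the function names, the counting rules (a reference TSS counts as found if any prediction lies within the maximum distance; a prediction counts as correct if any reference TSS lies within that distance), and the toy coordinates are all assumptions made for illustration, and strand and multi-region handling are ignored.

```python
from bisect import bisect_left


def nearest_distance(pos, sorted_sites):
    """Return the distance from pos to the closest value in sorted_sites.

    sorted_sites must be a non-empty list sorted in ascending order.
    """
    i = bisect_left(sorted_sites, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(pos - s) for s in candidates)


def assess(predictions, reference_tss, max_dist=1000):
    """Score predicted promoter positions against reference TSS positions.

    Assumed (hypothetical) counting rules, not taken from the paper:
      sensitivity   = reference TSSs with a prediction within max_dist / all reference TSSs
      ppv           = predictions within max_dist of a reference TSS / all predictions
      mean_distance = mean distance to the nearest reference TSS, over correct predictions
    """
    refs = sorted(reference_tss)
    preds = sorted(predictions)
    if not refs or not preds:
        raise ValueError("need at least one prediction and one reference TSS")

    pred_dists = [nearest_distance(p, refs) for p in preds]
    hit_dists = [d for d in pred_dists if d <= max_dist]
    found = sum(1 for r in refs if nearest_distance(r, preds) <= max_dist)

    return {
        "sensitivity": found / len(refs),
        "ppv": len(hit_dists) / len(preds),
        "mean_distance": sum(hit_dists) / len(hit_dists) if hit_dists else None,
    }


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    reference = [1000, 5000, 20000]
    predicted = [1200, 5900, 12000, 30000]
    print(assess(predicted, reference, max_dist=1000))
```

Under these assumed rules, restricting the search space (for example, to regions upstream of predicted genes) lowers the number of predictions far from any reference TSS, which raises the positive predictive value without necessarily changing sensitivity.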
