
EGASP: the human ENCODE Genome Annotation Assessment Project

BACKGROUND: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protei...


Bibliographic Details
Main Authors: Guigó, Roderic; Flicek, Paul; Abril, Josep F; Reymond, Alexandre; Lagarde, Julien; Denoeud, France; Antonarakis, Stylianos; Ashburner, Michael; Bajic, Vladimir B; Birney, Ewan; Castelo, Robert; Eyras, Eduardo; Ucla, Catherine; Gingeras, Thomas R; Harrow, Jennifer; Hubbard, Tim; Lewis, Suzanna E; Reese, Martin G
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810551/
https://www.ncbi.nlm.nih.gov/pubmed/16925836
http://dx.doi.org/10.1186/gb-2006-7-s1-s2
author Guigó, Roderic
Flicek, Paul
Abril, Josep F
Reymond, Alexandre
Lagarde, Julien
Denoeud, France
Antonarakis, Stylianos
Ashburner, Michael
Bajic, Vladimir B
Birney, Ewan
Castelo, Robert
Eyras, Eduardo
Ucla, Catherine
Gingeras, Thomas R
Harrow, Jennifer
Hubbard, Tim
Lewis, Suzanna E
Reese, Martin G
author_sort Guigó, Roderic
collection PubMed
description BACKGROUND: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment.
RESULTS: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified.
CONCLUSION: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
format Text
id pubmed-1810551
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1810551 2007-03-07 EGASP: the human ENCODE Genome Annotation Assessment Project Genome Biol Review BioMed Central 2006 2006-08-07 /pmc/articles/PMC1810551/ /pubmed/16925836 http://dx.doi.org/10.1186/gb-2006-7-s1-s2 Text en Copyright © 2006 BioMed Central Ltd.
title EGASP: the human ENCODE Genome Annotation Assessment Project
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810551/
https://www.ncbi.nlm.nih.gov/pubmed/16925836
http://dx.doi.org/10.1186/gb-2006-7-s1-s2