AceView: a comprehensive cDNA-supported gene and transcripts annotation

BACKGROUND: Regions covering one percent of the genome, selected by ENCODE for extensive analysis, were annotated by the HAVANA/Gencode group with high quality transcripts, thus defining a benchmark. The ENCODE Genome Annotation Assessment Project (EGASP) competition aimed at reproducing Gencode and...

Bibliographic Details
Main Authors: Thierry-Mieg, Danielle, Thierry-Mieg, Jean
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810549/
https://www.ncbi.nlm.nih.gov/pubmed/16925834
http://dx.doi.org/10.1186/gb-2006-7-s1-s12
_version_ 1782132599298195456
author Thierry-Mieg, Danielle
Thierry-Mieg, Jean
author_facet Thierry-Mieg, Danielle
Thierry-Mieg, Jean
author_sort Thierry-Mieg, Danielle
collection PubMed
description BACKGROUND: Regions covering one percent of the genome, selected by ENCODE for extensive analysis, were annotated by the HAVANA/Gencode group with high quality transcripts, thus defining a benchmark. The ENCODE Genome Annotation Assessment Project (EGASP) competition aimed at reproducing Gencode and finding new genes. The organizers evaluated the protein predictions in depth. We present a complementary analysis of the mRNAs, including alternative transcript variants. RESULTS: We evaluate 25 gene tracks from the University of California Santa Cruz (UCSC) genome browser. We either distinguish or collapse the alternative splice variants, and compare the genomic coordinates of exons, introns and nucleotides. Whole mRNA models, seen as chains of introns, are sorted to find the best matching pairs, and compared so that each mRNA is used only once. At the mRNA level, AceView is by far the closest to Gencode: the vast majority of transcripts of the two methods, including alternative variants, are identical. At the protein level, however, due to a lack of experimental data, our predictions differ: Gencode annotates proteins in only 41% of the mRNAs whereas AceView does so in virtually all. We describe the driving principles of AceView, and how, by performing hand-supervised automatic annotation, we solve the combinatorial splicing problem and summarize all of GenBank, dbEST and RefSeq into a genome-wide non-redundant but comprehensive cDNA-supported transcriptome. AceView accuracy is now validated by Gencode. CONCLUSION: Relative to a consensus mRNA catalog constructed from all evidence-based annotations, Gencode and AceView have 81% and 84% sensitivity, and 74% and 73% specificity, respectively. This close agreement validates a richer view of the human transcriptome, with three to five times more transcripts than in UCSC Known Genes (sensitivity 28%), RefSeq (sensitivity 21%) or Ensembl (sensitivity 19%).
format Text
id pubmed-1810549
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1810549 2007-03-07 AceView: a comprehensive cDNA-supported gene and transcripts annotation Thierry-Mieg, Danielle Thierry-Mieg, Jean Genome Biol Research BioMed Central 2006 2006-08-07 /pmc/articles/PMC1810549/ /pubmed/16925834 http://dx.doi.org/10.1186/gb-2006-7-s1-s12 Text en Copyright © 2006 Thierry-Mieg and Thierry-Mieg; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Thierry-Mieg, Danielle
Thierry-Mieg, Jean
AceView: a comprehensive cDNA-supported gene and transcripts annotation
title AceView: a comprehensive cDNA-supported gene and transcripts annotation
title_sort aceview: a comprehensive cdna-supported gene and transcripts annotation
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810549/
https://www.ncbi.nlm.nih.gov/pubmed/16925834
http://dx.doi.org/10.1186/gb-2006-7-s1-s12
work_keys_str_mv AT thierrymiegdanielle aceviewacomprehensivecdnasupportedgeneandtranscriptsannotation
AT thierrymiegjean aceviewacomprehensivecdnasupportedgeneandtranscriptsannotation