
A pipeline for automated annotation of yeast genome sequences by a conserved-synteny approach

Bibliographic Details
Main authors: Proux-Wéra, Estelle; Armisén, David; Byrne, Kevin P; Wolfe, Kenneth H
Format: Online article (text)
Language: English
Published: BioMed Central, 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3507789/
https://www.ncbi.nlm.nih.gov/pubmed/22984983
http://dx.doi.org/10.1186/1471-2105-13-237
Journal: BMC Bioinformatics (Research Article), published online 2012-09-17
Record source: PubMed (National Center for Biotechnology Information)

Abstract

BACKGROUND: Yeasts are a model system for exploring eukaryotic genome evolution. Next-generation sequencing technologies are poised to vastly increase the number of yeast genome sequences, both from resequencing projects (population studies) and from de novo sequencing projects (new species). However, the annotation of genomes presents a major bottleneck for de novo projects, because it still relies on a process that is largely manual.

RESULTS: Here we present the Yeast Genome Annotation Pipeline (YGAP), an automated system designed specifically for new yeast genome sequences lacking transcriptome data. YGAP does automatic de novo annotation, exploiting homology and synteny information from other yeast species stored in the Yeast Gene Order Browser (YGOB) database. The basic premises underlying YGAP's approach are that data from other species already tells us what genes we should expect to find in any particular genomic region and that we should also expect that orthologous genes are likely to have similar intron/exon structures. Additionally, it is able to detect probable frameshift sequencing errors and can propose corrections for them. YGAP searches intelligently for introns, and detects tRNA genes and Ty-like elements.

CONCLUSIONS: In tests on Saccharomyces cerevisiae and on the genomes of Naumovozyma castellii and Tetrapisispora blattae newly sequenced with Roche-454 technology, YGAP outperformed another popular annotation program (AUGUSTUS). For S. cerevisiae and N. castellii, 91-93% of YGAP's predicted gene structures were identical to those in previous manually curated gene sets. YGAP has been implemented as a webserver with a user-friendly interface at http://wolfe.gen.tcd.ie/annotation.

Copyright ©2012 Proux-Wéra et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
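The abstract's central premise, that gene order in related yeasts tells us which genes to expect in a given region of a new genome, can be illustrated with a minimal Python sketch. This is not YGAP's implementation: the gene names, the single reference gene order, and the functions `expected_genes_between` and `missing_in_gap` are hypothetical, and real synteny data (such as YGOB's) involve many species, rearrangements and gene duplications that this sketch ignores.

```python
from typing import List, Set


def expected_genes_between(
    reference_order: List[str], left_anchor: str, right_anchor: str
) -> List[str]:
    """Reference genes lying strictly between two anchor genes.

    Hypothetical helper: assumes the two anchors have already been matched to
    orthologs on a new contig, and that gene order is conserved between them.
    Raises ValueError if an anchor is absent from reference_order.
    """
    i = reference_order.index(left_anchor)
    j = reference_order.index(right_anchor)
    lo, hi = min(i, j), max(i, j)
    between = reference_order[lo + 1:hi]
    # Report genes in the same direction as the anchors appear on the new contig.
    return between if i < j else list(reversed(between))


def missing_in_gap(
    reference_order: List[str], left_anchor: str, right_anchor: str, found: Set[str]
) -> List[str]:
    """Expected genes in the gap with no hit yet: candidates for a targeted search."""
    expected = expected_genes_between(reference_order, left_anchor, right_anchor)
    return [gene for gene in expected if gene not in found]


if __name__ == "__main__":
    ref = ["g1", "g2", "g3", "g4", "g5"]  # hypothetical reference gene order
    print(expected_genes_between(ref, "g1", "g4"))  # ['g2', 'g3']
    print(missing_in_gap(ref, "g1", "g5", {"g3"}))  # ['g2', 'g4']
```

The point of the sketch is the search-space reduction: instead of predicting genes blindly, a synteny-guided annotator can look for specific expected orthologs in the interval between two confidently placed anchors.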