Exploiting single-molecule transcript sequencing for eukaryotic gene prediction


Bibliographic Details
Main authors: Minoche, André E., Dohm, Juliane C., Schneider, Jessica, Holtgräwe, Daniela, Viehöver, Prisca, Montfort, Magda, Rosleff Sörensen, Thomas, Weisshaar, Bernd, Himmelbauer, Heinz
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4556409/
https://www.ncbi.nlm.nih.gov/pubmed/26328666
http://dx.doi.org/10.1186/s13059-015-0729-7
Description
Summary: We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads is implemented as an automated process. Optimized training and prediction settings, together with noise reduction of the assisting Illumina mRNA-seq reads, result in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource for obtaining comprehensive gene sets for newly sequenced genomes of non-model eukaryotes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0729-7) contains supplementary material, which is available to authorized users.
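To illustrate what it means for a full-insert read to "span a complete open reading frame", here is a minimal sketch, not the authors' pipeline. It assumes each read is a plain nucleotide string of unknown orientation, defines an ORF as an ATG start codon followed by the first in-frame stop codon (TAA, TAG or TGA), checks both strands, and uses a hypothetical minimum length of 100 codons. The function names and the threshold are illustrative assumptions.

```python
from typing import Iterable, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complete_orf(seq: str, min_codons: int = 100) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest forward-strand ORF that begins with ATG
    and ends with an in-frame stop codon (end includes the stop), or None if no
    such ORF has at least min_codons codons. min_codons is an assumed cutoff."""
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        # Step through complete codons only in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons and (
                    best is None or end - start > best[1] - best[0]
                ):
                    best = (start, end)
                start = None  # look for the next ATG after this stop
    return best


def spans_complete_orf(read: str, min_codons: int = 100) -> bool:
    """True if either strand of the read contains a complete ORF (hypothetical check)."""
    return any(
        longest_complete_orf(strand, min_codons) is not None
        for strand in (read, reverse_complement(read))
    )


def fraction_with_complete_orf(reads: Iterable[str], min_codons: int = 100) -> float:
    """Fraction of reads that contain a complete ORF on either strand."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(spans_complete_orf(r, min_codons) for r in reads) / len(reads)
```

A statistic like the 98 percent quoted above would correspond to calling `fraction_with_complete_orf` on the set of full-insert reads. The published analysis may define ORF completeness and length cutoffs differently.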