SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models

Bibliographic Details
Main authors: Reid, Ian; O’Toole, Nicholas; Zabaneh, Omar; Nourzadeh, Reza; Dahdouli, Mahmoud; Abdellateef, Mostafa; Gordon, Paul MK; Soh, Jung; Butler, Gregory; Sensen, Christoph W; Tsang, Adrian
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2014
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4084796/
https://www.ncbi.nlm.nih.gov/pubmed/24980894
http://dx.doi.org/10.1186/1471-2105-15-229
collection PubMed
description BACKGROUND: Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them.
RESULTS: SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data.
CONCLUSIONS: SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/.
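The selection idea in the abstract (run Augustus several times, then keep, at each locus, the model with the best homology to known proteins and the best agreement with RNA-Seq data) can be illustrated with a minimal sketch. This is not SnowyOwl's code: the GeneModel class, the pre-normalized homology_score, the intron-support measure in rnaseq_agreement, and the equal 0.5 weighting in select_models are all hypothetical assumptions. The sketch also computes gene-level sensitivity and specificity with the convention often used in gene-prediction benchmarks (Sn = TP/(TP+FN), Sp = TP/(TP+FP), so "specificity" here is what other fields call precision); the paper's own matching criteria may differ.

```python
from dataclasses import dataclass


# Hypothetical representation of one candidate gene model (not SnowyOwl's own format).
@dataclass(frozen=True)
class GeneModel:
    model_id: str
    locus: str
    introns: tuple[tuple[int, int], ...]  # (start, end) of each intron
    homology_score: float  # assumed pre-normalized to 0..1 from a protein similarity search


def rnaseq_agreement(model: GeneModel, junctions: set[tuple[int, int]]) -> float:
    """Fraction of the model's introns that match an observed RNA-Seq splice junction."""
    if not model.introns:
        # Assumption: a single-exon model has no introns that could contradict the reads.
        return 1.0
    supported = sum(1 for intron in model.introns if intron in junctions)
    return supported / len(model.introns)


def select_models(candidates: list[GeneModel],
                  junctions: set[tuple[int, int]],
                  weight: float = 0.5) -> dict[str, GeneModel]:
    """Keep the highest-scoring candidate per locus.

    The score is a weighted sum of homology and RNA-Seq agreement; the
    equal default weighting is an illustrative assumption.
    """
    best: dict[str, tuple[float, GeneModel]] = {}
    for model in candidates:
        score = (weight * model.homology_score
                 + (1 - weight) * rnaseq_agreement(model, junctions))
        # Strict ">" keeps the first candidate seen when scores tie.
        if model.locus not in best or score > best[model.locus][0]:
            best[model.locus] = (score, model)
    return {locus: model for locus, (_, model) in best.items()}


def sensitivity_specificity(predicted: set, reference: set) -> tuple[float, float]:
    """Gene-level Sn = TP / (TP + FN) and Sp = TP / (TP + FP) over exact structure matches."""
    tp = len(predicted & reference)
    sn = tp / len(reference) if reference else 0.0
    sp = tp / len(predicted) if predicted else 0.0
    return sn, sp


# Usage with made-up coordinates.
junctions = {(100, 200), (300, 400)}
candidates = [
    GeneModel("a1", "locus1", ((100, 200), (300, 400)), 0.9),
    GeneModel("a2", "locus1", ((100, 250),), 0.95),
    GeneModel("b1", "locus2", (), 0.4),
]
chosen = select_models(candidates, junctions)
predicted = {(locus, m.introns) for locus, m in chosen.items()}
reference = {("locus1", ((100, 200), (300, 400))), ("locus3", ((500, 600),))}
print(sensitivity_specificity(predicted, reference))  # → (0.5, 0.5)
```

In this toy run, a2 has the higher homology score but its only intron is unsupported by the reads, so a1 wins at locus1; this mirrors the abstract's point that combining homology with RNA-Seq agreement trades some raw sensitivity for selectivity.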
id pubmed-4084796
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics
article_type Software
published 2014-07-01
rights Copyright © 2014 Reid et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.