
GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient

BACKGROUND: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables annotation of new genomes regardless of the availability of homologous sequences. A number of ab initio gene prediction tools exist and have been widely used to annotate genes in various species. Ho...


Bibliographic Details
Main authors: Techa-Angkoon, Prapaporn, Childs, Kevin L., Sun, Yanni
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929509/
https://www.ncbi.nlm.nih.gov/pubmed/31874598
http://dx.doi.org/10.1186/s12859-019-3047-3
author Techa-Angkoon, Prapaporn
Childs, Kevin L.
Sun, Yanni
author_facet Techa-Angkoon, Prapaporn
Childs, Kevin L.
Sun, Yanni
author_sort Techa-Angkoon, Prapaporn
collection PubMed
description BACKGROUND: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables annotation of new genomes regardless of the availability of homologous sequences. A number of ab initio gene prediction tools exist and have been widely used to annotate genes in various species. However, existing tools are not optimized for identifying genes with highly variable GC content. In addition, some genes in grass genomes exhibit a sharp 5′-3′ decreasing GC content gradient, which is not carefully modeled by available gene prediction tools. Thus, there is still room to improve the sensitivity and accuracy of predicting genes with GC gradients. RESULTS: In this work, we designed and implemented a new hidden Markov model (HMM)-based ab initio gene prediction tool that is optimized for finding genes with highly variable GC content, such as the genes with negative GC gradients in grass genomes. We tested the tool on three datasets from Arabidopsis thaliana and Oryza sativa. The results showed that our tool can identify genes that existing tools miss because of their highly variable GC content. CONCLUSIONS: GPRED-GC can effectively predict genes with highly variable GC content without manual intervention. It complements existing tools such as Augustus for more sensitive gene discovery. The source code is freely available at https://sourceforge.net/projects/gpred-gc/.
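The "5′-3′ decreasing GC content gradient" mentioned in the abstract can be illustrated with a small calculation. The sketch below is not part of GPRED-GC and does not reproduce its HMM; it only shows one simple way to quantify such a gradient, assuming fixed-size non-overlapping windows and a least-squares slope. The function names (`gc_fraction`, `windowed_gc`, `gc_slope`), the window size, and the toy sequence are all hypothetical choices made for illustration.

```python
def gc_fraction(seq):
    """Return the fraction of G or C bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window=100, step=100):
    """GC fraction of each full window, read from the 5' end to the 3' end."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


def gc_slope(values):
    """Least-squares slope of values against window index.

    A negative slope means GC content falls from 5' to 3'.
    """
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Toy sequence (made up): GC-rich at the 5' end, AT-rich at the 3' end.
toy = "GC" * 50 + "GCAT" * 25 + "AT" * 50
profile = windowed_gc(toy, window=100, step=100)
print(profile)            # per-window GC fractions, 5' to 3'
print(gc_slope(profile))  # negative for a decreasing gradient
```

A gene whose profile yields a strongly negative slope would be an example of the negative-gradient genes the abstract describes; real analyses would typically use coding-sequence coordinates and smaller, overlapping windows.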
format Online
Article
Text
id pubmed-6929509
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6929509 2019-12-30 GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient Techa-Angkoon, Prapaporn Childs, Kevin L. Sun, Yanni BMC Bioinformatics Research
BioMed Central 2019-12-24 /pmc/articles/PMC6929509/ /pubmed/31874598 http://dx.doi.org/10.1186/s12859-019-3047-3 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Techa-Angkoon, Prapaporn
Childs, Kevin L.
Sun, Yanni
GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient
title GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient
title_full GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient
title_fullStr GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient
title_full_unstemmed GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient
title_short GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient
title_sort gpred-gc: a gene prediction model accounting for 5′-3′ gc gradient
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929509/
https://www.ncbi.nlm.nih.gov/pubmed/31874598
http://dx.doi.org/10.1186/s12859-019-3047-3