GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient
BACKGROUND: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables gene annotation of new genomes regardless of the availability of homologous sequences. A number of ab initio gene prediction tools exist, and they have been widely used for gene annotation in various species. Ho...
Main authors: | Techa-Angkoon, Prapaporn; Childs, Kevin L.; Sun, Yanni |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929509/ https://www.ncbi.nlm.nih.gov/pubmed/31874598 http://dx.doi.org/10.1186/s12859-019-3047-3 |
_version_ | 1783482717071998976 |
---|---|
author | Techa-Angkoon, Prapaporn Childs, Kevin L. Sun, Yanni |
author_facet | Techa-Angkoon, Prapaporn Childs, Kevin L. Sun, Yanni |
author_sort | Techa-Angkoon, Prapaporn |
collection | PubMed |
description | BACKGROUND: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables gene annotation of new genomes regardless of availability of homologous sequences. There exist a number of ab initio gene prediction tools and they have been widely used for gene annotation for various species. However, existing tools are not optimized for identifying genes with highly variable GC content. In addition, some genes in grass genomes exhibit a sharp 5′-3′ decreasing GC content gradient, which is not carefully modeled by available gene prediction tools. Thus, there is still room to improve the sensitivity and accuracy for predicting genes with GC gradients. RESULTS: In this work, we designed and implemented a new hidden Markov model (HMM)-based ab initio gene prediction tool, which is optimized for finding genes with highly variable GC contents, such as the genes with negative GC gradients in grass genomes. We tested the tool on three datasets from Arabidopsis thaliana and Oryza sativa. The results showed that our tool can identify genes missed by existing tools due to the highly variable GC contents. CONCLUSIONS: GPRED-GC can effectively predict genes with highly variable GC contents without manual intervention. It provides a useful complementary tool to existing ones such as Augustus for more sensitive gene discovery. The source code is freely available at https://sourceforge.net/projects/gpred-gc/. |
format | Online Article Text |
id | pubmed-6929509 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6929509 2019-12-30 GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient Techa-Angkoon, Prapaporn Childs, Kevin L. Sun, Yanni BMC Bioinformatics Research BACKGROUND: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables gene annotation of new genomes regardless of availability of homologous sequences. There exist a number of ab initio gene prediction tools and they have been widely used for gene annotation for various species. However, existing tools are not optimized for identifying genes with highly variable GC content. In addition, some genes in grass genomes exhibit a sharp 5′-3′ decreasing GC content gradient, which is not carefully modeled by available gene prediction tools. Thus, there is still room to improve the sensitivity and accuracy for predicting genes with GC gradients. RESULTS: In this work, we designed and implemented a new hidden Markov model (HMM)-based ab initio gene prediction tool, which is optimized for finding genes with highly variable GC contents, such as the genes with negative GC gradients in grass genomes. We tested the tool on three datasets from Arabidopsis thaliana and Oryza sativa. The results showed that our tool can identify genes missed by existing tools due to the highly variable GC contents. CONCLUSIONS: GPRED-GC can effectively predict genes with highly variable GC contents without manual intervention. It provides a useful complementary tool to existing ones such as Augustus for more sensitive gene discovery. The source code is freely available at https://sourceforge.net/projects/gpred-gc/. 
BioMed Central 2019-12-24 /pmc/articles/PMC6929509/ /pubmed/31874598 http://dx.doi.org/10.1186/s12859-019-3047-3 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Techa-Angkoon, Prapaporn Childs, Kevin L. Sun, Yanni GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient |
title | GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient |
title_full | GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient |
title_fullStr | GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient |
title_full_unstemmed | GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient |
title_short | GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient |
title_sort | gpred-gc: a gene prediction model accounting for 5′-3′ gc gradient |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929509/ https://www.ncbi.nlm.nih.gov/pubmed/31874598 http://dx.doi.org/10.1186/s12859-019-3047-3 |
work_keys_str_mv | AT techaangkoonprapaporn gpredgcagenepredictionmodelaccountingfor53gcgradient AT childskevinl gpredgcagenepredictionmodelaccountingfor53gcgradient AT sunyanni gpredgcagenepredictionmodelaccountingfor53gcgradient |