Evaluation of Different Gene Prediction Tools in Coccidioides immitis

Gene prediction is required to obtain optimal biologically meaningful information from genomic sequences, but automated gene prediction software is imperfect. In this study, we compare the original annotation of the Coccidioides immitis RS genome (the reference strain of C. immitis) to annotations u...

Bibliographic Details
Main Authors: Kirkland, Theo N., Beyhan, Sinem, Stajich, Jason E.
Format: Online Article Text
Language: English
Published: MDPI 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10672684/
https://www.ncbi.nlm.nih.gov/pubmed/37998899
http://dx.doi.org/10.3390/jof9111094
_version_ 1785140448884424704
author Kirkland, Theo N.
Beyhan, Sinem
Stajich, Jason E.
collection PubMed
description Gene prediction is required to obtain optimal biologically meaningful information from genomic sequences, but automated gene prediction software is imperfect. In this study, we compare the original annotation of the Coccidioides immitis RS genome (the reference strain of C. immitis) to annotations using the Funannotate and Augustus genome prediction pipelines. A total of 25% of the originally predicted genes (denoted CIMG) were not found in either the Funannotate or Augustus predictions. A comparison of Funannotate and Augustus predictions also found overlapping but not identical sets of genes. The predicted genes found only in the original annotation (referred to as CIMG-unique) were less likely to have a meaningful functional annotation and had fewer orthologs and homologs in other fungi than all CIMG genes predicted by the original annotation. The CIMG-unique genes were also more likely to be lineage-specific and poorly expressed. In addition, the CIMG-unique genes were found in clusters and tended to be more frequently associated with transposable elements than all CIMG-predicted genes. The CIMG-unique genes were more likely to have experimentally determined transcription start sites that were further away from the originally predicted transcription start sites, and experimentally determined initial transcription was less likely to result in stable CIMG-unique transcripts. A sample of CIMG-unique genes that were relatively well expressed and differentially expressed in mycelia and spherules was inspected in a genome browser, and the structure of only about half of them was found to be supported by RNA-seq data. These data suggest that some of the CIMG-unique genes are not authentic gene predictions. Genes that were predicted only by the Funannotate pipeline were also less likely to have a meaningful functional annotation, and they were shorter and less well expressed than all the genes predicted by Funannotate. C. immitis genes predicted by more than one annotation are more likely to have predicted functions, to have many orthologs and homologs, and to be well expressed. Lineage-specific genes are relatively uncommon in this group. These data emphasize the importance and limitations of gene prediction software and suggest that improvements to the annotation of the C. immitis genome should be considered.
format Online
Article
Text
id pubmed-10672684
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10672684 2023-11-09 Evaluation of Different Gene Prediction Tools in Coccidioides immitis Kirkland, Theo N. Beyhan, Sinem Stajich, Jason E. J Fungi (Basel) Article MDPI 2023-11-09 /pmc/articles/PMC10672684/ /pubmed/37998899 http://dx.doi.org/10.3390/jof9111094 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Evaluation of Different Gene Prediction Tools in Coccidioides immitis
topic Article