
Computational discovery and annotation of conserved small open reading frames in fungal genomes



Bibliographic Details
Main Authors: Mat-Sharani, Shuhaila; Firdaus-Raih, Mohd
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394265/
https://www.ncbi.nlm.nih.gov/pubmed/30717662
http://dx.doi.org/10.1186/s12859-018-2550-2
author Mat-Sharani, Shuhaila
Firdaus-Raih, Mohd
collection PubMed
description BACKGROUND: Small open reading frames (smORFs/sORFs) that encode short protein sequences are often overlooked during standard gene prediction, leaving many sORFs undiscovered or misannotated. For many genomes, a second round of sORF-targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding 80 amino acid residues or fewer in 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes. RESULTS: A first set of sORFs was identified from existing annotations that met the criterion of a maximum of 80 residues. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or fewer in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized. CONCLUSIONS: Numerous open reading frames that could potentially encode polypeptides of 80 amino acid residues or fewer are overlooked during standard gene prediction and annotation. Our results show that additional targeted reannotation of genomes can complement standard genome annotation to identify sORFs. Given the lack of, and limitations with, experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2550-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-7394265
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7394265 2020-08-05 Computational discovery and annotation of conserved small open reading frames in fungal genomes Mat-Sharani, Shuhaila Firdaus-Raih, Mohd BMC Bioinformatics Research
BioMed Central 2019-02-04 /pmc/articles/PMC7394265/ /pubmed/30717662 http://dx.doi.org/10.1186/s12859-018-2550-2 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Computational discovery and annotation of conserved small open reading frames in fungal genomes
topic Research