Computational discovery and annotation of conserved small open reading frames in fungal genomes
BACKGROUND: Small open reading frames (smORFs/sORFs) that encode short protein sequences are often overlooked during standard gene prediction, leaving many sORFs undiscovered or misannotated. For many genomes, a second round of sORF targeted gene prediction can compl...
Main authors: | Mat-Sharani, Shuhaila; Firdaus-Raih, Mohd |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394265/ https://www.ncbi.nlm.nih.gov/pubmed/30717662 http://dx.doi.org/10.1186/s12859-018-2550-2 |
_version_ | 1783565201665163264 |
---|---|
author | Mat-Sharani, Shuhaila Firdaus-Raih, Mohd |
author_facet | Mat-Sharani, Shuhaila Firdaus-Raih, Mohd |
author_sort | Mat-Sharani, Shuhaila |
collection | PubMed |
description | BACKGROUND: Small open reading frames (smORFs/sORFs) that encode short protein sequences are often overlooked during standard gene prediction, leaving many sORFs undiscovered or misannotated. For many genomes, a second round of sORF-targeted gene prediction can complement the existing annotation. In this study, we specifically targeted the identification of ORFs encoding 80 amino acid residues or fewer in 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes. RESULTS: A first set of sORFs that met the maximum of 80 residues criterion was identified from existing annotations. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or fewer in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized. CONCLUSIONS: Numerous open reading frames that could potentially encode polypeptides of 80 amino acid residues or fewer are evidently overlooked during standard gene prediction and annotation. Our results show that additional targeted reannotation of genomes can complement standard genome annotation to identify sORFs. Because experimental validation is lacking and has limitations, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2550-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-7394265 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7394265 2020-08-05 Computational discovery and annotation of conserved small open reading frames in fungal genomes Mat-Sharani, Shuhaila Firdaus-Raih, Mohd BMC Bioinformatics Research
BioMed Central 2019-02-04 /pmc/articles/PMC7394265/ /pubmed/30717662 http://dx.doi.org/10.1186/s12859-018-2550-2 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Mat-Sharani, Shuhaila Firdaus-Raih, Mohd Computational discovery and annotation of conserved small open reading frames in fungal genomes |
title | Computational discovery and annotation of conserved small open reading frames in fungal genomes |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394265/ https://www.ncbi.nlm.nih.gov/pubmed/30717662 http://dx.doi.org/10.1186/s12859-018-2550-2 |
work_keys_str_mv | AT matsharanishuhaila computationaldiscoveryandannotationofconservedsmallopenreadingframesinfungalgenomes AT firdausraihmohd computationaldiscoveryandannotationofconservedsmallopenreadingframesinfungalgenomes |
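
The abstract describes searching genomic sequence for ORF candidates of 80 codons or fewer. The record does not say which tool or parameters the authors used, so the following is only a minimal sketch of that kind of length-limited scan, not the published method. The function names, the ATG-only start codon, the standard stop codons, the assumed lower bound of 10 codons, and the choice to exclude the stop codon from the codon count are all assumptions made for illustration.

```python
# Hypothetical sketch: scan all six reading frames for short ORFs.
# Nothing here is taken from the paper except the 80-codon upper bound.

STOP_CODONS = {"TAA", "TAG", "TGA"}        # standard stop codons (assumed)
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MAX_CODONS = 80   # upper bound stated in the abstract
MIN_CODONS = 10   # assumed lower bound, not from the source


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sorfs(seq, min_codons=MIN_CODONS, max_codons=MAX_CODONS):
    """Return (strand, start, end, orf) tuples for short ORFs.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned. The codon count includes ATG but not the stop.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":      # open an ORF at the first ATG
                        start = i
                elif codon in STOP_CODONS:  # close it at the first in-frame stop
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        hits.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return hits


if __name__ == "__main__":
    example = "ATG" + "GCT" * 11 + "TAA"    # 12 coding codons plus a stop
    for hit in find_sorfs(example):
        print(hit)
```

A scan like this only yields candidates. The abstract's proposal is to filter such predictions by conservation across genomes, a step this sketch does not attempt.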