Understanding small ORF diversity through a comprehensive transcription feature classification

Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper invest...

Bibliographic Details
Main authors: Guerra-Almeida, Diego; Tschoeke, Diogo Antonio; Nunes-da-Fonseca, Rodrigo
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8435553/
https://www.ncbi.nlm.nih.gov/pubmed/34240112
http://dx.doi.org/10.1093/dnares/dsab007
_version_ 1783751816823963648
author Guerra-Almeida, Diego
Tschoeke, Diogo Antonio
Nunes-da-Fonseca, Rodrigo
author_facet Guerra-Almeida, Diego
Tschoeke, Diogo Antonio
Nunes-da-Fonseca, Rodrigo
author_sort Guerra-Almeida, Diego
collection PubMed
description Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in non-canonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into non-expressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in non-coding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world.
format Online
Article
Text
id pubmed-8435553
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8435553 2021-09-14 Understanding small ORF diversity through a comprehensive transcription feature classification Guerra-Almeida, Diego Tschoeke, Diogo Antonio Nunes-da-Fonseca, Rodrigo DNA Res Invited Review Small open reading frames (small ORFs/sORFs/smORFs) are potentially coding sequences smaller than 100 codons that have historically been considered junk DNA by gene prediction software and in annotation screening; however, the advent of next-generation sequencing has contributed to the deeper investigation of junk DNA regions and their transcription products, resulting in the emergence of smORFs as a new focus of interest in systems biology. Several smORF peptides were recently reported in non-canonical mRNAs as new players in numerous biological contexts; however, their relevance is still overlooked in coding potential analysis. Hence, this review proposes a smORF classification based on transcriptional features, discussing the most promising approaches to investigate smORFs based on their different characteristics. First, smORFs were divided into non-expressed (intergenic) and expressed (genic) smORFs. Second, genic smORFs were classified as smORFs located in non-coding RNAs (ncRNAs) or canonical mRNAs. Finally, smORFs in ncRNAs were further subdivided into sequences located in small or long RNAs, whereas smORFs located in canonical mRNAs were subdivided into several specific classes depending on their localization along the gene. We hope that this review provides new insights into large-scale annotations and reinforces the role of smORFs as essential components of a hidden coding DNA world. Oxford University Press 2021-07-07 /pmc/articles/PMC8435553/ /pubmed/34240112 http://dx.doi.org/10.1093/dnares/dsab007 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Invited Review
Guerra-Almeida, Diego
Tschoeke, Diogo Antonio
Nunes-da-Fonseca, Rodrigo
Understanding small ORF diversity through a comprehensive transcription feature classification
title Understanding small ORF diversity through a comprehensive transcription feature classification
title_full Understanding small ORF diversity through a comprehensive transcription feature classification
title_fullStr Understanding small ORF diversity through a comprehensive transcription feature classification
title_full_unstemmed Understanding small ORF diversity through a comprehensive transcription feature classification
title_short Understanding small ORF diversity through a comprehensive transcription feature classification
title_sort understanding small orf diversity through a comprehensive transcription feature classification
topic Invited Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8435553/
https://www.ncbi.nlm.nih.gov/pubmed/34240112
http://dx.doi.org/10.1093/dnares/dsab007
work_keys_str_mv AT guerraalmeidadiego understandingsmallorfdiversitythroughacomprehensivetranscriptionfeatureclassification
AT tschoekediogoantonio understandingsmallorfdiversitythroughacomprehensivetranscriptionfeatureclassification
AT nunesdafonsecarodrigo understandingsmallorfdiversitythroughacomprehensivetranscriptionfeatureclassification