Improved ontology for eukaryotic single-exon coding sequences in biological databases

Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term ‘single-exon gene’. Eukaryotic Single-Exon Genes (SEGs) have been defined as...

Bibliographic Details
Main Authors: Jorquera, Roddy; González, Carolina; Clausen, Philip; Petersen, Bent; Holmes, David S
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146118/
https://www.ncbi.nlm.nih.gov/pubmed/30239665
http://dx.doi.org/10.1093/database/bay089
_version_ 1783356343878418432
author Jorquera, Roddy
González, Carolina
Clausen, Philip
Petersen, Bent
Holmes, David S
collection PubMed
description Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term ‘single-exon gene’. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do not have introns in their protein coding sequences. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancer and neurological/developmental disorders, and many exhibit tissue-specific transcription. Unfortunately, the term ‘SEGs’ is rife with ambiguity, leading to biological misinterpretations. In the classic definition, no distinction is made between SEGs that harbor introns in their untranslated regions (UTRs) and those that do not. This distinction is important because the presence of introns in UTRs affects transcriptional regulation and post-transcriptional processing of the mRNA. In addition, recent whole-transcriptome shotgun sequencing has led to the discovery of many examples of single-exon mRNAs that arise from alternative splicing of multi-exon genes; these single-exon isoforms are being confused with SEGs despite their clearly different origin. The increasing expansion of RNA-seq datasets makes it imperative to distinguish the different SEG types before annotation errors become indelibly propagated in biological databases. This paper develops a structured vocabulary for their disambiguation, allowing a major reassessment of their evolutionary trajectories, regulation, RNA processing and transport, and provides the opportunity to improve the detection of gene associations with disorders including cancers and neurological and developmental diseases.
format Online
Article
Text
id pubmed-6146118
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6146118 2018-09-25 Improved ontology for eukaryotic single-exon coding sequences in biological databases Jorquera, Roddy; González, Carolina; Clausen, Philip; Petersen, Bent; Holmes, David S Database (Oxford) Perspective/Opinion Oxford University Press 2018-09-18 /pmc/articles/PMC6146118/ /pubmed/30239665 http://dx.doi.org/10.1093/database/bay089 Text en © The Author(s) 2018. Published by Oxford University Press. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
title Improved ontology for eukaryotic single-exon coding sequences in biological databases
topic Perspective/Opinion
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146118/
https://www.ncbi.nlm.nih.gov/pubmed/30239665
http://dx.doi.org/10.1093/database/bay089