
Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog

BACKGROUND: In eukaryote transcriptomes, a significant amount of transcript diversity comes from genes’ capacity to generate different transcripts through alternative splicing. Identifying orthologous alternative transcripts across multiple species is of particular interest for genome annotators. Ho...


Bibliographic Details
Main Authors: Guillaudeux, Nicolas, Belleannée, Catherine, Blanquart, Samuel
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8933948/
https://www.ncbi.nlm.nih.gov/pubmed/35303798
http://dx.doi.org/10.1186/s12864-022-08429-4
_version_ 1784671767750508544
author Guillaudeux, Nicolas
Belleannée, Catherine
Blanquart, Samuel
author_facet Guillaudeux, Nicolas
Belleannée, Catherine
Blanquart, Samuel
author_sort Guillaudeux, Nicolas
collection PubMed
description BACKGROUND: In eukaryote transcriptomes, a significant amount of transcript diversity comes from genes’ capacity to generate different transcripts through alternative splicing. Identifying orthologous alternative transcripts across multiple species is of particular interest for genome annotators. However, there is no formal definition of transcript orthology based on splicing structure conservation. Likewise, there is no public benchmark dataset providing groups of orthologous transcripts sharing a conserved splicing structure. RESULTS: We introduced a formal definition of splicing structure orthology and predicted transcript orthologs in human, mouse and dog. Applying a selective strategy, we analyzed 2,167 genes and their 18,109 known transcripts and identified a set of 253 gene orthologs that shared a conserved splicing structure in all three species. We predicted 6,861 transcript CDSs (coding sequences), mainly for dog, an emergent model species. Each predicted transcript was an ortholog of a known transcript: both share the same CDS splicing structure. Evidence for the existence of the predicted CDSs was found in external data. CONCLUSIONS: We generated a dataset of 253 gene triplets, structurally conserved and sharing all their CDSs in human, mouse and dog, which correspond to 879 triplets of spliced CDS orthologs. We have released the dataset both as an SQL database and as tabulated files. The data consists of the 879 CDS orthology groups with their detailed splicing structures, and the predicted CDSs, associated with their experimental evidence. The 6,861 predicted CDSs are provided in GTF files. Our data may help to compare highly conserved genes across three species, support comparative transcriptomics at the isoform level, or serve to benchmark splice aligners and methods focusing on the identification of splicing orthologs. The data is available at https://data-access.cesgo.org/index.php/s/V97GXxOS66NqTkZ.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s12864-022-08429-4).
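Since the description states that the predicted CDSs are distributed as GTF files, the following is a minimal Python sketch of how a reader might group CDS features by transcript and list the length of each coding segment. It relies only on the standard nine-column, tab-separated GTF layout with a `transcript_id` attribute. The function names, the example records, and the source label `hypothetical` are illustrative assumptions and are not taken from the released dataset; the sketch does not implement the authors' formal definition of splicing structure orthology.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def cds_segments_by_transcript(gtf_lines):
    """Group CDS features by transcript_id as sorted (start, end) pairs."""
    segments = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue  # keep only well-formed CDS records
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        segments[match.group(1)].append((int(fields[3]), int(fields[4])))
    return {tid: sorted(segs) for tid, segs in segments.items()}


def segment_lengths(segs):
    """GTF coordinates are 1-based and inclusive, hence the +1."""
    return [end - start + 1 for start, end in segs]


# Made-up records for illustration only (not from the dataset).
EXAMPLE_GTF = [
    "# hypothetical example",
    'chr1\thypothetical\texon\t50\t199\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\thypothetical\tCDS\t300\t350\t.\t+\t2\tgene_id "g1"; transcript_id "t1";',
    'chr1\thypothetical\tCDS\t100\t199\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr1\thypothetical\tCDS\t100\t199\t.\t+\t0\tgene_id "g1"; transcript_id "t2";',
]

segments = cds_segments_by_transcript(EXAMPLE_GTF)
for tid, segs in sorted(segments.items()):
    print(tid, segs, segment_lengths(segs))
```

To read a real file, the same function can be given an open file handle, for example `cds_segments_by_transcript(open(path))`, where `path` points to one of the downloaded GTF files.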
format Online
Article
Text
id pubmed-8933948
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8933948 2022-03-23 Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog Guillaudeux, Nicolas Belleannée, Catherine Blanquart, Samuel BMC Genomics Research BACKGROUND: In eukaryote transcriptomes, a significant amount of transcript diversity comes from genes’ capacity to generate different transcripts through alternative splicing. Identifying orthologous alternative transcripts across multiple species is of particular interest for genome annotators. However, there is no formal definition of transcript orthology based on splicing structure conservation. Likewise, there is no public benchmark dataset providing groups of orthologous transcripts sharing a conserved splicing structure. RESULTS: We introduced a formal definition of splicing structure orthology and predicted transcript orthologs in human, mouse and dog. Applying a selective strategy, we analyzed 2,167 genes and their 18,109 known transcripts and identified a set of 253 gene orthologs that shared a conserved splicing structure in all three species. We predicted 6,861 transcript CDSs (coding sequences), mainly for dog, an emergent model species. Each predicted transcript was an ortholog of a known transcript: both share the same CDS splicing structure. Evidence for the existence of the predicted CDSs was found in external data. CONCLUSIONS: We generated a dataset of 253 gene triplets, structurally conserved and sharing all their CDSs in human, mouse and dog, which correspond to 879 triplets of spliced CDS orthologs. We have released the dataset both as an SQL database and as tabulated files. The data consists of the 879 CDS orthology groups with their detailed splicing structures, and the predicted CDSs, associated with their experimental evidence. The 6,861 predicted CDSs are provided in GTF files.
Our data may help to compare highly conserved genes across three species, support comparative transcriptomics at the isoform level, or serve to benchmark splice aligners and methods focusing on the identification of splicing orthologs. The data is available at https://data-access.cesgo.org/index.php/s/V97GXxOS66NqTkZ. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1186/s12864-022-08429-4). BioMed Central 2022-03-18 /pmc/articles/PMC8933948/ /pubmed/35303798 http://dx.doi.org/10.1186/s12864-022-08429-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Guillaudeux, Nicolas
Belleannée, Catherine
Blanquart, Samuel
Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog
title Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog
title_full Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog
title_fullStr Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog
title_full_unstemmed Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog
title_short Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog
title_sort identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8933948/
https://www.ncbi.nlm.nih.gov/pubmed/35303798
http://dx.doi.org/10.1186/s12864-022-08429-4
work_keys_str_mv AT guillaudeuxnicolas identifyinggeneswithconservedsplicingstructureandorthologousisoformsinhumanmouseanddog
AT belleanneecatherine identifyinggeneswithconservedsplicingstructureandorthologousisoformsinhumanmouseanddog
AT blanquartsamuel identifyinggeneswithconservedsplicingstructureandorthologousisoformsinhumanmouseanddog