
Assisted transcriptome reconstruction and splicing orthology

BACKGROUND: Transcriptome reconstruction, defined as the identification of all protein isoforms that may be expressed by a gene, is a notably difficult computational task. With real data, the best methods based on RNA-seq data identify barely 21 % of the expressed transcripts. While waiting for algo...


Bibliographic Details
Main authors: Blanquart, Samuel; Varré, Jean-Stéphane; Guertin, Paul; Perrin, Amandine; Bergeron, Anne; Swenson, Krister M.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123294/
https://www.ncbi.nlm.nih.gov/pubmed/28185551
http://dx.doi.org/10.1186/s12864-016-3103-6
author Blanquart, Samuel
Varré, Jean-Stéphane
Guertin, Paul
Perrin, Amandine
Bergeron, Anne
Swenson, Krister M.
collection PubMed
description BACKGROUND: Transcriptome reconstruction, defined as the identification of all protein isoforms that may be expressed by a gene, is a notably difficult computational task. With real data, the best methods based on RNA-seq data identify barely 21 % of the expressed transcripts. While waiting for algorithms and sequencing techniques to improve, as has been strongly suggested in the literature, it is important to evaluate assisted transcriptome prediction: the question of how well alternative transcription in one species predicts protein isoforms in another, relatively close species. Most evidence-based gene predictors use transcripts from other species to annotate a genome, but the predictive power of procedures that use exclusively transcripts from external species has never been quantified. The cornerstone of such an evaluation is the correct identification of pairs of transcripts with the same splicing patterns, called splicing orthologs.
RESULTS: We propose a rigorous procedural definition of splicing orthologs, based on the identification of all ortholog pairs of splicing sites in the nucleotide sequences, and alignments at the protein level. Using our definition, we compared 24 382 human transcripts and 17 909 mouse transcripts from the highly curated CCDS database, and identified 11 122 splicing orthologs. In prediction mode, we show that human transcripts can be used to infer over 62 % of mouse protein isoforms. When restricting the predictions to transcripts known eight years ago, the percentage grows to 74 %. Using CCDS timestamped releases, we also analyze the evolution of the number of splicing orthologs over the last decade.
CONCLUSIONS: Alternative splicing is now recognized to play a major role in the protein diversity of eukaryotic organisms, but definitions of spliced isoform orthologs are still approximate. Here we propose a definition adapted to the subtle variations of conserved alternative splicing sites, and use it to validate numerous accurate orthologous isoform predictions.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3103-6) contains supplementary material, which is available to authorized users.
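The abstract describes splicing orthologs as transcript pairs whose splicing sites are all pairwise orthologous. A minimal Python sketch of that idea follows. It is an illustration under strong assumptions, not the authors' procedure: transcripts are reduced to sets of integer splice-site positions, the site-level orthology map is given in advance, and the protein-level alignment step mentioned in the abstract is omitted entirely. The names `find_splicing_orthologs`, `recovered_fraction`, and the toy identifiers are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Transcripts = Dict[str, FrozenSet[int]]  # transcript id -> set of splice-site positions


def find_splicing_orthologs(
    human: Transcripts,
    mouse: Transcripts,
    site_orthology: Dict[int, int],  # assumed given: human site -> orthologous mouse site
) -> List[Tuple[str, str]]:
    """Pair human and mouse transcripts whose splice sites correspond one-to-one."""
    # Index mouse transcripts by their splice-site set for direct lookup.
    mouse_index: Dict[FrozenSet[int], List[str]] = {}
    for mouse_id, sites in mouse.items():
        mouse_index.setdefault(sites, []).append(mouse_id)

    pairs: List[Tuple[str, str]] = []
    for human_id, sites in human.items():
        # Skip transcripts that use a site with no known ortholog.
        if any(site not in site_orthology for site in sites):
            continue
        projected = frozenset(site_orthology[site] for site in sites)
        # Skip if two human sites collapse onto the same mouse site.
        if len(projected) != len(sites):
            continue
        for mouse_id in mouse_index.get(projected, []):
            pairs.append((human_id, mouse_id))
    return pairs


def recovered_fraction(pairs: List[Tuple[str, str]], mouse: Transcripts) -> float:
    """Fraction of mouse transcripts matched by at least one human transcript."""
    if not mouse:
        return 0.0
    return len({mouse_id for _, mouse_id in pairs}) / len(mouse)


# Toy data (invented for illustration only).
human = {
    "H1": frozenset({100, 200, 300, 400}),
    "H2": frozenset({100, 400}),
    "H3": frozenset({100, 250}),  # site 250 has no mouse ortholog
}
mouse = {
    "M1": frozenset({10, 20, 30, 40}),
    "M2": frozenset({10, 40}),
    "M3": frozenset({10, 35}),
}
site_orthology = {100: 10, 200: 20, 300: 30, 400: 40}

pairs = find_splicing_orthologs(human, mouse, site_orthology)
print(pairs)
print(recovered_fraction(pairs, mouse))
```

In this toy setting, the recovered fraction plays the role of the "prediction mode" percentage in the abstract: the share of isoforms in one species that transcripts from the other species would predict. How the published figures were actually computed is not stated in this record.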
format Online
Article
Text
id pubmed-5123294
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5123294 2016-12-06 Assisted transcriptome reconstruction and splicing orthology Blanquart, Samuel Varré, Jean-Stéphane Guertin, Paul Perrin, Amandine Bergeron, Anne Swenson, Krister M. BMC Genomics Research BioMed Central 2016-11-11 /pmc/articles/PMC5123294/ /pubmed/28185551 http://dx.doi.org/10.1186/s12864-016-3103-6 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Assisted transcriptome reconstruction and splicing orthology
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123294/
https://www.ncbi.nlm.nih.gov/pubmed/28185551
http://dx.doi.org/10.1186/s12864-016-3103-6