Freddie: annotation-independent detection and discovery of transcriptomic alternative splicing isoforms using long-read sequencing

Alternative splicing (AS) is an important mechanism in the development of many cancers, as novel or aberrant AS patterns play an important role as an independent onco-driver. In addition, cancer-specific AS is potentially an effective target of personalized cancer therapeutics. However, detecting AS...

Bibliographic Details
Main Authors: Orabi, Baraa; Xie, Ning; McConeghy, Brian; Dong, Xuesen; Chauve, Cedric; Hach, Faraz
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881145/
https://www.ncbi.nlm.nih.gov/pubmed/36478271
http://dx.doi.org/10.1093/nar/gkac1112
author Orabi, Baraa
Xie, Ning
McConeghy, Brian
Dong, Xuesen
Chauve, Cedric
Hach, Faraz
collection PubMed
description Alternative splicing (AS) is an important mechanism in the development of many cancers, as novel or aberrant AS patterns play an important role as an independent onco-driver. In addition, cancer-specific AS is potentially an effective target of personalized cancer therapeutics. However, detecting AS events remains a challenging task, especially if these AS events are novel. This is exacerbated by the fact that existing transcriptome annotation databases are far from comprehensive, especially with regard to cancer-specific AS. Additionally, traditional sequencing technologies are severely limited by the short length of the generated reads, which rarely span more than a single splice junction site. Given these challenges, transcriptomic long-read (LR) sequencing offers promising potential for the detection and discovery of AS. We present Freddie, a computational annotation-independent isoform discovery and detection tool. Freddie takes as input transcriptomic LR sequencing of a sample alongside its genomic split alignment and computes a set of isoforms for the given sample. It first partitions the input reads into sets that can be processed independently and in parallel. For each partition, Freddie segments the genomic alignment of the reads into canonical exon segments. The goal of this segmentation is to be able to represent any potential isoform as a subset of these canonical exons. This segmentation is formulated as an optimization problem and is solved with a dynamic programming algorithm. Then, Freddie reconstructs the isoforms by jointly clustering and error-correcting the reads using the canonical segmentation as a succinct representation. The clustering and error-correcting step is formulated as an optimization problem, the Minimum Error Clustering into Isoforms (MErCi) problem, and is solved using integer linear programming (ILP). 
We compare the performance of Freddie on simulated datasets against other isoform detection tools that vary in their dependence on annotation databases. We show that Freddie outperforms the other tools in accuracy, including those given the complete ground truth annotation. We also run Freddie on a transcriptomic LR dataset generated in-house from a prostate cancer cell line with a matched short-read RNA-seq dataset. Freddie yields isoforms with a higher short-read cross-validation rate than the other tested tools. Freddie is open source and available at https://github.com/vpc-ccg/freddie/.
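The pipeline outlined in the description (partition the reads, segment their alignments, represent each read over the segments, then cluster) can be illustrated with a toy sketch. This is not Freddie's implementation: the function names and the example reads are hypothetical, the segmentation simply uses every distinct interval endpoint as a breakpoint instead of the dynamic programming optimization, and reads are grouped by identical segment vectors instead of solving the MErCi problem with ILP, so no error correction takes place.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]          # half-open [start, end) on one contig
Reads = Dict[str, List[Interval]]   # read id -> aligned intervals (hypothetical input shape)


def partition_reads(reads: Reads) -> List[List[str]]:
    """Group reads whose overall alignment spans overlap, directly or transitively."""
    spans = sorted(
        (min(s for s, _ in ivs), max(e for _, e in ivs), rid)
        for rid, ivs in reads.items()
    )
    partitions: List[List[str]] = []
    current_end = -1
    for start, end, rid in spans:
        if not partitions or start >= current_end:
            partitions.append([])            # gap found: open a new partition
            current_end = end
        else:
            current_end = max(current_end, end)
        partitions[-1].append(rid)
    return partitions


def segment_boundaries(reads: Reads, rids: List[str]) -> List[int]:
    """Naive segmentation (assumption): every distinct interval endpoint is a breakpoint."""
    return sorted({p for rid in rids for iv in reads[rid] for p in iv})


def to_binary(intervals: List[Interval], bounds: List[int]) -> List[int]:
    """Entry i is 1 if the read covers segment [bounds[i], bounds[i+1]), else 0."""
    return [
        int(any(s <= lo and hi <= e for s, e in intervals))
        for lo, hi in zip(bounds, bounds[1:])
    ]


def group_identical(
    reads: Reads, rids: List[str], bounds: List[int]
) -> Dict[Tuple[int, ...], List[str]]:
    """Stand-in for clustering: reads with identical segment vectors share a group."""
    clusters: Dict[Tuple[int, ...], List[str]] = {}
    for rid in rids:
        clusters.setdefault(tuple(to_binary(reads[rid], bounds)), []).append(rid)
    return clusters


if __name__ == "__main__":
    # Made-up reads: r2 skips the middle exon; r4 aligns to a distant locus.
    toy_reads: Reads = {
        "r1": [(100, 200), (300, 400), (500, 600)],
        "r2": [(100, 200), (500, 600)],
        "r3": [(100, 200), (300, 400), (500, 600)],
        "r4": [(5000, 5100), (5200, 5300)],
    }
    for part in partition_reads(toy_reads):
        bounds = segment_boundaries(toy_reads, part)
        print(part, group_identical(toy_reads, part, bounds))
```

In this toy input, r1 and r3 share one segment vector while r2 forms its own group, which corresponds to two candidate isoforms at the first locus. Because each partition is handled without reference to the others, the loop body could be distributed across worker processes.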
format Online
Article
Text
id pubmed-9881145
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9881145 2023-01-31 Nucleic Acids Res Methods Online Oxford University Press 2022-12-08 /pmc/articles/PMC9881145/ /pubmed/36478271 http://dx.doi.org/10.1093/nar/gkac1112 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Freddie: annotation-independent detection and discovery of transcriptomic alternative splicing isoforms using long-read sequencing
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881145/
https://www.ncbi.nlm.nih.gov/pubmed/36478271
http://dx.doi.org/10.1093/nar/gkac1112