
ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data



Bibliographic Details
Main authors: Gao, Yuan; Wang, Feng; Wang, Robert; Kutschera, Eric; Xu, Yang; Xie, Stephan; Wang, Yuanyuan; Kadash-Edmondson, Kathryn E.; Lin, Lan; Xing, Yi
Format: Online Article Text
Language: English
Published: American Association for the Advancement of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858503/
https://www.ncbi.nlm.nih.gov/pubmed/36662851
http://dx.doi.org/10.1126/sciadv.abq5072
Description
Summary: Long-read RNA sequencing (RNA-seq) holds great potential for characterizing transcriptome variation and full-length transcript isoforms, but the relatively high error rate of current long-read sequencing platforms poses a major challenge. We present ESPRESSO, a computational tool for robust discovery and quantification of transcript isoforms from error-prone long reads. ESPRESSO jointly considers alignments of all long reads aligned to a gene and uses error profiles of individual reads to improve the identification of splice junctions and the discovery of their corresponding transcript isoforms. On both a synthetic spike-in RNA sample and human RNA samples, ESPRESSO outperforms multiple contemporary tools in not only transcript isoform discovery but also transcript isoform quantification. In total, we generated and analyzed ~1.1 billion nanopore RNA-seq reads covering 30 human tissue samples and three human cell lines. ESPRESSO and its companion dataset provide a useful resource for studying the RNA repertoire of eukaryotic transcriptomes.