Identification and quantification of small exon-containing isoforms in long-read RNA sequencing data

Small exons are pervasive in transcriptomes across organisms, and their quantification in RNA isoforms is crucial for understanding gene functions. Although long-read RNA-seq based on Oxford Nanopore Technologies (ONT) offers the advantage of covering transcripts in full length, its lower base accur...

Bibliographic Details
Main authors: Liu, Zhen; Zhu, Chenchen; Steinmetz, Lars M; Wei, Wu
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10639058/
https://www.ncbi.nlm.nih.gov/pubmed/37843096
http://dx.doi.org/10.1093/nar/gkad810
author Liu, Zhen
Zhu, Chenchen
Steinmetz, Lars M
Wei, Wu
collection PubMed
description Small exons are pervasive in transcriptomes across organisms, and their quantification in RNA isoforms is crucial for understanding gene functions. Although long-read RNA-seq based on Oxford Nanopore Technologies (ONT) offers the advantage of covering transcripts in full length, its lower base accuracy poses challenges for identifying individual exons, particularly microexons (≤ 30 nucleotides). Here, we systematically assess small-exon quantification in synthetic and human ONT RNA-seq datasets. We demonstrate that reads containing small exons are often not properly aligned, affecting the quantification of relevant transcripts. Thus, we develop a local-realignment method for misaligned exons (MisER), which remaps reads with misaligned exons to the transcript references. Using synthetic and simulated datasets, we demonstrate the high sensitivity and specificity of MisER for the quantification of transcripts containing small exons. Moreover, MisER enabled us to identify small exons with a higher percent spliced-in index (PSI) in neural tissues, particularly neural-regulated microexons, when comparing 14 neural to 16 non-neural tissues in humans. Our work introduces an improved quantification method for long-read RNA-seq and especially facilitates studies using ONT long reads to elucidate the regulation of genes involving small exons.
format Online
Article
Text
id pubmed-10639058
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10639058 2023-11-15 Identification and quantification of small exon-containing isoforms in long-read RNA sequencing data Liu, Zhen Zhu, Chenchen Steinmetz, Lars M Wei, Wu Nucleic Acids Res Methods Oxford University Press 2023-10-16 /pmc/articles/PMC10639058/ /pubmed/37843096 http://dx.doi.org/10.1093/nar/gkad810 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Identification and quantification of small exon-containing isoforms in long-read RNA sequencing data
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10639058/
https://www.ncbi.nlm.nih.gov/pubmed/37843096
http://dx.doi.org/10.1093/nar/gkad810