Challenges in identifying mRNA transcript starts and ends from long-read sequencing data

Bibliographic Details
Main authors:	Calvo-Roitberg, Ezequiel; Daniels, Rachel F.; Pai, Athma A.
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10402045/ https://www.ncbi.nlm.nih.gov/pubmed/37546743 http://dx.doi.org/10.1101/2023.07.26.550536

collection	PubMed
description	Long-read sequencing (LRS) technologies have the potential to revolutionize scientific discoveries in RNA biology, especially by enabling the comprehensive identification and quantification of full-length mRNA isoforms. However, inherently high error rates make the analysis of long-read sequencing data challenging. While these error rates have been characterized for sequence and splice site identification, it is still unclear how accurately LRS reads represent transcript start and end sites. Here, we systematically assess the variability and accuracy of mRNA terminal ends identified by LRS reads across multiple sequencing platforms. We find substantial inconsistencies in both the start and end coordinates of LRS reads spanning a gene, such that LRS reads often fail to accurately recapitulate annotated or empirically derived terminal ends of mRNA molecules. To address this challenge, we introduce an approach to condition reads based on empirically derived terminal ends and identify a subset of reads that are more likely to represent full-length transcripts. Our approach can improve transcriptome analyses by enhancing the fidelity of transcript terminal end identification, but may result in lower power to quantify genes or discover novel isoforms. Thus, it is necessary to be cautious when selecting sequencing approaches and/or interpreting data from long-read RNA sequencing.
id	pubmed-10402045
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-10402045 2023-08-05 bioRxiv Article
Cold Spring Harbor Laboratory 2023-07-27 /pmc/articles/PMC10402045/ /pubmed/37546743 http://dx.doi.org/10.1101/2023.07.26.550536 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
