Transcription start site signal profiling improves transposable element RNA expression analysis at locus-level

The transcriptional activity of Transposable Elements (TEs) has been involved in numerous pathological processes, including neurodegenerative diseases such as amyotrophic lateral sclerosis and frontotemporal lobar degeneration. The TE expression analysis from short-read sequencing technologies is, h...

Bibliographic Details
Main authors: Savytska, Natalia; Heutink, Peter; Bansal, Vikas
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9633680/
https://www.ncbi.nlm.nih.gov/pubmed/36338986
http://dx.doi.org/10.3389/fgene.2022.1026847
author Savytska, Natalia
Heutink, Peter
Bansal, Vikas
collection PubMed
description The transcriptional activity of transposable elements (TEs) has been implicated in numerous pathological processes, including neurodegenerative diseases such as amyotrophic lateral sclerosis and frontotemporal lobar degeneration. TE expression analysis from short-read sequencing technologies is, however, challenging because of the multitude of similar sequences derived from single TE subfamilies and the exaptation of TEs within longer coding or non-coding RNAs. Specialised tools have been developed to quantify TE expression; they either rely on probabilistic redistribution of multimapper count fractions or allow multimappers to be discarded altogether. Until now, benchmarking of those tools has been largely limited to aggregated expression estimates over whole TE subfamilies. Here, we compared the performance of recently published tools (SQuIRE, TElocal, SalmonTE) with simplistic quantification strategies (featureCounts in unique, fraction and random modes) at the level of individual loci. Using simulated datasets, we examined the false discovery rate and the primary driver of false-positive hits in the optimal quantification strategy. Our findings suggest that the number of false discoveries is high and exceeds the total number of correctly recovered active loci for all quantification strategies, including the best-performing tool, TElocal. As a remedy, filtering based on a minimum number of read counts or baseMean expression improves the F1 score and decreases the number of false positives. Finally, we demonstrate that additional profiling of Transcription Start Site mapping statistics (using a k-means clustering approach) significantly improves the performance of TElocal while reporting a reliable set of detected and differentially expressed TEs in human simulated RNA-seq data.
format Online
Article
Text
id pubmed-9633680
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9633680 2022-11-05 Transcription start site signal profiling improves transposable element RNA expression analysis at locus-level. Savytska, Natalia; Heutink, Peter; Bansal, Vikas. Front Genet. Genetics. Frontiers Media S.A. 2022-10-21 /pmc/articles/PMC9633680/ /pubmed/36338986 http://dx.doi.org/10.3389/fgene.2022.1026847 Text en Copyright © 2022 Savytska, Heutink and Bansal. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Transcription start site signal profiling improves transposable element RNA expression analysis at locus-level
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9633680/
https://www.ncbi.nlm.nih.gov/pubmed/36338986
http://dx.doi.org/10.3389/fgene.2022.1026847