Transcription start site signal profiling improves transposable element RNA expression analysis at locus-level
The transcriptional activity of Transposable Elements (TEs) has been implicated in numerous pathological processes, including neurodegenerative diseases such as amyotrophic lateral sclerosis and frontotemporal lobar degeneration. TE expression analysis from short-read sequencing technologies is, h...
Main Authors: | Savytska, Natalia; Heutink, Peter; Bansal, Vikas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9633680/ https://www.ncbi.nlm.nih.gov/pubmed/36338986 http://dx.doi.org/10.3389/fgene.2022.1026847 |
_version_ | 1784824290595569664 |
---|---|
author | Savytska, Natalia; Heutink, Peter; Bansal, Vikas |
collection | PubMed |
description | The transcriptional activity of Transposable Elements (TEs) has been implicated in numerous pathological processes, including neurodegenerative diseases such as amyotrophic lateral sclerosis and frontotemporal lobar degeneration. TE expression analysis from short-read sequencing technologies is, however, challenging due to the multitude of similar sequences derived from single TE subfamilies and the exaptation of TEs within longer coding or non-coding RNAs. Specialised tools have been developed to quantify the expression of TEs; they either rely on probabilistic redistribution of multimapper count fractions or allow multimappers to be discarded altogether. Until now, benchmarking across those tools was largely limited to aggregated expression estimates over whole TE subfamilies. Here, we compared the performance of recently published tools (SQuIRE, TElocal, SalmonTE) with simplistic quantification strategies (featureCounts in unique, fraction and random modes) at the level of individual loci. Using simulated datasets, we examined the false discovery rate and the primary driver of false positive hits in the optimal quantification strategy. Our findings suggest that the number of false discoveries is high and exceeds the total number of correctly recovered active loci for all quantification strategies, including the best-performing tool, TElocal. As a remedy, filtering based on a minimum number of read counts or baseMean expression improves the F1 score and decreases the number of false positives. Finally, we demonstrate that additional profiling of Transcription Start Site mapping statistics (using a k-means clustering approach) significantly improves the performance of TElocal while reporting a reliable set of detected and differentially expressed TEs in simulated human RNA-seq data. |
format | Online Article Text |
id | pubmed-9633680 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9633680 2022-11-05 Transcription start site signal profiling improves transposable element RNA expression analysis at locus-level Savytska, Natalia; Heutink, Peter; Bansal, Vikas Front Genet Genetics Frontiers Media S.A. 2022-10-21 /pmc/articles/PMC9633680/ /pubmed/36338986 http://dx.doi.org/10.3389/fgene.2022.1026847 Text en Copyright © 2022 Savytska, Heutink and Bansal. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Transcription start site signal profiling improves transposable element RNA expression analysis at locus-level |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9633680/ https://www.ncbi.nlm.nih.gov/pubmed/36338986 http://dx.doi.org/10.3389/fgene.2022.1026847 |
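
The description above reports that filtering loci by a minimum number of read counts improves the F1 score and reduces false positives. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `detection_metrics`, the locus names, the counts and the threshold of 5 are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def detection_metrics(
    counts: Dict[str, int],
    truly_active: Set[str],
    min_count: int = 1,
) -> Dict[str, float]:
    """Score locus-level detection after a minimum read-count filter.

    A locus is called "detected" when its count is >= min_count.
    """
    detected = {locus for locus, n in counts.items() if n >= min_count}
    tp = len(detected & truly_active)   # active loci that were detected
    fp = len(detected - truly_active)   # detected loci that are not active
    fn = len(truly_active - detected)   # active loci that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1, "fdr": fdr}


# Hypothetical toy data: estimated read counts per TE locus and the set of
# loci that were truly expressed in an imagined simulation.
toy_counts = {
    "L1_a": 120, "L1_b": 6,
    "Alu_a": 45, "Alu_b": 2, "Alu_c": 1,
    "ERV_a": 8, "ERV_b": 4,
}
toy_truth = {"L1_a", "Alu_a", "ERV_a"}

unfiltered = detection_metrics(toy_counts, toy_truth, min_count=1)
filtered = detection_metrics(toy_counts, toy_truth, min_count=5)

print("unfiltered:", unfiltered)
print("filtered:  ", filtered)
```

In this toy case the unfiltered call set contains more false positives than true positives, and raising the threshold removes most of them. On real data a stricter threshold would also be expected to discard some weakly expressed true loci, so the cut-off is a trade-off rather than a fixed value.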