Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data

Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatic...

Bibliographic Details
Main Authors: Bushel, Pierre R., Ferguson, Stephen S., Ramaiahgari, Sreenivasa C., Paules, Richard S., Auerbach, Scott S.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7325690/
https://www.ncbi.nlm.nih.gov/pubmed/32655620
http://dx.doi.org/10.3389/fgene.2020.00594
_version_ 1783552193966637056
author Bushel, Pierre R.
Ferguson, Stephen S.
Ramaiahgari, Sreenivasa C.
Paules, Richard S.
Auerbach, Scott S.
author_sort Bushel, Pierre R.
collection PubMed
description Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven fold change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed best with regard to maintaining FC levels as detected by a limma contrast between the treated and untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90, except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed most closely with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite assuming that the majority of genes are unchanged, the DESeq2 scaling factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TC). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
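The scaling-based normalizations named in the abstract (CPM, UQ, and DESeq2-style scaling factors) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the genes-by-samples matrix layout, the toy counts, and the choice to rescale UQ by the mean upper quartile are assumptions made for illustration only.

```python
import numpy as np

def cpm(counts):
    """Counts per million: divide each sample (column) by its total count."""
    return counts / counts.sum(axis=0) * 1e6

def upper_quartile(counts):
    """Upper-quartile scaling (assumed variant): divide each sample by the
    75th percentile of its counts, computed over genes that are nonzero in
    at least one sample, then rescale by the mean upper quartile."""
    expressed = counts[counts.sum(axis=1) > 0]
    uq = np.percentile(expressed, 75, axis=0)
    return counts / uq * uq.mean()

def median_of_ratios_size_factors(counts):
    """DESeq2-style size factors: per-sample median of the ratio of each
    gene's count to its geometric mean across samples, using only genes
    with nonzero counts in every sample."""
    positive = (counts > 0).all(axis=1)
    log_counts = np.log(counts[positive])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical toy matrix: rows are genes, columns are two samples,
# with the second sample sequenced at twice the depth of the first.
counts = np.array([[10, 20],
                   [100, 200],
                   [50, 100],
                   [0, 0],
                   [5, 10]], dtype=float)

print(cpm(counts))
print(upper_quartile(counts))
print(counts / median_of_ratios_size_factors(counts))
```

In this toy case the two samples differ only in sequencing depth, so each of the three methods brings the two columns onto the same scale. Real TempO-Seq data would additionally involve the probe-level preprocessing the abstract alludes to, which is not sketched here.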
format Online
Article
Text
id pubmed-7325690
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7325690 2020-07-09 Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data Bushel, Pierre R. Ferguson, Stephen S. Ramaiahgari, Sreenivasa C. Paules, Richard S. Auerbach, Scott S. Front Genet Genetics Frontiers Media S.A. 2020-06-23 /pmc/articles/PMC7325690/ /pubmed/32655620 http://dx.doi.org/10.3389/fgene.2020.00594 Text en Copyright © 2020 Bushel, Ferguson, Ramaiahgari, Paules and Auerbach. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data
topic Genetics