
Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing

Normalization of RNA-Seq data has proven essential to ensure accurate inferences and replication of findings. Hence, various normalization methods have been proposed for various technical artifacts that can be present in high-throughput sequencing transcriptomic studies. In this study, we set out to...


Bibliographic Details
Main Authors: Abbas-Aghababazadeh, Farnoosh; Li, Qian; Fridley, Brooke L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6209231/
https://www.ncbi.nlm.nih.gov/pubmed/30379879
http://dx.doi.org/10.1371/journal.pone.0206312
author Abbas-Aghababazadeh, Farnoosh
Li, Qian
Fridley, Brooke L.
collection PubMed
description Normalization of RNA-Seq data has proven essential to ensure accurate inferences and replication of findings. Hence, various normalization methods have been proposed to address the technical artifacts that can be present in high-throughput sequencing transcriptomic studies. In this study, we set out to compare the widely used library size normalization methods (UQ, TMM, and RLE) and across-sample normalization methods (SVA, RUV, and PCA) for RNA-Seq data using publicly available data from The Cancer Genome Atlas (TCGA) cervical cancer study. Additionally, an extensive simulation study was completed to compare the performance of the across-sample normalization methods in estimating technical artifacts. Lastly, we investigated the reduction in degrees of freedom in the normalized data and its impact on downstream differential expression analysis results. Based on this study, the TMM and RLE library size normalization methods give similar results for the CESC dataset. In addition, the simulated dataset results show that the SVA (“BE”) method outperforms the other methods (SVA “Leek”, PCA) by correctly estimating the number of latent artifacts. Moreover, ignoring the loss of degrees of freedom due to normalization results in inflated type I error rates. We recommend not only adjusting for library size differences but also assessing known and unknown technical artifacts in the data and, if needed, completing across-sample normalization. In addition, we suggest including the known and estimated latent artifacts in the design matrix to correctly account for the loss in degrees of freedom, as opposed to completing the analysis on the post-processed normalized data.
format Online
Article
Text
id pubmed-6209231
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6209231 2018-11-19 Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing Abbas-Aghababazadeh, Farnoosh Li, Qian Fridley, Brooke L. PLoS One Research Article
Public Library of Science 2018-10-31 /pmc/articles/PMC6209231/ /pubmed/30379879 http://dx.doi.org/10.1371/journal.pone.0206312 Text en © 2018 Abbas-Aghababazadeh et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing
topic Research Article
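
The description in this record recommends placing known and estimated latent artifacts in the design matrix rather than analyzing post-processed normalized data, so that the loss in degrees of freedom is accounted for. The following is a minimal sketch of that bookkeeping only, not the authors' analysis: the simulated response `y`, the matrix `latent` standing in for estimated artifacts, and the helper names `residual_df` and `fit_ols` are all hypothetical, and a plain least-squares fit on one gene is assumed in place of a full RNA-Seq differential expression model.

```python
import numpy as np


def residual_df(design: np.ndarray) -> int:
    """Residual degrees of freedom: number of samples minus design rank."""
    return design.shape[0] - int(np.linalg.matrix_rank(design))


def fit_ols(design: np.ndarray, y: np.ndarray):
    """Least-squares fit; returns coefficients, residual variance, residual df."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    df = residual_df(design)
    sigma2 = float(resid @ resid) / df
    return beta, sigma2, df


rng = np.random.default_rng(0)
n, k = 20, 3                               # samples, latent artifacts (assumed)
intercept = np.ones(n)
group = np.repeat([0.0, 1.0], n // 2)      # two-group comparison
latent = rng.normal(size=(n, k))           # stand-in for estimated artifacts
# Simulated expression for one gene with no true group effect.
y = 5.0 + latent @ np.array([1.0, -0.5, 0.8]) + rng.normal(size=n)

# Option A: regress the artifacts out first, then analyze the "cleaned"
# values as if nothing had been estimated from them.
nuisance = np.column_stack([intercept, latent])
gamma, *_ = np.linalg.lstsq(nuisance, y, rcond=None)
y_clean = y - latent @ gamma[1:]
design_naive = np.column_stack([intercept, group])
_, sigma2_naive, df_naive = fit_ols(design_naive, y_clean)

# Option B: keep the artifacts in the design matrix, so the k estimated
# coefficients are charged against the residual degrees of freedom.
design_full = np.column_stack([intercept, group, latent])
_, sigma2_full, df_full = fit_ols(design_full, y)

print("residual df, cleaned-data analysis:", df_naive)
print("residual df, full design matrix:  ", df_full)
```

In this sketch the analysis of cleaned data reports k more residual degrees of freedom than the full-design fit, because the coefficients used to remove the artifacts are never counted. That uncounted estimation is the mechanism the description associates with inflated type I error rates.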