Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing
Main authors: | Abbas-Aghababazadeh, Farnoosh; Li, Qian; Fridley, Brooke L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6209231/ https://www.ncbi.nlm.nih.gov/pubmed/30379879 http://dx.doi.org/10.1371/journal.pone.0206312 |
author | Abbas-Aghababazadeh, Farnoosh; Li, Qian; Fridley, Brooke L. |
---|---|
collection | PubMed |
description | Normalization of RNA-Seq data has proven essential for accurate inference and for the replication of findings. Accordingly, many normalization methods have been proposed to address the technical artifacts that can be present in high-throughput sequencing transcriptomic studies. In this study, we set out to compare the widely used library size normalization methods (upper quartile (UQ), trimmed mean of M-values (TMM), and relative log expression (RLE)) and across-sample normalization methods (surrogate variable analysis (SVA), remove unwanted variation (RUV), and principal component analysis (PCA)) for RNA-Seq data, using publicly available data from The Cancer Genome Atlas (TCGA) cervical cancer (CESC) study. Additionally, we completed an extensive simulation study to compare how well the across-sample normalization methods estimate technical artifacts. Lastly, we investigated the reduction in degrees of freedom in the normalized data and its impact on downstream differential expression results. The TMM and RLE library size normalization methods give similar results for the CESC dataset. In the simulated datasets, the SVA (“BE”) method outperforms the other methods (SVA “Leek” and PCA) by correctly estimating the number of latent artifacts. Moreover, ignoring the loss of degrees of freedom due to normalization results in inflated type I error rates. We recommend not only adjusting for library size differences but also assessing the data for known and unknown technical artifacts and, if needed, completing across-sample normalization. In addition, we suggest including the known and estimated latent artifacts in the design matrix, which correctly accounts for the loss in degrees of freedom, rather than completing the analysis on the post-processed normalized data. |
format | Online Article Text |
id | pubmed-6209231 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6209231 2018-11-19; PLoS One; Research Article; Public Library of Science 2018-10-31; /pmc/articles/PMC6209231/ /pubmed/30379879 http://dx.doi.org/10.1371/journal.pone.0206312; Text en; © 2018 Abbas-Aghababazadeh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6209231/ https://www.ncbi.nlm.nih.gov/pubmed/30379879 http://dx.doi.org/10.1371/journal.pone.0206312 |
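The description compares library size normalization methods (UQ, TMM, and RLE). As a rough illustration only, and not the authors' code or the edgeR/DESeq2 implementations, the two simplest of these can be sketched in plain Python on a hypothetical genes-by-samples list of raw counts. The function names `rle_size_factors`, `upper_quartile_factors`, and `normalize` are hypothetical. RLE here follows the median-of-ratios construction; UQ is one of several variants in use (here, the 75th percentile of each sample's counts over genes with any nonzero count, rescaled so the factors have geometric mean 1). TMM is left out because its weighted, trimmed averaging of M-values needs considerably more code.

```python
import math
import statistics


def rle_size_factors(counts):
    """Relative log expression (RLE) size factors, median-of-ratios style.

    counts: list of genes, each a list of raw counts with one entry per sample.
    Genes with a zero count in any sample are skipped, since their geometric
    mean is zero and the log-ratios are undefined. Assumes at least one gene
    has nonzero counts in every sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor of sample j: exp(median over genes of log(count / geometric mean)).
    return [math.exp(statistics.median(r)) for r in log_ratios]


def upper_quartile_factors(counts):
    """Upper-quartile (UQ) scaling factors; one simple variant among several.

    Takes the 75th percentile of each sample's counts over genes that are
    nonzero in at least one sample, then rescales so the factors have a
    geometric mean of 1. Assumes at least two such genes and a positive
    upper quartile in every sample.
    """
    kept = [gene for gene in counts if sum(gene) > 0]
    n_samples = len(counts[0])
    uq = []
    for j in range(n_samples):
        column = [gene[j] for gene in kept]
        # quantiles(..., n=4) returns the cut points [Q1, Q2, Q3]; keep Q3.
        uq.append(statistics.quantiles(column, n=4, method="inclusive")[2])
    log_mean = sum(math.log(q) for q in uq) / n_samples
    return [q / math.exp(log_mean) for q in uq]


def normalize(counts, factors):
    """Divide every count by the scaling factor of its sample."""
    return [[c / f for c, f in zip(gene, factors)] for gene in counts]


# Toy matrix (genes x samples): sample 2 has exactly twice the counts of sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 0]]
print(rle_size_factors(counts))        # approximately [0.7071, 1.4142]
print(upper_quartile_factors(counts))  # approximately [0.7071, 1.4142]
# After normalization, each gene's two values agree up to floating-point rounding.
print(normalize(counts, rle_size_factors(counts)))
```

Dividing by these factors only removes differences in sequencing depth; it does not address the known or latent technical artifacts that the across-sample methods (SVA, RUV, PCA) are meant to capture, which is why the description recommends assessing the data for those as well.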