
The Impact of Normalization Methods on RNA-Seq Data Analysis


Bibliographic Details
Main authors: Zyprych-Walczak, J., Szabelska, A., Handschuh, L., Górczak, K., Klamecka, K., Figlerowicz, M., Siatkowski, I.
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4484837/
https://www.ncbi.nlm.nih.gov/pubmed/26176014
http://dx.doi.org/10.1155/2015/621690
collection PubMed
description High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. The massive and complex data sets produced by the sequencers create a need for statistical and computational methods that can handle the analysis and management of the data. Data normalization is one of the most crucial steps of data processing and must be considered carefully, because it has a profound effect on the results of the analysis. In this work, we present a comprehensive comparison of five normalization methods related to sequencing depth that are widely used for transcriptome sequencing (RNA-seq) data, and we assess their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied to select the optimal normalization procedure for any particular data set. The workflow includes calculation of the bias and variance values for the control genes, the sensitivity and specificity of the methods, and classification errors, as well as generation of diagnostic plots. Combining this information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably.
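As a rough illustration of one step in the workflow described above, the sketch below scales raw counts by sequencing depth and then summarizes bias and variance over a set of control genes. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not name the five methods, so simple counts-per-million scaling stands in for them, and the definitions of "bias" (mean absolute log2 difference between two sample groups) and "variance" (mean per-gene variance on the log2 scale) are illustrative choices. The function names `cpm_normalize` and `control_gene_diagnostics`, the pseudocount, and the toy count matrix are all hypothetical.

```python
import math
from statistics import mean, pvariance


def cpm_normalize(counts):
    """Scale each sample (column) to counts per million of its library size.

    Stand-in for a depth-related normalization; not one of the paper's methods.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
        for row in counts
    ]


def control_gene_diagnostics(norm, control_rows, group_a, group_b, pseudo=1.0):
    """Return (bias, variance) over control genes on the log2 scale.

    Assumed definitions (illustrative only):
      bias     = mean absolute difference of group means per control gene
      variance = mean per-gene population variance across all samples
    Control genes are assumed not to change between the two groups, so both
    values should be close to zero after a suitable normalization.
    """
    abs_log_diffs, variances = [], []
    for i in control_rows:
        logs = [math.log2(x + pseudo) for x in norm[i]]
        mean_a = mean(logs[j] for j in group_a)
        mean_b = mean(logs[j] for j in group_b)
        abs_log_diffs.append(abs(mean_a - mean_b))
        variances.append(pvariance(logs))
    return mean(abs_log_diffs), mean(variances)


# Toy data (hypothetical): 4 genes x 4 samples, with samples 2 and 3
# sequenced twice as deep as samples 0 and 1.
counts = [
    [100, 100, 200, 200],
    [50, 50, 100, 100],
    [10, 10, 20, 20],
    [840, 840, 1680, 1680],
]
norm = cpm_normalize(counts)

# Rows 0 and 1 play the role of control genes; columns 0-1 and 2-3 are the groups.
bias, variance = control_gene_diagnostics(norm, [0, 1], [0, 1], [2, 3])
raw_bias, raw_variance = control_gene_diagnostics(counts, [0, 1], [0, 1], [2, 3])
print(f"normalized: bias={bias:.4f}, variance={variance:.4f}")
print(f"raw counts: bias={raw_bias:.4f}, variance={raw_variance:.4f}")
```

Running the same diagnostics on the raw and the normalized matrix gives a simple way to compare candidate methods: the method that brings the control-gene bias and variance closest to zero is, by this criterion alone, the better fit for that data set.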
id pubmed-4484837
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4484837 2015-07-14 The Impact of Normalization Methods on RNA-Seq Data Analysis. Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I. Biomed Res Int. Research Article. Hindawi Publishing Corporation 2015. 2015-06-15. /pmc/articles/PMC4484837/ /pubmed/26176014 http://dx.doi.org/10.1155/2015/621690 Text en. Copyright © 2015 J. Zyprych-Walczak et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article