NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods

Bibliographic Details
Main authors: Wu, Zhenfeng; Liu, Weixiang; Jin, Xiufeng; Ji, Haishuo; Wang, Hua; Glusman, Gustavo; Robinson, Max; Liu, Lin; Ruan, Jishou; Gao, Shan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6503164/
https://www.ncbi.nlm.nih.gov/pubmed/31114611
http://dx.doi.org/10.3389/fgene.2019.00400
author Wu, Zhenfeng
Liu, Weixiang
Jin, Xiufeng
Ji, Haishuo
Wang, Hua
Glusman, Gustavo
Robinson, Max
Liu, Lin
Ruan, Jishou
Gao, Shan
collection PubMed
description Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst cases, a method evaluated as the best by one metric is evaluated as the poorest by another, or a method evaluated as the best on one dataset is evaluated as the poorest on another. This raises an open question: what principles should guide the evaluation of normalization methods? In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq or microarray data (the consistency of datasets). We then designed a new metric, the Area Under normalized CV threshold Curve (AUCVC), and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods on both scRNA-seq and bulk RNA-seq data, satisfying both the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study are included in an R package named NormExpression. NormExpression provides a framework, and a fast and simple way, for researchers to select the best method for normalizing their gene expression data by evaluating different methods (particularly data-driven methods or their own) according to the principles of the consistency of metrics and the consistency of datasets.
format Online
Article
Text
id pubmed-6503164
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6503164 2019-05-21 Front Genet Genetics Frontiers Media S.A. 2019-04-30 /pmc/articles/PMC6503164/ /pubmed/31114611 http://dx.doi.org/10.3389/fgene.2019.00400 Text en Copyright © 2019 Wu, Liu, Jin, Ji, Wang, Glusman, Robinson, Liu, Ruan and Gao. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods
topic Genetics