A Zipf-plot based normalization method for high-throughput RNA-seq data

Normalization is crucial in RNA-seq data analyses. Due to the existence of excessive zeros and a large number of small measures, it is challenging to find reliable linear rescaling normalization parameters. We propose a Zipf plot based normalization method (ZN) assuming that all gene profiles have s...

Bibliographic Details
Main Author: Wang, Bin
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7144957/
https://www.ncbi.nlm.nih.gov/pubmed/32271772
http://dx.doi.org/10.1371/journal.pone.0230594
author Wang, Bin
collection PubMed
description Normalization is crucial in RNA-seq data analysis. Because of excessive zeros and a large number of small measures, it is challenging to find reliable linear rescaling normalization parameters. We propose a Zipf-plot-based normalization method (ZN) that assumes all gene profiles have similar upper-tail behavior in their expression distributions. The new normalization method uses global information from all genes in the same profile without gene-level expression alteration. It does not require the majority of genes to be free of differential expression (DE), and it can be applied to data in which the majority of genes are weakly expressed or not expressed. Two normalization schemes are implemented in ZN: a linear rescaling scheme and a non-linear transformation scheme. The linear rescaling scheme can be applied alone or together with the non-linear scheme. The performance of ZN is benchmarked against five popular linear normalization methods for RNA-seq data. Results show that the linear rescaling scheme by itself works well and is robust. The non-linear scheme can further improve the normalization outcomes and is optional if the Zipf plots show parallel patterns.
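Illustrative note: the description above does not give the ZN algorithm itself, so the following is only a minimal sketch of the general idea, not the author's method. It assumes a Zipf plot means log rank against log expression for the positive values of one profile, and it uses a hypothetical tail summary (the mean log value of the `top_k` largest counts) to derive one linear scale factor per profile. The function names, `top_k`, and the toy counts are all assumptions made for illustration.

```python
import math

def zipf_points(counts):
    """Return (log rank, log value) pairs for the positive counts, largest first."""
    positive = sorted((c for c in counts if c > 0), reverse=True)
    return [(math.log(rank), math.log(value))
            for rank, value in enumerate(positive, start=1)]

def upper_tail_level(counts, top_k):
    """Mean log value of the top_k largest positive counts (hypothetical tail summary)."""
    points = zipf_points(counts)[:top_k]
    if not points:
        raise ValueError("profile has no positive counts")
    return sum(log_value for _, log_value in points) / len(points)

def tail_scale_factors(profiles, top_k=3):
    """One linear scale factor per profile, aligning each upper tail to the first profile."""
    levels = [upper_tail_level(p, top_k) for p in profiles]
    reference = levels[0]
    return [math.exp(reference - level) for level in levels]

# Toy data (assumed): the second profile is the first at doubled depth.
profiles = [
    [0, 0, 1, 2, 50, 100, 200],
    [0, 0, 2, 4, 100, 200, 400],
]
factors = tail_scale_factors(profiles)
# Linear rescaling: every count in a profile is multiplied by that profile's factor.
normalized = [[c * f for c in p] for p, f in zip(profiles, factors)]
```

Zeros are excluded from the Zipf points but are left unchanged by the rescaling, which is consistent with the description's remark that the method applies to profiles dominated by weakly expressed or unexpressed genes. When the Zipf plots of two profiles are parallel, as in this toy case, a single factor per profile is enough to align them.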
format Online
Article
Text
id pubmed-7144957
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7144957 2020-04-10 A Zipf-plot based normalization method for high-throughput RNA-seq data Wang, Bin PLoS One Research Article Public Library of Science 2020-04-09 /pmc/articles/PMC7144957/ /pubmed/32271772 http://dx.doi.org/10.1371/journal.pone.0230594 Text en © 2020 Bin Wang http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A Zipf-plot based normalization method for high-throughput RNA-seq data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7144957/
https://www.ncbi.nlm.nih.gov/pubmed/32271772
http://dx.doi.org/10.1371/journal.pone.0230594