A Zipf-plot based normalization method for high-throughput RNA-seq data
Normalization is crucial in RNA-seq data analyses. Due to the existence of excessive zeros and a large number of small measures, it is challenging to find reliable linear rescaling normalization parameters. We propose a Zipf plot based normalization method (ZN) assuming that all gene profiles have s...
Main author: | Wang, Bin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7144957/ https://www.ncbi.nlm.nih.gov/pubmed/32271772 http://dx.doi.org/10.1371/journal.pone.0230594 |
_version_ | 1783519913408724992 |
---|---|
author | Wang, Bin |
collection | PubMed |
description | Normalization is crucial in RNA-seq data analyses. Due to the existence of excessive zeros and a large number of small measures, it is challenging to find reliable linear rescaling normalization parameters. We propose a Zipf plot based normalization method (ZN) assuming that all gene profiles have similar upper tail behaviors in their expression distributions. The new normalization method uses global information of all genes in the same profile without gene-level expression alteration. It doesn’t require the majority of genes to be not differentially expressed (DE), and can be applied to data where the majority of genes are weakly or not expressed. Two normalization schemes are implemented with ZN: a linear rescaling scheme and a non-linear transformation scheme. The linear rescaling scheme can be applied alone or together with the non-linear normalization scheme. The performance of ZN is benchmarked against five popular linear normalization methods for RNA-seq data. Results show that the linear rescaling normalization scheme by itself works well and is robust. The non-linear normalization scheme can further improve the normalization outcomes and is optional if the Zipf plots show parallel patterns. |
format | Online Article Text |
id | pubmed-7144957 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-7144957 2020-04-10 A Zipf-plot based normalization method for high-throughput RNA-seq data Wang, Bin PLoS One Research Article Public Library of Science 2020-04-09 /pmc/articles/PMC7144957/ /pubmed/32271772 http://dx.doi.org/10.1371/journal.pone.0230594 Text en © 2020 Bin Wang http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Wang, Bin A Zipf-plot based normalization method for high-throughput RNA-seq data |
title | A Zipf-plot based normalization method for high-throughput RNA-seq data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7144957/ https://www.ncbi.nlm.nih.gov/pubmed/32271772 http://dx.doi.org/10.1371/journal.pone.0230594 |
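
The description in this record says that ZN assumes all profiles share similar upper-tail behavior and that it includes a linear rescaling scheme. As a rough illustration only, the Python sketch below computes Zipf-plot coordinates (log rank against log expression) for one profile and derives per-sample scale factors that bring the upper tails to a common level. The function names, the `top_fraction` parameter, and the tail-averaging rule are assumptions made for this example; this is a simple tail-alignment heuristic, not the ZN procedure from the article.

```python
import numpy as np


def zipf_points(profile):
    """Return (log rank, log value) for the positive entries of one profile.

    Values are sorted in decreasing order, so rank 1 is the most highly
    expressed gene. Zeros are dropped because log(0) is undefined.
    """
    values = np.sort(np.asarray(profile, dtype=float))[::-1]
    values = values[values > 0]
    ranks = np.arange(1, values.size + 1)
    return np.log(ranks), np.log(values)


def tail_scale_factors(counts, top_fraction=0.05):
    """Hypothetical linear rescaling based on the upper tail only.

    counts: array of shape (genes, samples).
    top_fraction: assumed share of top-ranked genes that defines the tail.

    Each sample's tail level is the mean log expression of its top-ranked
    genes. Multiplying a sample by its factor shifts its Zipf plot vertically
    so that every tail level matches the average tail level.
    """
    counts = np.asarray(counts, dtype=float)
    k = max(1, int(counts.shape[0] * top_fraction))
    # Sort each column in decreasing order and keep the k largest values.
    top = -np.sort(-counts, axis=0)[:k, :]
    if np.any(top <= 0):
        raise ValueError("the upper tail must contain only positive values")
    tail_level = np.log(top).mean(axis=0)
    return np.exp(tail_level.mean() - tail_level)


# Toy data: a Zipf-like profile padded with zeros, and a copy sequenced
# three times deeper.
base = np.concatenate([1000.0 / np.arange(1, 201), np.zeros(100)])
counts = np.column_stack([base, 3.0 * base])
factors = tail_scale_factors(counts)
normalized = counts * factors
```

In this toy case the second column is an exact multiple of the first, so the rescaled columns coincide. Because only the top-ranked values enter the factor, the zeros and small measures that dominate the rest of each profile do not affect it.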