Problems with Using the Normal Distribution – and Ways to Improve Quality and Efficiency of Data Analysis

Bibliographic Details
Main Authors: Limpert, Eckhard, Stahel, Werner A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3136454/
https://www.ncbi.nlm.nih.gov/pubmed/21779325
http://dx.doi.org/10.1371/journal.pone.0021403
author Limpert, Eckhard
Stahel, Werner A.
collection PubMed
description BACKGROUND: The Gaussian or normal distribution is the most established model to characterize quantitative variation of original data. Accordingly, data are summarized using the arithmetic mean and the standard deviation, by x̄ ± SD, or with the standard error of the mean, x̄ ± SEM. This, together with corresponding bars in graphical displays, has become the standard to characterize variation.
METHODOLOGY/PRINCIPAL FINDINGS: Here we question the adequacy of this characterization, and of the model. The published literature provides numerous examples for which such descriptions appear inappropriate because, based on the “95% range check”, their distributions are obviously skewed. In these cases, the symmetric characterization is a poor description and may trigger wrong conclusions. To solve the problem, it is enlightening to regard causes of variation. Multiplicative causes are, in general, far more important than additive ones and benefit from a multiplicative (or log-) normal approach. Fortunately, much like the normal, the log-normal distribution can now be handled easily and characterized at the level of the original data with the help of a new sign, ×/ (times-divide), and a corresponding notation. Analogous to x̄ ± SD, it connects the multiplicative (or geometric) mean x̄* and the multiplicative standard deviation s* in the form x̄* ×/ s*, which is advantageous and recommended.
CONCLUSIONS/SIGNIFICANCE: The corresponding shift from the symmetric to the asymmetric view will substantially increase both recognition of data distributions and interpretation quality. It will allow for savings in sample size that can be considerable. Moreover, this is in line with ethical responsibility. Adequate models will improve concepts and theories, and provide deeper insight into science and life.
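The summary that the abstract recommends can be illustrated with a minimal Python sketch. It assumes the usual definitions: the multiplicative (geometric) mean is the exponential of the mean of the log-transformed data, and the multiplicative standard deviation s* is the exponential of the standard deviation of the logs. The abstract does not spell out the "95% range check"; the version here (flagging positive-only data whose arithmetic mean minus 2 SD falls below zero) is our reading and should be treated as an assumption. The function names and the sample values are hypothetical and not taken from the article.

```python
import math
import statistics


def multiplicative_summary(values):
    """Return (geometric mean, multiplicative SD s*) for strictly positive data.

    Both are computed on the log scale and transformed back, which is the
    standard way to summarize data assumed to be log-normal.
    """
    if any(v <= 0 for v in values):
        raise ValueError("a log-normal summary needs strictly positive values")
    logs = [math.log(v) for v in values]
    geo_mean = math.exp(statistics.fmean(logs))
    mult_sd = math.exp(statistics.stdev(logs))  # sample SD of the logs
    return geo_mean, mult_sd


def range_check_95(values):
    """Assumed form of the '95% range check' for positive-only quantities.

    Returns True when mean - 2*SD is negative, i.e. when the symmetric
    normal range would extend into impossible values.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - 2 * sd < 0


# Made-up illustrative data, not from the article.
data = [1.0, 2.0, 4.0, 8.0, 16.0]

geo_mean, mult_sd = multiplicative_summary(data)
low, high = geo_mean / mult_sd, geo_mean * mult_sd  # the x̄* ×/ s* interval

print("symmetric description fails the range check:", range_check_95(data))
print(f"geometric mean: {geo_mean:.3f}, s*: {mult_sd:.3f}")
print(f"x̄* ×/ s* interval: {low:.3f} to {high:.3f}")
```

Under a log-normal model, the interval from x̄* / s* to x̄* · s* plays the role that x̄ ± SD plays for the normal distribution (roughly 68% of the data), and dividing and multiplying by (s*)² gives the counterpart of the roughly 95% range.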
format Online
Article
Text
id pubmed-3136454
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3136454 2011-07-21
PLoS One
Research Article
Public Library of Science 2011-07-14
Text en
Limpert, Stahel.
http://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Problems with Using the Normal Distribution – and Ways to Improve Quality and Efficiency of Data Analysis
topic Research Article