Violating the normality assumption may be the lesser of two evils

When data are not normally distributed, researchers are often uncertain whether it is legitimate to use tests that assume Gaussian errors, or whether one has to either model a more specific error structure or use randomization techniques. Here we use Monte Carlo simulations to explore the pros and c...
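
As a rough illustration of the kind of Monte Carlo check the summary describes, the sketch below estimates the type I error rate of a Gaussian test applied to skewed data. Everything in it is an assumption made for illustration and is not taken from the article: the choice of a pooled two-sample t-test, exponentially distributed data, 20 observations per group, 2000 simulations, and the hard-coded two-sided 5% critical value of about 2.024 for 38 degrees of freedom. The function names `pooled_t` and `type_i_error_rate` are hypothetical.

```python
import random
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def type_i_error_rate(n_sims=2000, n=20, t_crit=2.024, seed=1):
    """Share of null simulations in which |t| exceeds the critical value.

    Both groups are drawn from the same exponential distribution, so the
    null hypothesis is true and every rejection is a false positive.
    t_crit is the assumed two-sided 5% critical value for 2n - 2 = 38
    degrees of freedom; it must be changed if n is changed.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        a = [rng.expovariate(1.0) for _ in range(n)]
        b = [rng.expovariate(1.0) for _ in range(n)]
        if abs(pooled_t(a, b)) > t_crit:
            false_positives += 1
    return false_positives / n_sims


rate = type_i_error_rate()
print(f"Estimated type I error rate at nominal alpha = 0.05: {rate:.3f}")
```

An estimate close to 0.05 would suggest that the test keeps its nominal error rate under this particular form of non-normality; the sketch says nothing about outliers, heteroscedasticity, or non-independence, which the summary treats separately.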

Bibliographic Details
Main Authors:	Knief, Ulrich; Forstmeier, Wolfgang
Format:	Online Article Text
Language:	English
Published:	Springer US 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8613103/ https://www.ncbi.nlm.nih.gov/pubmed/33963496 http://dx.doi.org/10.3758/s13428-021-01587-5

_version_	1784603567807528960
author	Knief, Ulrich; Forstmeier, Wolfgang
collection	PubMed
description	When data are not normally distributed, researchers are often uncertain whether it is legitimate to use tests that assume Gaussian errors, or whether one has to either model a more specific error structure or use randomization techniques. Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation. We find that Gaussian models are robust to non-normality over a wide range of conditions, meaning that p values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also performed well in terms of power across all simulated scenarios. Parameter estimates were mostly unbiased and precise except if sample sizes were small or the distribution of the predictor was highly skewed. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise for instance when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data. Hence, newly developed statistical methods not only bring new opportunities, but they can also pose new threats to reliability. We argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and particularly difficult to check during peer review. Scientists and reviewers who are not fully aware of the risks might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.3758/s13428-021-01587-5.
format	Online Article Text
id	pubmed-8613103
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-8613103 2021-12-10 Violating the normality assumption may be the lesser of two evils Knief, Ulrich Forstmeier, Wolfgang Behav Res Methods Article Springer US 2021-05-07 2021 /pmc/articles/PMC8613103/ /pubmed/33963496 http://dx.doi.org/10.3758/s13428-021-01587-5 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Violating the normality assumption may be the lesser of two evils
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8613103/ https://www.ncbi.nlm.nih.gov/pubmed/33963496 http://dx.doi.org/10.3758/s13428-021-01587-5

Similar Items