
Count data in biology—Data transformation or model reformation?

Statistical analyses are an integral component of scientific research, and for decades, biologists have applied transformations to data to meet the normal error assumptions for F and t tests. Over the years, there has been a movement from data transformation toward model reformation—the use of non‐n...


Bibliographic Details
Main authors: St‐Pierre, Anne P., Shikon, Violaine, Schneider, David C.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5869353/
https://www.ncbi.nlm.nih.gov/pubmed/29607007
http://dx.doi.org/10.1002/ece3.3807
author St‐Pierre, Anne P.
Shikon, Violaine
Schneider, David C.
collection PubMed
description Statistical analyses are an integral component of scientific research, and for decades, biologists have applied transformations to data to meet the normal error assumptions for F and t tests. Over the years, there has been a movement from data transformation toward model reformation—the use of non‐normal error structures within the framework of the generalized linear model (GLM). The principal advantage of model reformation is that parameters are estimated on the original, rather than the transformed scale. However, data transformation has been shown to give better control over type I error, for simulated data with known error structures. We conducted a literature review of statistical textbooks directed toward biologists and of journal articles published in the primary literature to determine temporal trends in both the text recommendations and the practice in the refereed literature over the past 35 years. In this review, a trend of increasing use of reformation in the primary literature was evident, moving from no use of reformation before 1996 to >50% of the articles reviewed applying GLM after 2006. However, no such trend was observed in the recommendations in statistical textbooks. We then undertook 12 analyses based on published datasets in which we compared the type I error estimates, residual plot diagnostics, and coefficients yielded by analyses using square root transformations, log transformations, and the GLM. All analyses yielded acceptable residual versus fit plots and had similar p‐values within each analysis, but as expected, the coefficient estimates differed substantially. Furthermore, no consensus could be found in the literature regarding a procedure to back‐transform the coefficient estimates obtained from linear models performed on transformed datasets. This lack of consistency among coefficient estimates constitutes a major argument for model reformation over data transformation in biology.
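The abstract's point that a GLM estimates parameters on the original scale, while back-transformed estimates from transformed data disagree with one another, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the counts are hypothetical (not taken from the article's 12 datasets), the log transformation is taken as log(y + 1), and the Poisson GLM is reduced to the one-parameter-per-group case, where the maximum likelihood fitted mean under a log link is known to equal the arithmetic mean of the counts.

```python
import math

# Hypothetical counts for two groups (illustrative only, not from the article).
control = [0, 1, 1, 2, 3, 5, 8, 13]
treated = [2, 4, 4, 7, 9, 15, 21, 34]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def back_transformed_log_mean(counts):
    """Mean of log(y + 1), back-transformed with exp(.) - 1.

    The + 1 offset is an assumption; it is one common way to handle zeros.
    """
    return math.exp(mean([math.log(y + 1) for y in counts])) - 1


def back_transformed_sqrt_mean(counts):
    """Mean of sqrt(y), back-transformed by squaring."""
    return mean([math.sqrt(y) for y in counts]) ** 2


def poisson_glm_group_mean(counts):
    """Fitted mean of a log-link Poisson GLM with one parameter per group.

    For this design the maximum likelihood coefficient is
    log(arithmetic mean), so exponentiating it returns the arithmetic
    mean of the raw counts.
    """
    coefficient = math.log(mean(counts))
    return math.exp(coefficient)


for name, counts in (("control", control), ("treated", treated)):
    print(
        name,
        "log:", round(back_transformed_log_mean(counts), 3),
        "sqrt:", round(back_transformed_sqrt_mean(counts), 3),
        "GLM:", round(poisson_glm_group_mean(counts), 3),
    )
```

For any group whose counts are not all equal, the two back-transformed values fall below the GLM fitted mean, because the mean of a concave transformation is smaller than the transformation of the mean. The three approaches therefore report different group estimates from the same data, even when their tests agree.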
format Online
Article
Text
id pubmed-5869353
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-5869353 2018-03-30 Count data in biology—Data transformation or model reformation? St‐Pierre, Anne P. Shikon, Violaine Schneider, David C. Ecol Evol Original Research John Wiley and Sons Inc. 2018-02-16 /pmc/articles/PMC5869353/ /pubmed/29607007 http://dx.doi.org/10.1002/ece3.3807 Text en © 2018 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Count data in biology—Data transformation or model reformation?
topic Original Research