Count data in biology—Data transformation or model reformation?
Statistical analyses are an integral component of scientific research, and for decades, biologists have applied transformations to data to meet the normal error assumptions for F and t tests. Over the years, there has been a movement from data transformation toward model reformation—the use of non‐n...
Main authors: | St‐Pierre, Anne P., Shikon, Violaine, Schneider, David C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5869353/ https://www.ncbi.nlm.nih.gov/pubmed/29607007 http://dx.doi.org/10.1002/ece3.3807 |
_version_ | 1783309270937239552 |
---|---|
author | St‐Pierre, Anne P. Shikon, Violaine Schneider, David C. |
author_sort | St‐Pierre, Anne P. |
collection | PubMed |
description | Statistical analyses are an integral component of scientific research, and for decades, biologists have applied transformations to data to meet the normal error assumptions for F and t tests. Over the years, there has been a movement from data transformation toward model reformation—the use of non‐normal error structures within the framework of the generalized linear model (GLM). The principal advantage of model reformation is that parameters are estimated on the original, rather than the transformed scale. However, data transformation has been shown to give better control over type I error, for simulated data with known error structures. We conducted a literature review of statistical textbooks directed toward biologists and of journal articles published in the primary literature to determine temporal trends in both the text recommendations and the practice in the refereed literature over the past 35 years. In this review, a trend of increasing use of reformation in the primary literature was evident, moving from no use of reformation before 1996 to >50% of the articles reviewed applying GLM after 2006. However, no such trend was observed in the recommendations in statistical textbooks. We then undertook 12 analyses based on published datasets in which we compared the type I error estimates, residual plot diagnostics, and coefficients yielded by analyses using square root transformations, log transformations, and the GLM. All analyses yielded acceptable residual versus fit plots and had similar p‐values within each analysis, but as expected, the coefficient estimates differed substantially. Furthermore, no consensus could be found in the literature regarding a procedure to back‐transform the coefficient estimates obtained from linear models performed on transformed datasets. This lack of consistency among coefficient estimates constitutes a major argument for model reformation over data transformation in biology. |
format | Online Article Text |
id | pubmed-5869353 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-5869353 2018-03-30 Count data in biology—Data transformation or model reformation? St‐Pierre, Anne P. Shikon, Violaine Schneider, David C. Ecol Evol Original Research John Wiley and Sons Inc. 2018-02-16 /pmc/articles/PMC5869353/ /pubmed/29607007 http://dx.doi.org/10.1002/ece3.3807 Text en © 2018 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
title | Count data in biology—Data transformation or model reformation? |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5869353/ https://www.ncbi.nlm.nih.gov/pubmed/29607007 http://dx.doi.org/10.1002/ece3.3807 |
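
The abstract's point that coefficient estimates differ between transformed-data models and the GLM can be illustrated with a minimal sketch. This is not the authors' analysis: the counts are hypothetical, the `log(y + 1)` form of the log transformation is an assumption, and the function name `compare_scales` is invented for illustration. For a Poisson GLM with a log link and a single two-level factor, the maximum-likelihood fitted means equal the group sample means, so the coefficients can be written in closed form without a fitting library.

```python
import math
from statistics import mean


def compare_scales(group_a, group_b):
    """Return (intercept, slope) for three two-group models of count data.

    group_a is the reference level; both groups need a positive mean.
    """
    # Poisson GLM, log link: fitted means equal the group sample means,
    # so the intercept is log(mean of A) and the slope is the log mean ratio.
    glm_b0 = math.log(mean(group_a))
    glm_b1 = math.log(mean(group_b)) - glm_b0

    # Linear model on log(y + 1): coefficients are means of transformed values.
    log_a = mean(math.log(y + 1) for y in group_a)
    log_b = mean(math.log(y + 1) for y in group_b)

    # Linear model on sqrt(y): same idea on the square-root scale.
    sqrt_a = mean(math.sqrt(y) for y in group_a)
    sqrt_b = mean(math.sqrt(y) for y in group_b)

    return {
        "glm": (glm_b0, glm_b1),
        "log": (log_a, log_b - log_a),
        "sqrt": (sqrt_a, sqrt_b - sqrt_a),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the contrast visible.
    a = [0, 1, 1, 2, 3, 5, 9]
    b = [2, 4, 4, 7, 10, 15, 28]
    result = compare_scales(a, b)

    print("arithmetic mean of group A:", mean(a))
    # GLM intercept back-transforms exactly to the arithmetic mean.
    print("GLM, exp(b0):", math.exp(result["glm"][0]))
    # Naive back-transforms of the other intercepts fall below that mean.
    print("log model, exp(b0) - 1:", math.exp(result["log"][0]) - 1)
    print("sqrt model, b0 ** 2:", result["sqrt"][0] ** 2)
```

In this sketch the naive back-transforms of the log and square-root intercepts recover something closer to a geometric-type mean than the arithmetic mean, which is one way to see why a back-transformation procedure is not obvious.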