
Assessing the reproducibility of discriminant function analyses

Data are the foundation of empirical research, yet all too often the datasets underlying published papers are unavailable, incorrect, or poorly curated. This is a serious issue, because future researchers are then unable to validate published results or reuse data to explore new ideas and hypotheses. Even if data files are securely stored and accessible, they must also be accompanied by accurate labels and identifiers. To assess how often problems with metadata or data curation affect the reproducibility of published results, we attempted to reproduce Discriminant Function Analyses (DFAs) from the field of organismal biology. DFA is a commonly used statistical analysis that has changed little since its inception almost eight decades ago, and it therefore provides an opportunity to test reproducibility among datasets of varying ages. Of the 100 papers we initially surveyed, 14 were excluded because they did not present the common types of quantitative results from their DFA or gave insufficient details of their DFA. Of the remaining 86 datasets, there were 15 cases for which we were unable to confidently relate the dataset we received to the one used in the published analysis. The reasons ranged from incomprehensible or absent variable labels, to the DFA being performed on an unspecified subset of the data, to the dataset we received being incomplete. We focused on reproducing three common summary statistics from DFAs: the percent variance explained, the percentage correctly assigned, and the largest discriminant function coefficient. The reproducibility of the first two was fairly high (20 of 26 and 44 of 60 datasets, respectively), whereas our success rate with the discriminant function coefficients was lower (15 of 26 datasets). When considering all three summary statistics, we were able to completely reproduce 46 (65%) of 71 datasets. While our results show that a majority of studies are reproducible, they highlight the fact that many studies still fall short of the carefully curated research that the scientific community and the public expect.
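As a rough illustration of the three summary statistics named in the abstract (a minimal sketch, not the authors' analysis code: it assumes Python with scikit-learn's LinearDiscriminantAnalysis and substitutes the bundled iris data for a real morphometric dataset):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Stand-in dataset; the study itself used data requested from the original authors.
X, y = load_iris(return_X_y=True)
dfa = LinearDiscriminantAnalysis().fit(X, y)

# 1. Percent variance explained by each discriminant function.
pct_variance = dfa.explained_variance_ratio_ * 100

# 2. Percentage of cases correctly assigned to their group
#    (resubstitution accuracy, as commonly reported alongside a DFA).
pct_correct = dfa.score(X, y) * 100

# 3. Largest (absolute) coefficient of the first discriminant function;
#    scalings_ holds the raw discriminant function coefficients.
largest_coef = np.abs(dfa.scalings_[:, 0]).max()

print(f"% variance explained per function: {np.round(pct_variance, 1)}")
print(f"% correctly assigned: {pct_correct:.1f}")
print(f"largest DF1 coefficient: {largest_coef:.3f}")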


Bibliographic Details
Main Authors: Andrew, Rose L., Albert, Arianne Y.K., Renaut, Sebastien, Rennison, Diana J., Bock, Dan G., Vines, Tim
Format: Online Article (Text)
Language: English
Published: PeerJ Inc., 2015-08-04
Subjects: Evolutionary Studies
Collection: PubMed
Institution: National Center for Biotechnology Information
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4540019/
https://www.ncbi.nlm.nih.gov/pubmed/26290793
http://dx.doi.org/10.7717/peerj.1137
Rights: © 2015 Andrew et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.