
How does correlation structure differ between real and fabricated data-sets?

BACKGROUND: Misconduct in medical research has been the subject of many papers in recent years. Among different types of misconduct, data fabrication might be considered as one of the most severe cases. There have been some arguments that correlation coefficients in fabricated data-sets are usually...

Bibliographic Details
Main Authors: Akhtar-Danesh, Noori, Dehghan-Kooshkghazi, Mahshid
Format: Text
Language: English
Published: BioMed Central 2003
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC212490/
https://www.ncbi.nlm.nih.gov/pubmed/14516474
http://dx.doi.org/10.1186/1471-2288-3-18
author Akhtar-Danesh, Noori
Dehghan-Kooshkghazi, Mahshid
collection PubMed
description BACKGROUND: Misconduct in medical research has been the subject of many papers in recent years. Among different types of misconduct, data fabrication might be considered one of the most severe. There have been some arguments that correlation coefficients in fabricated data-sets are usually greater than those found in real data-sets. We aim to study the differences between real and fabricated data-sets in terms of the association between two variables. METHOD: Three examples are presented where outcomes from made-up (fabricated) data-sets are compared with the results from three real data-sets and with appropriate simulated data-sets. Data-sets were made up by faculty members in three universities. The first two examples are devoted to the correlation structures between continuous variables in two different settings: first, when there is a high correlation coefficient between variables; second, when the variables are not correlated. In the third example the differences between the real data-set and the fabricated data-sets are studied using the independent t-test for comparison between two means. RESULTS: In general, higher correlation coefficients are seen in made-up data-sets compared to the real data-sets. This occurs even when the participants are aware that the correlation coefficient for the corresponding real data-set is zero. The findings from the third example, a comparison between means in two groups, show that many people tend to make up data with smaller or no differences between groups even when they know how and to what extent the groups are different. CONCLUSION: This study indicates that high correlation coefficients can be considered a leading sign of data fabrication, as more than 40% of the participants generated variables with correlation coefficients greater than 0.70. However, when inspecting the differences between means in different groups, the same rule may not be applicable, as we observed smaller differences between groups in the made-up data-sets than in the real data-set. We also showed that inspecting the scatter-plot of two variables can be considered a useful tool for uncovering fabricated data.
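The correlation check described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `pearson_r` and `exceeds_threshold` are hypothetical, the input lists are toy values chosen only to show the arithmetic, and the 0.70 cut-off is taken from the abstract's conclusion. Whether to compare the signed coefficient or its absolute value is an assumption here; the sketch uses the signed value, as the abstract states "greater than 0.70".

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


def exceeds_threshold(x, y, threshold=0.70):
    """Hypothetical helper: True if r is above the cut-off quoted in the abstract."""
    return pearson_r(x, y) > threshold


# Toy inputs (not data from the study): one perfectly linear pair, one uncorrelated pair.
linear_x, linear_y = [1, 2, 3, 4], [2, 4, 6, 8]
flat_x, flat_y = [1, 2, 3, 4], [1, -1, -1, 1]

print(exceeds_threshold(linear_x, linear_y))  # high correlation is flagged
print(exceeds_threshold(flat_x, flat_y))      # zero correlation is not flagged
```

A high coefficient on its own is only a prompt for closer inspection; the abstract also recommends looking at the scatter-plot of the two variables.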
format Text
id pubmed-212490
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-212490 2003-10-11 How does correlation structure differ between real and fabricated data-sets? Akhtar-Danesh, Noori Dehghan-Kooshkghazi, Mahshid BMC Med Res Methodol Research Article BioMed Central 2003-09-29 /pmc/articles/PMC212490/ /pubmed/14516474 http://dx.doi.org/10.1186/1471-2288-3-18 Text en Copyright © 2003 Akhtar-Danesh and Dehghan-Kooshkghazi; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
title How does correlation structure differ between real and fabricated data-sets?
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC212490/
https://www.ncbi.nlm.nih.gov/pubmed/14516474
http://dx.doi.org/10.1186/1471-2288-3-18