
Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies

Concern about the reproducibility and reliability of biomedical research has been rising. An understudied issue is the prevalence of sample mislabeling, one impact of which would be invalid comparisons. We studied this issue in a corpus of human transcriptomics studies by comparing the provided anno...


Bibliographic Details
Main authors: Toker, Lilah; Feng, Min; Pavlidis, Paul
Format: Online Article Text
Language: English
Published: F1000Research 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5034794/
https://www.ncbi.nlm.nih.gov/pubmed/27746907
http://dx.doi.org/10.12688/f1000research.9471.2
_version_ 1782455328969850880
author Toker, Lilah
Feng, Min
Pavlidis, Paul
author_facet Toker, Lilah
Feng, Min
Pavlidis, Paul
author_sort Toker, Lilah
collection PubMed
description Concern about the reproducibility and reliability of biomedical research has been rising. An understudied issue is the prevalence of sample mislabeling, one impact of which would be invalid comparisons. We studied this issue in a corpus of human transcriptomics studies by comparing the provided annotations of sex to the expression levels of sex-specific genes. We identified apparent mislabeled samples in 46% of the datasets studied, yielding a 99% confidence lower-bound estimate for all studies of 33%. In a separate analysis of a set of datasets concerning a single cohort of subjects, 2/4 had mislabeled samples, indicating laboratory mix-ups rather than data recording errors. While the number of mixed-up samples per study was generally small, because our method can only identify a subset of potential mix-ups, our estimate is conservative for the breadth of the problem. Our findings emphasize the need for more stringent sample tracking, and that re-users of published data must be alert to the possibility of annotation and labelling errors.
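The check described in the abstract, comparing the annotated sex of each sample with the expression of sex-specific genes, can be illustrated with a minimal sketch. This is not the authors' implementation. The record does not name the genes or thresholds used, so the field names `female_marker` and `male_marker`, the `margin` cutoff, and the sample values below are all hypothetical. In practice a female-specific transcript such as XIST and a Y-linked gene such as RPS4Y1 are common choices, but that is an assumption here, not something stated in this record.

```python
from typing import Dict, List, Optional


def infer_sex(female_marker: float, male_marker: float,
              margin: float = 1.0) -> Optional[str]:
    """Infer sex from two log2 expression values (hypothetical markers).

    Returns "F" or "M" when one marker exceeds the other by more than
    `margin`, and None when the values are too close to call.
    """
    diff = female_marker - male_marker
    if diff > margin:
        return "F"
    if diff < -margin:
        return "M"
    return None


def flag_mismatches(samples: List[Dict]) -> List[str]:
    """Return ids of samples whose annotated sex contradicts the inferred sex.

    Ambiguous samples (inferred sex is None) are not flagged.
    """
    flagged = []
    for s in samples:
        inferred = infer_sex(s["female_marker"], s["male_marker"])
        if inferred is not None and inferred != s["annotated_sex"]:
            flagged.append(s["id"])
    return flagged


# Made-up example values, for illustration only.
samples = [
    {"id": "s1", "annotated_sex": "F", "female_marker": 9.5, "male_marker": 3.1},
    {"id": "s2", "annotated_sex": "M", "female_marker": 3.0, "male_marker": 10.2},
    {"id": "s3", "annotated_sex": "M", "female_marker": 9.8, "male_marker": 2.9},
    {"id": "s4", "annotated_sex": "F", "female_marker": 6.0, "male_marker": 6.4},
]

mismatched = flag_mismatches(samples)
```

As the abstract notes, a check of this kind can only catch mix-ups between samples of different sex, so the number of flagged samples is a conservative indication of how many labels are wrong.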
format Online
Article
Text
id pubmed-5034794
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher F1000Research
record_format MEDLINE/PubMed
spelling pubmed-50347942016-10-13 Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies Toker, Lilah Feng, Min Pavlidis, Paul F1000Res Research Article Concern about the reproducibility and reliability of biomedical research has been rising. An understudied issue is the prevalence of sample mislabeling, one impact of which would be invalid comparisons. We studied this issue in a corpus of human transcriptomics studies by comparing the provided annotations of sex to the expression levels of sex-specific genes. We identified apparent mislabeled samples in 46% of the datasets studied, yielding a 99% confidence lower-bound estimate for all studies of 33%. In a separate analysis of a set of datasets concerning a single cohort of subjects, 2/4 had mislabeled samples, indicating laboratory mix-ups rather than data recording errors. While the number of mixed-up samples per study was generally small, because our method can only identify a subset of potential mix-ups, our estimate is conservative for the breadth of the problem. Our findings emphasize the need for more stringent sample tracking, and that re-users of published data must be alert to the possibility of annotation and labelling errors. F1000Research 2016-09-30 /pmc/articles/PMC5034794/ /pubmed/27746907 http://dx.doi.org/10.12688/f1000research.9471.2 Text en Copyright: © 2016 Toker L et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Toker, Lilah
Feng, Min
Pavlidis, Paul
Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies
title Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5034794/
https://www.ncbi.nlm.nih.gov/pubmed/27746907
http://dx.doi.org/10.12688/f1000research.9471.2