
Gene name errors: Lessons not learned

Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a...


Bibliographic Details
Main authors: Abeysooriya, Mandhri; Soria, Megan; Kasu, Mary Sravya; Ziemann, Mark
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8357140/
https://www.ncbi.nlm.nih.gov/pubmed/34329294
http://dx.doi.org/10.1371/journal.pcbi.1008984
author Abeysooriya, Mandhri
Soria, Megan
Kasu, Mary Sravya
Ziemann, Mark
collection PubMed
description Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a scan of supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. Improved scanning software that we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is because gene names are converted not just to dates and floating-point numbers, but also to the internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited for use with large genomic data.
format Online
Article
Text
id pubmed-8357140
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8357140 2021-08-12 Gene name errors: Lessons not learned Abeysooriya, Mandhri; Soria, Megan; Kasu, Mary Sravya; Ziemann, Mark PLoS Comput Biol Research Article Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighting the extent of the problem. To assess this, we performed a scan of supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. Improved scanning software that we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is because gene names are converted not just to dates and floating-point numbers, but also to the internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited for use with large genomic data. Public Library of Science 2021-07-30 /pmc/articles/PMC8357140/ /pubmed/34329294 http://dx.doi.org/10.1371/journal.pcbi.1008984 Text en © 2021 Abeysooriya et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Gene name errors: Lessons not learned
topic Research Article
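
The description in this record names three kinds of conversion error: gene names turned into dates, into floating-point numbers, and into five-digit internal date numbers. The sketch below illustrates how a column of gene identifiers might be screened for those three patterns. It is not the scanning software described in the article; the function names `classify_cell` and `scan_gene_column` and the regular expressions are assumptions made for illustration only, and a five-digit value can also be a legitimate numeric identifier, so any hit would still need manual review.

```python
import re

# Month abbreviations that typically appear when a spreadsheet turns a
# gene symbol into a date (illustrative assumption, English locale only).
_MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# Date-like renderings such as "1-Sep", "01-Mar", "2-Mar-2021" or "Sep-1".
DATE_LIKE = re.compile(
    rf"^(\d{{1,2}}[-/]({_MONTHS})([-/]\d{{2,4}})?|({_MONTHS})[-/]\d{{1,2}})$",
    re.IGNORECASE,
)

# Scientific notation such as "2.31E+19", the usual floating-point rendering.
FLOAT_LIKE = re.compile(r"^\d+(\.\d+)?E[+-]?\d+$", re.IGNORECASE)

# A bare five-digit number, the shape of an internal date serial.
SERIAL_LIKE = re.compile(r"^\d{5}$")


def classify_cell(value):
    """Return 'date', 'float', 'serial', or None for a single cell value."""
    text = value.strip()
    if DATE_LIKE.match(text):
        return "date"
    if FLOAT_LIKE.match(text):
        return "float"
    if SERIAL_LIKE.match(text):
        return "serial"
    return None


def scan_gene_column(cells):
    """Return (index, cell, kind) for every cell that looks converted."""
    hits = []
    for index, cell in enumerate(cells):
        kind = classify_cell(cell)
        if kind is not None:
            hits.append((index, cell, kind))
    return hits


if __name__ == "__main__":
    column = ["TP53", "1-Sep", "44440", "2.31E+19", "BRCA1"]
    for index, cell, kind in scan_gene_column(column):
        print(f"row {index}: {cell!r} looks like a converted {kind}")
```

The patterns are deliberately strict (anchored at both ends) so that ordinary symbols such as `SEPT1` or `MARCH1` are not flagged; only the already-converted forms are.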