Gene name errors: Lessons not learned
Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighted the extent of the problem. To assess this, we performed a...
Main authors: | Abeysooriya, Mandhri; Soria, Megan; Kasu, Mary Sravya; Ziemann, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8357140/ https://www.ncbi.nlm.nih.gov/pubmed/34329294 http://dx.doi.org/10.1371/journal.pcbi.1008984 |
_version_ | 1783737081945653248 |
---|---|
author | Abeysooriya, Mandhri Soria, Megan Kasu, Mary Sravya Ziemann, Mark |
author_facet | Abeysooriya, Mandhri Soria, Megan Kasu, Mary Sravya Ziemann, Mark |
author_sort | Abeysooriya, Mandhri |
collection | PubMed |
description | Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighted the extent of the problem. To assess this, we scanned supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. Improved scanning software that we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is because gene names are converted not just to dates and floating-point numbers, but also to an internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited for use with large genomic data. |
format | Online Article Text |
id | pubmed-8357140 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8357140 2021-08-12 Gene name errors: Lessons not learned Abeysooriya, Mandhri Soria, Megan Kasu, Mary Sravya Ziemann, Mark PLoS Comput Biol Research Article Erroneous conversion of gene names into dates and other data types has been a frustration for computational biologists for years. We hypothesized that such errors in supplementary files might diminish after a report in 2016 highlighted the extent of the problem. To assess this, we scanned supplementary files published in PubMed Central from 2014 to 2020. Overall, gene name errors continued to accumulate unabated in the period after 2016. Improved scanning software that we developed identified gene name errors in 30.9% (3,436/11,117) of articles with supplementary Excel gene lists, a figure significantly higher than previously estimated. This is because gene names are converted not just to dates and floating-point numbers, but also to an internal date format (five-digit numbers). These findings further reinforce that spreadsheets are ill-suited for use with large genomic data. Public Library of Science 2021-07-30 /pmc/articles/PMC8357140/ /pubmed/34329294 http://dx.doi.org/10.1371/journal.pcbi.1008984 Text en © 2021 Abeysooriya et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Abeysooriya, Mandhri Soria, Megan Kasu, Mary Sravya Ziemann, Mark Gene name errors: Lessons not learned |
title | Gene name errors: Lessons not learned |
title_full | Gene name errors: Lessons not learned |
title_fullStr | Gene name errors: Lessons not learned |
title_full_unstemmed | Gene name errors: Lessons not learned |
title_short | Gene name errors: Lessons not learned |
title_sort | gene name errors: lessons not learned |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8357140/ https://www.ncbi.nlm.nih.gov/pubmed/34329294 http://dx.doi.org/10.1371/journal.pcbi.1008984 |
work_keys_str_mv | AT abeysooriyamandhri genenameerrorslessonsnotlearned AT soriamegan genenameerrorslessonsnotlearned AT kasumarysravya genenameerrorslessonsnotlearned AT ziemannmark genenameerrorslessonsnotlearned |
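
The description in this record names three forms that a converted gene name can take: a date, a floating-point number, and a five-digit internal date number. As a rough illustration only, the Python sketch below flags cells in a gene-name column that look like any of the three. It is not the authors' scanning software; the function names `classify_cell` and `scan_gene_column` and the regular expressions are assumptions made for this example, and the patterns cover only a few common renderings (for instance `1-Sep` or `2.31E+13`).

```python
import re

# Hypothetical patterns, chosen for illustration; they are not taken from
# the scanning software described in the article.
MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
DATE_RE = re.compile(rf"\d{{1,2}}-(?:{MONTHS})", re.IGNORECASE)  # e.g. 1-Sep
FLOAT_RE = re.compile(r"\d\.\d+E\+\d{2}")                        # e.g. 2.31E+13
SERIAL_RE = re.compile(r"\d{5}")                                 # e.g. 44075


def classify_cell(value):
    """Return a label for a suspicious cell, or None if it looks like a name."""
    text = str(value).strip()
    if DATE_RE.fullmatch(text):
        return "date"
    if FLOAT_RE.fullmatch(text):
        return "float"
    if SERIAL_RE.fullmatch(text):
        return "serial_date"
    return None


def scan_gene_column(cells):
    """Return (row_index, cell, label) for every cell that looks converted."""
    flagged = []
    for index, cell in enumerate(cells):
        label = classify_cell(cell)
        if label is not None:
            flagged.append((index, cell, label))
    return flagged


if __name__ == "__main__":
    column = ["TP53", "1-Sep", "BRCA1", "2.31E+13", "44075"]
    for row, cell, label in scan_gene_column(column):
        print(row, cell, label)
```

A five-digit number in a gene-name column is only a warning sign under this heuristic, since numeric identifiers can legitimately have five digits; a practical check would also consider what the rest of the column contains.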