Escape Excel: A tool for preventing gene symbol and accession conversion errors

BACKGROUND: Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible, corruption of the imported text. A recent survey of popu...

Bibliographic Details
Main Authors: Welsh, Eric A., Stewart, Paul A., Kuenzi, Brent M., Eschrich, James A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5617173/
https://www.ncbi.nlm.nih.gov/pubmed/28953918
http://dx.doi.org/10.1371/journal.pone.0185207
author Welsh, Eric A.
Stewart, Paul A.
Kuenzi, Brent M.
Eschrich, James A.
collection PubMed
description BACKGROUND: Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible, corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue. RESULTS: Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (http://www.github.com/pstew/escape_excel), including a command line based Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web-server, and as a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and simple non-Galaxy web server (http://apostl.moffitt.org:8000/). CONCLUSIONS: Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.
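The detect-and-escape idea described in the abstract can be illustrated with a minimal Python sketch. This is not the Escape Excel implementation (the published tool is a Perl script): the regular expressions below are hypothetical and deliberately incomplete, the function names are made up for illustration, and wrapping a field as ="value" is an assumed escaping convention that Excel typically imports as literal text.

```python
import re

# Hypothetical, incomplete patterns for illustration only;
# these are NOT the rules used by the Escape Excel tool itself.
_MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec"
PATTERNS = [
    re.compile(rf"^(?:{_MONTHS})[a-z]*[-/ ]?\d{{1,4}}$", re.I),  # SEPT2, MARCH1, DEC-1
    re.compile(rf"^\d{{1,4}}[-/ ](?:{_MONTHS})[a-z]*$", re.I),   # 1-Mar
    re.compile(r"^\d{1,4}[-/]\d{1,2}(?:[-/]\d{1,4})?$"),         # 2017-09-27, 1/2
    re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$"),                   # 12:30, 1:02:03
    re.compile(r"^0\d+$"),                                       # leading zeroes: 00123
    re.compile(r"^\d+(?:\.\d+)?e[+-]?\d+$", re.I),               # 2310009E13
    re.compile(r"^\d{12,}$"),                                    # long numeric identifiers
]


def needs_escape(field: str) -> bool:
    """Return True if the field looks like text Excel might auto-convert."""
    text = field.strip()
    return any(p.match(text) for p in PATTERNS)


def escape_field(field: str) -> str:
    """Wrap a risky field as ="value" (assumed convention); leave others unchanged."""
    if needs_escape(field):
        return '="' + field.replace('"', '""') + '"'
    return field


def escape_line(line: str, sep: str = "\t") -> str:
    """Escape every field of one delimited line (tab-separated by default)."""
    return sep.join(escape_field(f) for f in line.split(sep))


if __name__ == "__main__":
    for row in ["MARCH1\t0.5\tTP53", "SEPT2\t00123\t2310009E13"]:
        print(escape_line(row))
```

Ordinary symbols such as TP53 and plain numbers such as 0.5 pass through unchanged, while date-like symbols, zero-padded numbers, and scientific-notation look-alikes are wrapped. A sketch like this errs toward over-escaping; a production tool would need a far more thorough rule set.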
format Online
Article
Text
id pubmed-5617173
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5617173 2017-10-09 Escape Excel: A tool for preventing gene symbol and accession conversion errors Welsh, Eric A. Stewart, Paul A. Kuenzi, Brent M. Eschrich, James A. PLoS One Research Article
Public Library of Science 2017-09-27 /pmc/articles/PMC5617173/ /pubmed/28953918 http://dx.doi.org/10.1371/journal.pone.0185207 Text en © 2017 Welsh et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Escape Excel: A tool for preventing gene symbol and accession conversion errors
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5617173/
https://www.ncbi.nlm.nih.gov/pubmed/28953918
http://dx.doi.org/10.1371/journal.pone.0185207