Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics

BACKGROUND: When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. RESULTS: A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program...
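The two conversions named in the abstract can be screened for before a gene list is opened in a spreadsheet. The following is a minimal sketch in Python, not the authors' scripts: the function name `excel_risk` and both regular expressions are hypothetical, and the patterns only approximate what Excel treats as a date (for example, SEPT2 becoming 2-Sep) or as a floating-point number (for example, the Riken identifier 2310009E13 becoming 2.31E+13). They do not reproduce Excel's actual parsing rules.

```python
import re
from typing import Optional

# Month names and abbreviations that a spreadsheet may read as the start of a date.
# Assumption: this list is a heuristic, not Excel's real parser.
_MONTHS = (
    "JAN|JANUARY|FEB|FEBRUARY|MAR|MARCH|APR|APRIL|MAY|JUN|JUNE|"
    "JUL|JULY|AUG|AUGUST|SEP|SEPT|SEPTEMBER|OCT|OCTOBER|NOV|NOVEMBER|"
    "DEC|DECEMBER"
)

# A month word followed by one or two digits, e.g. SEPT2, MARCH1, DEC1.
_DATE_LIKE = re.compile(rf"^(?:{_MONTHS})[-/ ]?\d{{1,2}}$", re.IGNORECASE)

# Digits, the letter E, then digits, e.g. 2310009E13 (scientific notation).
_FLOAT_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def excel_risk(identifier: str) -> Optional[str]:
    """Return "date", "float", or None for a single gene identifier."""
    name = identifier.strip()
    if _DATE_LIKE.match(name):
        return "date"
    if _FLOAT_LIKE.match(name):
        return "float"
    return None


if __name__ == "__main__":
    # Report which identifiers in a small example list look at risk.
    for gene in ["SEPT2", "MARCH1", "DEC1", "2310009E13", "TP53"]:
        print(gene, excel_risk(gene))
```

Identifiers flagged this way can be kept safe by importing the column as text rather than with the default "General" format. Because the conversions are irreversible, the check is only useful before the file is saved from the spreadsheet.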

Bibliographic Details
Main Authors: Zeeberg, Barry R, Riss, Joseph, Kane, David W, Bussey, Kimberly J, Uchio, Edward, Linehan, W Marston, Barrett, J Carl, Weinstein, John N
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC459209/
https://www.ncbi.nlm.nih.gov/pubmed/15214961
http://dx.doi.org/10.1186/1471-2105-5-80
_version_ 1782121593170821120
author Zeeberg, Barry R
Riss, Joseph
Kane, David W
Bussey, Kimberly J
Uchio, Edward
Linehan, W Marston
Barrett, J Carl
Weinstein, John N
author_facet Zeeberg, Barry R
Riss, Joseph
Kane, David W
Bussey, Kimberly J
Uchio, Edward
Linehan, W Marston
Barrett, J Carl
Weinstein, John N
author_sort Zeeberg, Barry R
collection PubMed
description BACKGROUND: When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. RESULTS: A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. CONCLUSIONS: Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem.
format Text
id pubmed-459209
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4592092004-07-16 Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics Zeeberg, Barry R Riss, Joseph Kane, David W Bussey, Kimberly J Uchio, Edward Linehan, W Marston Barrett, J Carl Weinstein, John N BMC Bioinformatics Correspondence BACKGROUND: When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. RESULTS: A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. CONCLUSIONS: Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem. BioMed Central 2004-06-23 /pmc/articles/PMC459209/ /pubmed/15214961 http://dx.doi.org/10.1186/1471-2105-5-80 Text en Copyright © 2004 Zeeberg et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Correspondence
Zeeberg, Barry R
Riss, Joseph
Kane, David W
Bussey, Kimberly J
Uchio, Edward
Linehan, W Marston
Barrett, J Carl
Weinstein, John N
Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics
title Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics
title_full Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics
title_fullStr Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics
title_full_unstemmed Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics
title_short Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics
title_sort mistaken identifiers: gene name errors can be introduced inadvertently when using excel in bioinformatics
topic Correspondence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC459209/
https://www.ncbi.nlm.nih.gov/pubmed/15214961
http://dx.doi.org/10.1186/1471-2105-5-80