Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics
BACKGROUND: When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. RESULTS: A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program...
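Main authors: | Zeeberg, Barry R; Riss, Joseph; Kane, David W; Bussey, Kimberly J; Uchio, Edward; Linehan, W Marston; Barrett, J Carl; Weinstein, John N |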
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2004 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC459209/ https://www.ncbi.nlm.nih.gov/pubmed/15214961 http://dx.doi.org/10.1186/1471-2105-5-80 |
_version_ | 1782121593170821120 |
---|---|
author | Zeeberg, Barry R; Riss, Joseph; Kane, David W; Bussey, Kimberly J; Uchio, Edward; Linehan, W Marston; Barrett, J Carl; Weinstein, John N |
author_sort | Zeeberg, Barry R |
collection | PubMed |
description | BACKGROUND: When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. RESULTS: A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. CONCLUSIONS: Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem. |
format | Text |
id | pubmed-459209 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2004 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-459209 2004-07-16 Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics Zeeberg, Barry R Riss, Joseph Kane, David W Bussey, Kimberly J Uchio, Edward Linehan, W Marston Barrett, J Carl Weinstein, John N BMC Bioinformatics Correspondence BACKGROUND: When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. RESULTS: A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. CONCLUSIONS: Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem. BioMed Central 2004-06-23 /pmc/articles/PMC459209/ /pubmed/15214961 http://dx.doi.org/10.1186/1471-2105-5-80 Text en Copyright © 2004 Zeeberg et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL. |
title | Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics |
topic | Correspondence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC459209/ https://www.ncbi.nlm.nih.gov/pubmed/15214961 http://dx.doi.org/10.1186/1471-2105-5-80 |
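
The two conversions named in the abstract (default date formatting and floating-point formatting) can be illustrated with a small screening script. The following is a minimal sketch, not the scripts the authors provide: the function name `classify_identifier` and both regular expressions are hypothetical, and the patterns only approximate which strings a spreadsheet might reinterpret. It flags identifiers that look like a month abbreviation followed by a day number (for example `SEPT2` or `DEC1`) and identifiers that look like scientific notation (for example the RIKEN-style `2310009E13`).

```python
import re

# Assumption: these patterns are illustrative heuristics, not Excel's
# actual parsing rules. Longer alternatives (MARCH, SEPT) are listed
# before their shorter prefixes so the intent is easy to read.
DATE_LIKE = re.compile(
    r"^(JAN|FEB|MARCH|MAR|APR|MAY|JUN|JUL|AUG|SEPT|SEP|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)
# Digits, the letter E, then digits: reads as a float such as 2.31E+13.
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def classify_identifier(name: str) -> str:
    """Return 'date', 'float', or 'safe' for a single identifier."""
    candidate = name.strip()
    if DATE_LIKE.match(candidate):
        return "date"
    if FLOAT_LIKE.match(candidate):
        return "float"
    return "safe"


def at_risk(identifiers):
    """Map each identifier that is not 'safe' to its risk category."""
    risks = {}
    for identifier in identifiers:
        category = classify_identifier(identifier)
        if category != "safe":
            risks[identifier] = category
    return risks


if __name__ == "__main__":
    sample = ["SEPT2", "DEC1", "TP53", "2310009E13", "BRCA1"]
    for identifier, category in at_risk(sample).items():
        print(f"{identifier}: may be converted to a {category}")
```

Running a gene list through a check like this before opening it in a spreadsheet shows which entries are at risk. Because the abstract notes that the conversions are irreversible, a check of this kind is only useful before the file is saved from the spreadsheet, not afterwards.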