Too much data, but little inter-changeability: a lesson learned from mining public data on tissue specificity of gene expression

BACKGROUND: The tissue expression pattern of a gene often provides an important clue to its potential role in a biological process. A vast amount of gene expression data have been and are being accumulated in public repository through different technology platforms. However, exploitations of these r...

Bibliographic Details
Main authors: Li, Shuyu, Li, Yiqun Helen, Wei, Tao, Su, Eric Wen, Duffin, Kevin, Liao, Birong
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1634740/
https://www.ncbi.nlm.nih.gov/pubmed/17064414
http://dx.doi.org/10.1186/1745-6150-1-33
author Li, Shuyu
Li, Yiqun Helen
Wei, Tao
Su, Eric Wen
Duffin, Kevin
Liao, Birong
collection PubMed
description BACKGROUND: The tissue expression pattern of a gene often provides an important clue to its potential role in a biological process. A vast amount of gene expression data has been, and continues to be, accumulated in public repositories through different technology platforms. However, exploitation of these rich data sources remains limited, in part because of issues of technology standardization. Our objective is to test the comparability of data from SAGE and microarray technologies by examining the expression patterns of genes under normal physiological states across a variety of tissues.
RESULTS: Between 42% and 54% of genes show significant correlations in tissue expression patterns between SAGE and GeneChip: 30–40% of genes have positively correlated expression patterns and 10–15% have negatively correlated expression patterns at a statistically significant level (p = 0.05). Our analysis suggests that the discrepancy in expression patterns between the technology platforms is unlikely to arise from heterogeneity of the tissues used, or from spurious correlations resulting from microarray probe design, gene abundance, or gene function. The discrepancy can be partially explained by errors in the original assignment of SAGE tags to genes, caused by the evolution of sequence databases. In addition, sequence analysis indicates that many SAGE tags and Affymetrix array probe sets map to different splice variants or different sequence regions even though they represent the same gene, which also contributes to the observed discrepancies between SAGE and array expression data.
CONCLUSION: To our knowledge, this is the first report attempting to mine gene expression patterns across tissues using public data from different technology platforms. Unlike previous similar studies, which only demonstrated the discrepancies between the two gene expression platforms, we carried out an in-depth analysis to investigate the causes of such discrepancies. Our study shows that exploiting rich public expression resources requires extensive knowledge of the technologies and experiments. Informatic methodologies for better interoperability among platforms remain a gap. One area that can be improved in practice is the accurate sequence mapping of SAGE tags and array probes to full-length genes.
REVIEWERS: This article was reviewed by Dr. I. King Jordan, Dr. Joel Bader, and Dr. Arcady Mushegian.
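The per-gene comparison summarized under RESULTS can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient or significance test was used, so everything here is an assumption for illustration only: Pearson correlation, an exact two-sided permutation test, the function names `pearson` and `classify`, and the made-up expression values for six hypothetical tissues.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def classify(sage, chip, alpha=0.05):
    """Label one gene's cross-platform tissue pattern as positive, negative, or not significant.

    Assumption: significance is judged by an exact permutation test over all
    orderings of the microarray values, which is only feasible for a small
    number of tissues.
    """
    r_obs = pearson(sage, chip)
    perms = list(permutations(chip))
    # Count orderings whose correlation is at least as extreme as the observed one.
    hits = sum(1 for p in perms if abs(pearson(sage, p)) >= abs(r_obs) - 1e-12)
    p_value = hits / len(perms)
    if p_value >= alpha:
        return "not significant", r_obs, p_value
    return ("positive" if r_obs > 0 else "negative"), r_obs, p_value


# Hypothetical expression levels across six tissues (illustrative values only).
sage_counts = [1, 2, 3, 4, 5, 6]
genes = {
    "gene_a": [12, 19, 33, 41, 48, 62],   # rises with the SAGE counts
    "gene_b": [62, 48, 41, 33, 19, 12],   # falls as the SAGE counts rise
    "gene_c": [30, 10, 40, 20, 35, 25],   # no clear relationship
}
labels = {name: classify(sage_counts, chip)[0] for name, chip in genes.items()}
```

Tallying such labels over all genes shared by the two platforms would yield percentages of the kind reported above; a negative label marks a gene whose tissue pattern is reversed between platforms.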
format Text
id pubmed-1634740
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1634740 2006-11-07 Biol Direct Research BioMed Central 2006-10-25 /pmc/articles/PMC1634740/ /pubmed/17064414 http://dx.doi.org/10.1186/1745-6150-1-33 Text en Copyright © 2006 Li et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Too much data, but little inter-changeability: a lesson learned from mining public data on tissue specificity of gene expression
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1634740/
https://www.ncbi.nlm.nih.gov/pubmed/17064414
http://dx.doi.org/10.1186/1745-6150-1-33