Too much data, but little inter-changeability: a lesson learned from mining public data on tissue specificity of gene expression

Bibliographic Details
Main authors: Li, Shuyu, Li, Yiqun Helen, Wei, Tao, Su, Eric Wen, Duffin, Kevin, Liao, Birong
Format: Text
Language: English
Published: BioMed Central 2006
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1634740/
https://www.ncbi.nlm.nih.gov/pubmed/17064414
http://dx.doi.org/10.1186/1745-6150-1-33
Description
Summary: BACKGROUND: The tissue expression pattern of a gene often provides an important clue to its potential role in a biological process. A vast amount of gene expression data has been, and continues to be, accumulated in public repositories through different technology platforms. However, exploitation of these rich data sources remains limited, in part because of issues of technology standardization. Our objective is to test the comparability of data between SAGE and microarray technologies by examining the expression patterns of genes under normal physiological states across a variety of tissues.
RESULTS: Between SAGE and GeneChip, 42–54% of genes show significant correlations in tissue expression patterns: 30–40% of genes have positively correlated expression patterns and 10–15% have negatively correlated expression patterns at a statistically significant level (p = 0.05). Our analysis suggests that the discrepancy between the expression patterns derived from the two technology platforms is unlikely to arise from heterogeneity of the tissues used, or from spurious correlations resulting from microarray probe design, gene abundance, or gene function. The discrepancy can be partially explained by errors in the original assignment of SAGE tags to genes, caused by the evolution of sequence databases. In addition, sequence analysis indicates that many SAGE tags and Affymetrix array probe sets map to different splice variants or different sequence regions even though they represent the same gene, which also contributes to the observed discrepancies between SAGE and array expression data.
CONCLUSION: To our knowledge, this is the first report attempting to mine gene expression patterns across tissues using public data from different technology platforms. Unlike previous similar studies, which only demonstrated the discrepancies between the two gene expression platforms, we carried out an in-depth analysis to investigate the causes of such discrepancies. Our study shows that exploiting rich public expression resources requires extensive knowledge of the technologies and the experiments. Informatic methodologies for better interoperability among platforms remain a gap. One area that can be improved in practice is the accurate sequence mapping of SAGE tags and array probes to full-length genes.
REVIEWERS: This article was reviewed by Dr. I. King Jordan, Dr. Joel Bader, and Dr. Arcady Mushegian.
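The per-gene comparison described under RESULTS can be illustrated with a minimal sketch. The summary does not say which correlation measure or significance test the authors used, so the Pearson coefficient and the permutation test below are assumptions chosen for illustration, as are the function names (`pearson`, `permutation_p_value`, `classify_gene`) and the toy expression values. Each gene is represented by two vectors of expression levels over the same ordered list of tissues, one per platform.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of tissue-label shuffles with |r| at least as large."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def classify_gene(sage, array, alpha=0.05):
    """Label one gene's cross-platform agreement (hypothetical helper)."""
    r = pearson(sage, array)
    p = permutation_p_value(sage, array)
    if p >= alpha:
        return "not significant"
    return "positive" if r > 0 else "negative"


# Toy values for one gene across eight tissues (illustrative only, not real data).
sage_counts = [1, 2, 3, 4, 5, 6, 7, 8]
array_signal = [2, 4, 5, 8, 10, 13, 14, 17]
label = classify_gene(sage_counts, array_signal)
```

Applying `classify_gene` to every gene measured on both platforms and tallying the three labels would give percentages of the kind reported in the summary, under the stated assumptions about the statistic and the test.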