Alignment of gene expression profiles from test samples against a reference database: New method for context-specific interpretation of microarray data

Bibliographic Details
Main authors: Kilpinen, Sami K; Ojala, Kalle A; Kallioniemi, Olli P
Format: Text
Language: English
Published: BioMed Central, 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3080808/
https://www.ncbi.nlm.nih.gov/pubmed/21453538
http://dx.doi.org/10.1186/1756-0381-4-5
author Kilpinen, Sami K
Ojala, Kalle A
Kallioniemi, Olli P
collection PubMed
description BACKGROUND: Gene expression microarray data have been organized and made available as public databases, but the use of such highly heterogeneous reference datasets to interpret data from individual test samples is less developed than in, e.g., the field of nucleotide sequence comparisons. We have created a rapid and powerful approach for the alignment of microarray gene expression profiles (AGEP) from test samples with those contained in a large annotated public reference database, and we demonstrate here how this can facilitate the interpretation of microarray data from individual samples. METHODS: AGEP is based on the calculation of kernel density distributions for the expression levels of each gene in each reference tissue type. It quantifies the similarity between the test sample and each reference tissue type and identifies the typical and atypical genes in each comparison. As a reference database, we used 1654 samples from 44 normal tissues (extracted from the Genesapiens database). RESULTS: Using leave-one-out validation, AGEP correctly identified the tissue of origin for 1521 (93.6%) of the 1654 samples in the original database. Independent validation on 195 external normal tissue samples gave 87% accuracy for the exact tissue type and 97% accuracy when related tissue types were also counted as correct. AGEP analysis of 10 Duchenne muscular dystrophy (DMD) samples provided a quantitative description of key pathogenetic events, such as the extent of inflammation, in individual samples and pinpointed tissue-specific genes whose expression changed in DMD, such as SAMD4A. AGEP analysis of microarray data from the adipocytic differentiation of mesenchymal stem cells, and from normal myeloid cell types and leukemias, provided a quantitative characterization of the transcriptomic changes during normal and abnormal cell differentiation. 
CONCLUSIONS: AGEP is a widely applicable method for the rapid, comprehensive interpretation of microarray data, as demonstrated here by the definition of tissue- and disease-specific changes in gene expression and of changes during cellular differentiation. The capability to quantitatively compare data from individual samples against a large-scale annotated reference database represents a general paradigm for the analysis of all types of high-throughput data. AGEP enables systematic and quantitative comparison of gene expression data from test samples against a comprehensive collection of cell/tissue types previously studied by the entire research community.
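The METHODS paragraph describes AGEP only at a high level: a kernel density distribution per gene per reference tissue type, a similarity score between the test sample and each tissue type, and a list of typical and atypical genes. The Python sketch below illustrates that idea under stated assumptions. The Gaussian kernel, the 1.06·sd·n^(-1/5) bandwidth rule, the "typical gene" threshold (`rel_threshold`), the fraction-of-typical-genes score, and all function names are hypothetical and are not taken from the paper. The reference data are made-up toy values, not Genesapiens samples. A leave-one-out loop of the kind used for validation is included as well.

```python
import math
from statistics import stdev


def gaussian_kde(values, bandwidth=None):
    """Gaussian kernel density estimate for one gene in one reference tissue.

    Bandwidth defaults to the 1.06 * sd * n**(-1/5) normal-reference rule of
    thumb (an assumption; AGEP's own kernel and bandwidth are not stated here).
    """
    n = len(values)
    if bandwidth is None:
        sd = stdev(values) if n > 1 else 0.0
        bandwidth = 1.06 * sd * n ** (-1 / 5) if sd > 0 else 1.0
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def build_models(reference):
    """reference: {tissue: {gene: [expression values across reference samples]}}."""
    models = {}
    for tissue, genes in reference.items():
        models[tissue] = {}
        for gene, values in genes.items():
            kde = gaussian_kde(values)
            # Approximate the peak density by the maximum over the data points.
            peak = max(kde(v) for v in values)
            models[tissue][gene] = (kde, peak)
    return models


def align(sample, models, rel_threshold=0.1):
    """Score a test sample {gene: value} against every reference tissue.

    Hypothetical scoring rule: a gene is 'typical' if its density is at least
    rel_threshold times the peak density; the score is the typical fraction.
    """
    results = {}
    for tissue, genes in models.items():
        typical, atypical = [], []
        for gene, (kde, peak) in genes.items():
            if gene not in sample:
                continue
            target = typical if kde(sample[gene]) >= rel_threshold * peak else atypical
            target.append(gene)
        n = len(typical) + len(atypical)
        results[tissue] = (len(typical) / n if n else 0.0, typical, atypical)
    return results


def best_match(sample, models):
    """Return the tissue with the highest score (first one wins on ties)."""
    results = align(sample, models)
    return max(results, key=lambda t: results[t][0])


def leave_one_out_accuracy(reference):
    """Hold out each reference sample, rebuild models without it, predict its tissue."""
    correct = total = 0
    for tissue, genes in reference.items():
        n_samples = len(next(iter(genes.values())))
        for i in range(n_samples):
            held_out = {g: vals[i] for g, vals in genes.items()}
            reduced = {
                t: {g: (vals[:i] + vals[i + 1:] if t == tissue else vals) for g, vals in gs.items()}
                for t, gs in reference.items()
            }
            correct += best_match(held_out, build_models(reduced)) == tissue
            total += 1
    return correct / total


# Made-up toy values for two tissues and three hypothetical genes g1..g3.
TOY_REFERENCE = {
    "muscle": {"g1": [12.0, 12.4, 11.8, 12.1], "g2": [2.0, 2.3, 1.9, 2.1], "g3": [11.0, 11.3, 10.8, 11.1]},
    "liver": {"g1": [3.0, 3.2, 2.9, 3.1], "g2": [13.0, 13.3, 12.8, 13.1], "g3": [4.0, 4.2, 3.9, 4.1]},
}

if __name__ == "__main__":
    models = build_models(TOY_REFERENCE)
    test_sample = {"g1": 12.1, "g2": 2.0, "g3": 11.0}
    for tissue, (score, typical, atypical) in align(test_sample, models).items():
        print(tissue, round(score, 2), "typical:", typical, "atypical:", atypical)
    print("best match:", best_match(test_sample, models))
    print("leave-one-out accuracy:", leave_one_out_accuracy(TOY_REFERENCE))
```

Comparing each gene's density with a fraction of its own peak keeps the typical/atypical call independent of how wide or narrow that gene's reference distribution is; the published method may define similarity differently.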
format Text
id pubmed-3080808
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3080808; 2011-04-22; Alignment of gene expression profiles from test samples against a reference database: New method for context-specific interpretation of microarray data; Kilpinen, Sami K; Ojala, Kalle A; Kallioniemi, Olli P; BioData Min; Methodology; BioMed Central; 2011-03-31; /pmc/articles/PMC3080808/; /pubmed/21453538; http://dx.doi.org/10.1186/1756-0381-4-5; Text; en; Copyright ©2011 Kilpinen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Alignment of gene expression profiles from test samples against a reference database: New method for context-specific interpretation of microarray data
topic Methodology