Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes
BACKGROUND: Due to the high cost and low reproducibility of many microarray experiments, it is not surprising to find a limited number of patient samples in each study, and very few common identified marker genes among different studies involving patients with the same disease. Therefore, it is of g...
Main authors: | Jiang, Hongying; Deng, Youping; Chen, Huann-Sheng; Tao, Lin; Sha, Qiuying; Chen, Jun; Tsai, Chung-Jui; Zhang, Shuanglin |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2004 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC476733/ https://www.ncbi.nlm.nih.gov/pubmed/15217521 http://dx.doi.org/10.1186/1471-2105-5-81 |
_version_ | 1782121634671362048 |
---|---|
author | Jiang, Hongying Deng, Youping Chen, Huann-Sheng Tao, Lin Sha, Qiuying Chen, Jun Tsai, Chung-Jui Zhang, Shuanglin |
author_facet | Jiang, Hongying Deng, Youping Chen, Huann-Sheng Tao, Lin Sha, Qiuying Chen, Jun Tsai, Chung-Jui Zhang, Shuanglin |
author_sort | Jiang, Hongying |
collection | PubMed |
description | BACKGROUND: Due to the high cost and low reproducibility of many microarray experiments, it is not surprising to find a limited number of patient samples in each study, and very few common identified marker genes among different studies involving patients with the same disease. Therefore, it is of great interest and challenge to merge data sets from multiple studies to increase the sample size, which may in turn increase the power of statistical inferences. In this study, we combined two lung cancer studies using microarray GeneChip(®), employed two gene shaving methods and a two-step survival test to identify genes with expression patterns that can distinguish diseased from normal samples, and to indicate patient survival, respectively. RESULTS: In addition to common data transformation and normalization procedures, we applied a distribution transformation method to integrate the two data sets. Gene shaving (GS) methods based on Random Forests (RF) and Fisher's Linear Discrimination (FLD) were then applied separately to the joint data set for cancer gene selection. The two methods discovered 13 and 10 marker genes (5 in common), respectively, with expression patterns differentiating diseased from normal samples. Among these marker genes, 8 and 7 were found to be cancer-related in other published reports. Furthermore, based on these marker genes, the classifiers we built from one data set predicted the other data set with more than 98% accuracy. Using the univariate Cox proportional hazard regression model, the expression patterns of 36 genes were found to be significantly correlated with patient survival (p < 0.05). Twenty-six of these 36 genes were reported as survival-related genes from the literature, including 7 known tumor-suppressor genes and 9 oncogenes. Additional principal component regression analysis further reduced the gene list from 36 to 16.
CONCLUSION: This study provided a valuable method of integrating microarray data sets with different origins, and new methods of selecting a minimum number of marker genes to aid in cancer diagnosis. After careful data integration, the classification method developed from one data set can be applied to the other with high prediction accuracy. |
format | Text |
id | pubmed-476733 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2004 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4767332004-07-18 Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes Jiang, Hongying Deng, Youping Chen, Huann-Sheng Tao, Lin Sha, Qiuying Chen, Jun Tsai, Chung-Jui Zhang, Shuanglin BMC Bioinformatics Methodology Article BACKGROUND: Due to the high cost and low reproducibility of many microarray experiments, it is not surprising to find a limited number of patient samples in each study, and very few common identified marker genes among different studies involving patients with the same disease. Therefore, it is of great interest and challenge to merge data sets from multiple studies to increase the sample size, which may in turn increase the power of statistical inferences. In this study, we combined two lung cancer studies using microarray GeneChip(®), employed two gene shaving methods and a two-step survival test to identify genes with expression patterns that can distinguish diseased from normal samples, and to indicate patient survival, respectively. RESULTS: In addition to common data transformation and normalization procedures, we applied a distribution transformation method to integrate the two data sets. Gene shaving (GS) methods based on Random Forests (RF) and Fisher's Linear Discrimination (FLD) were then applied separately to the joint data set for cancer gene selection. The two methods discovered 13 and 10 marker genes (5 in common), respectively, with expression patterns differentiating diseased from normal samples. Among these marker genes, 8 and 7 were found to be cancer-related in other published reports. Furthermore, based on these marker genes, the classifiers we built from one data set predicted the other data set with more than 98% accuracy. Using the univariate Cox proportional hazard regression model, the expression patterns of 36 genes were found to be significantly correlated with patient survival (p < 0.05).
Twenty-six of these 36 genes were reported as survival-related genes from the literature, including 7 known tumor-suppressor genes and 9 oncogenes. Additional principal component regression analysis further reduced the gene list from 36 to 16. CONCLUSION: This study provided a valuable method of integrating microarray data sets with different origins, and new methods of selecting a minimum number of marker genes to aid in cancer diagnosis. After careful data integration, the classification method developed from one data set can be applied to the other with high prediction accuracy. BioMed Central 2004-06-24 /pmc/articles/PMC476733/ /pubmed/15217521 http://dx.doi.org/10.1186/1471-2105-5-81 Text en Copyright © 2004 Jiang et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL. |
spellingShingle | Methodology Article Jiang, Hongying Deng, Youping Chen, Huann-Sheng Tao, Lin Sha, Qiuying Chen, Jun Tsai, Chung-Jui Zhang, Shuanglin Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes |
title | Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes |
title_full | Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes |
title_fullStr | Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes |
title_full_unstemmed | Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes |
title_short | Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes |
title_sort | joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC476733/ https://www.ncbi.nlm.nih.gov/pubmed/15217521 http://dx.doi.org/10.1186/1471-2105-5-81 |
work_keys_str_mv | AT jianghongying jointanalysisoftwomicroarraygeneexpressiondatasetstoselectlungadenocarcinomamarkergenes AT dengyouping jointanalysisoftwomicroarraygeneexpressiondatasetstoselectlungadenocarcinomamarkergenes AT chenhuannsheng jointanalysisoftwomicroarraygeneexpressiondatasetstoselectlungadenocarcinomamarkergenes AT taolin jointanalysisoftwomicroarraygeneexpressiondatasetstoselectlungadenocarcinomamarkergenes AT shaqiuying jointanalysisoftwomicroarraygeneexpressiondatasetstoselectlungadenocarcinomamarkergenes AT chenjun jointanalysisoftwomicroarraygeneexpressiondatasetstoselectlungadenocarcinomamarkergenes AT tsaichungjui jointanalysisoftwomicroarraygeneexpressiondatasetstoselectlungadenocarcinomamarkergenes AT zhangshuanglin jointanalysisoftwomicroarraygeneexpressiondatasetstoselectlungadenocarcinomamarkergenes |