Statistical aspects of omics data analysis using the random compound covariate

BACKGROUND: Dealing with high-dimensional markers, such as gene expression data obtained using microarray chip technology or genomics studies, is a key challenge because the number of features greatly exceeds the number of biological samples. After selecting biologically relevant genes, how to summ...

Bibliographic Details
Main Authors:	Su, Pei-Fang, Chen, Xi, Chen, Heidi, Shyr, Yu
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3524312/ https://www.ncbi.nlm.nih.gov/pubmed/23281681 http://dx.doi.org/10.1186/1752-0509-6-S3-S11

_version_	1782253310642749440
author	Su, Pei-Fang Chen, Xi Chen, Heidi Shyr, Yu
author_facet	Su, Pei-Fang Chen, Xi Chen, Heidi Shyr, Yu
author_sort	Su, Pei-Fang
collection	PubMed
description	BACKGROUND: Dealing with high-dimensional markers, such as gene expression data obtained using microarray chip technology or genomics studies, is a key challenge because the number of features greatly exceeds the number of biological samples. After selecting biologically relevant genes, how to summarize the expression of the selected genes and then build a prediction model is an important issue in medical applications. One intuitive method of addressing this challenge assigns different weights to different features, subsequently combining this information into a single score, named the compound covariate. Investigators commonly employ this score to assess whether an association exists between the compound covariate and clinical outcomes adjusted for baseline covariates. However, we found that some clinical papers concerned with such analysis report biased p-values based on a flawed compound covariate in their training data set. RESULTS: We correct this flaw in the analysis and also propose treating the compound score as a random covariate, to achieve more appropriate results and significantly improve study power for survival outcomes. With this proposed method, we thoroughly assess the performance of two commonly used estimated gene weights through simulation studies. When the sample size is 100, and censoring rates are 50%, 30%, and 10%, power is increased by 10.6%, 3.5%, and 0.4%, respectively, by treating the compound score as a random covariate rather than a fixed covariate. Finally, we assess our proposed method using two publicly available microarray data sets. CONCLUSION: In this article, we correct this flaw in the analysis, and the proposed method, treating the compound score as a random covariate, can achieve more appropriate results and improve study power for survival outcomes.
format	Online Article Text
id	pubmed-3524312
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3524312 2012-12-21 Statistical aspects of omics data analysis using the random compound covariate Su, Pei-Fang Chen, Xi Chen, Heidi Shyr, Yu BMC Syst Biol Research
BioMed Central 2012-12-17 /pmc/articles/PMC3524312/ /pubmed/23281681 http://dx.doi.org/10.1186/1752-0509-6-S3-S11 Text en Copyright ©2012 Su et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Su, Pei-Fang Chen, Xi Chen, Heidi Shyr, Yu Statistical aspects of omics data analysis using the random compound covariate
title	Statistical aspects of omics data analysis using the random compound covariate
title_full	Statistical aspects of omics data analysis using the random compound covariate
title_fullStr	Statistical aspects of omics data analysis using the random compound covariate
title_full_unstemmed	Statistical aspects of omics data analysis using the random compound covariate
title_short	Statistical aspects of omics data analysis using the random compound covariate
title_sort	statistical aspects of omics data analysis using the random compound covariate
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3524312/ https://www.ncbi.nlm.nih.gov/pubmed/23281681 http://dx.doi.org/10.1186/1752-0509-6-S3-S11
work_keys_str_mv	AT supeifang statisticalaspectsofomicsdataanalysisusingtherandomcompoundcovariate AT chenxi statisticalaspectsofomicsdataanalysisusingtherandomcompoundcovariate AT chenheidi statisticalaspectsofomicsdataanalysisusingtherandomcompoundcovariate AT shyryu statisticalaspectsofomicsdataanalysisusingtherandomcompoundcovariate
