Comparison of Statistical Data Models for Identifying Differentially Expressed Genes Using a Generalized Likelihood Ratio Test

Bibliographic Details
Main Authors: Seng, Kok-Yong; Glenny, Robb W.; Madtes, David K.; Spilker, Mary E.; Vicini, Paolo; Gharib, Sina A.
Format: Text
Language: English
Published: Libertas Academica, 2008
Subjects: Original Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2613008/
https://www.ncbi.nlm.nih.gov/pubmed/19119428
Abstract: Currently, statistical techniques for analysis of microarray-generated data sets have deficiencies due to limited understanding of errors inherent in the data. A generalized likelihood ratio (GLR) test based on an error model has been recently proposed to identify differentially expressed genes from microarray experiments. However, the use of different error structures under the GLR test has not been evaluated, nor has this method been compared to commonly used statistical tests such as the parametric t-test. The concomitant effects of varying data signal-to-noise ratio and replication number on the performance of statistical tests also remain largely unexplored. In this study, we compared the effects of different underlying statistical error structures on the GLR test’s power in identifying differentially expressed genes in microarray data. We evaluated such variants of the GLR test as well as the one-sample t-test based on simulated data by means of receiver operating characteristic (ROC) curves. Further, we used bootstrapping of ROC curves to assess statistical significance of differences between the areas under the curves. Our results showed that i) the GLR tests outperformed the t-test for detecting differential gene expression, ii) the identity of the underlying error structure was important in determining the GLR tests’ performance, and iii) signal-to-noise ratio was a more important contributor than sample replication in identifying statistically significant differential gene expression.
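The abstract describes ranking genes by a test statistic and comparing methods through bootstrapped areas under ROC curves. As a rough illustration of that evaluation loop (not the authors' code), the Python sketch below simulates replicate log-ratios, ranks genes by the absolute one-sample t statistic, computes the ROC AUC in its Mann-Whitney form, and forms a paired bootstrap percentile interval for the AUC difference between two scores. It does not implement the GLR test or its error models, which this record does not specify; the Gaussian simulation, the absolute mean log-ratio used as a stand-in second score, and all function names are hypothetical assumptions.

```python
import math
import random
import statistics


def simulate_log_ratios(n_genes, n_de, n_reps, effect, noise_sd, rng):
    """Hypothetical simulation: the first n_de genes are differentially
    expressed (DE), with replicate log-ratios drawn from N(effect, noise_sd);
    the remaining genes are null, drawn from N(0, noise_sd)."""
    data, labels = [], []
    for g in range(n_genes):
        is_de = g < n_de
        mu = effect if is_de else 0.0
        data.append([rng.gauss(mu, noise_sd) for _ in range(n_reps)])
        labels.append(1 if is_de else 0)
    return data, labels


def one_sample_t(values):
    """One-sample t statistic for H0: mean log-ratio == 0 (needs >= 2 values)."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.fmean(values) / se


def auc(scores, labels):
    """ROC AUC in Mann-Whitney form: the fraction of (DE, null) gene pairs in
    which the DE gene scores higher, counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(scores_a, scores_b, labels, n_boot, rng):
    """Paired bootstrap over genes: resample gene indices with replacement,
    recompute AUC(a) - AUC(b) on each resample, and return an approximate
    95% percentile interval for the difference."""
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # an AUC needs both DE and null genes
            a = [scores_a[i] for i in idx]
            b = [scores_b[i] for i in idx]
            diffs.append(auc(a, ys) - auc(b, ys))
    diffs.sort()
    return diffs[int(0.025 * (n_boot - 1))], diffs[int(0.975 * (n_boot - 1))]


if __name__ == "__main__":
    rng = random.Random(0)
    data, labels = simulate_log_ratios(200, 20, 4, effect=1.0, noise_sd=1.0, rng=rng)
    # Score 1: |one-sample t|. Score 2 (hypothetical stand-in, NOT the GLR test): |mean log-ratio|.
    t_scores = [abs(one_sample_t(x)) for x in data]
    mean_scores = [abs(statistics.fmean(x)) for x in data]
    print("AUC |t|:", auc(t_scores, labels))
    print("AUC |mean|:", auc(mean_scores, labels))
    print("95% interval for AUC difference:",
          bootstrap_auc_difference(t_scores, mean_scores, labels, 200, rng))
```

Running it prints the two AUCs and an interval for their difference; an interval that excludes zero would suggest that one score ranks DE genes better on this particular simulated data set. Varying `n_reps` against `noise_sd` in the hypothetical `simulate_log_ratios` gives a crude way to probe the replication versus signal-to-noise trade-off the abstract discusses.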
Journal: Gene Regul Syst Bio (Original Research); record dated 2008-12-31
Libertas Academica, 2008-03-17. © 2008 by the authors. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).