Comparison of methods for identifying differentially expressed genes across multiple conditions from microarray data

Identification of genes differentially expressed across multiple conditions has become an important statistical problem in analyzing large-scale microarray data. Many statistical methods have been developed to address the challenging problem. Therefore, an extensive comparison among these statistica...

Bibliographic Details
Main authors: Tan, Yuande; Liu, Yin
Format: Online Article Text
Language: English
Published: Biomedical Informatics 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3280440/
https://www.ncbi.nlm.nih.gov/pubmed/22347782
author Tan, Yuande
Liu, Yin
collection PubMed
description Identification of genes differentially expressed across multiple conditions has become an important statistical problem in analyzing large-scale microarray data. Many statistical methods have been developed to address this challenging problem. An extensive comparison among these methods is therefore important for experimental scientists who need to choose a valid method for their data analysis. In this study, we conducted simulation studies to compare six statistical methods for identifying differentially expressed genes: the Bonferroni (B-) procedure, the Benjamini and Hochberg (BH-) procedure, the Local false discovery rate (Localfdr) method, the Optimal Discovery Procedure (ODP), the Ranking Analysis of F-statistics (RAF), and the Significance Analysis of Microarrays (SAM). We demonstrated that the strength of the treatment effect, the sample size, the proportion of differentially expressed genes, and the variance of gene expression significantly affect the performance of the different methods. The simulation results show that ODP exhibits extremely high power in identifying differentially expressed genes, but significantly underestimates the False Discovery Rate (FDR) in all data scenarios. SAM performs poorly when the sample size is small, but is among the best-performing methods when the sample size is large. The B-procedure is stringent and thus has low power in all data scenarios. Localfdr and RAF behave comparably to the BH-procedure, with favorable power and conservative FDR estimation. RAF performs best when the proportion of differentially expressed genes is small and the treatment effect is weak, but Localfdr is better than RAF when the proportion of differentially expressed genes is large.
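To make the contrast between the stringent B-procedure and the BH-procedure concrete, here is a minimal sketch of the two textbook procedures applied to a list of per-gene p-values. It is an illustration only, not the authors' simulation code: the function names, the `alpha` level, and the `pvals` list are hypothetical and chosen for the example.

```python
from typing import List


def bonferroni(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject gene i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Step-up BH procedure: find the largest rank k with p_(k) <= k * alpha / m,
    then reject the k smallest p-values (controls the FDR for independent tests)."""
    m = len(pvalues)
    # Gene indices ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight genes (illustrative, not from the paper).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
bonf_calls = bonferroni(pvals)
bh_calls = benjamini_hochberg(pvals)
print(sum(bonf_calls), "gene(s) called by Bonferroni")
print(sum(bh_calls), "gene(s) called by Benjamini-Hochberg")
```

With these illustrative inputs the Bonferroni cutoff is 0.05 / 8 = 0.00625 for every gene, while the BH cutoff grows with the rank of the p-value, which is why BH typically calls at least as many genes as Bonferroni at the same nominal level.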
format Online
Article
Text
id pubmed-3280440
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Biomedical Informatics
record_format MEDLINE/PubMed
spelling pubmed-3280440 2012-02-17 Bioinformation Hypothesis
Biomedical Informatics 2011-12-21 /pmc/articles/PMC3280440/ /pubmed/22347782 Text en © 2011 Biomedical Informatics This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
title Comparison of methods for identifying differentially expressed genes across multiple conditions from microarray data
topic Hypothesis