A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories
BACKGROUND: In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categ...
Main Authors: | Li, Yihan, Ghosh, Debashis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4000433/ https://www.ncbi.nlm.nih.gov/pubmed/24731138 http://dx.doi.org/10.1186/1471-2105-15-108 |
_version_ | 1782313620280967168 |
---|---|
author | Li, Yihan Ghosh, Debashis |
author_facet | Li, Yihan Ghosh, Debashis |
author_sort | Li, Yihan |
collection | PubMed |
description | BACKGROUND: In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypotheses, which can be applied to a wide range of problems. RESULTS: We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study, to identify 17 β-estradiol sensitive genes in breast cancer cells that are induced at low concentrations. CONCLUSIONS: The framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity. The framework is easy to use and adaptable to various practical settings that frequently occur in large-scale experiments. Procedures generated from the framework are shown to maintain control of the OFDR and mdFDR, quantities that are especially relevant in the case of multiple hypothesis set testing. The procedures work well in both simulations and real datasets, and are shown to have better power than existing methods. |
format | Online Article Text |
id | pubmed-4000433 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4000433 2014-05-08 A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories Li, Yihan Ghosh, Debashis BMC Bioinformatics Methodology Article BACKGROUND: In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypotheses, which can be applied to a wide range of problems. RESULTS: We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study, to identify 17 β-estradiol sensitive genes in breast cancer cells that are induced at low concentrations. CONCLUSIONS: The framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity. The framework is easy to use and adaptable to various practical settings that frequently occur in large-scale experiments. Procedures generated from the framework are shown to maintain control of the OFDR and mdFDR, quantities that are especially relevant in the case of multiple hypothesis set testing. The procedures work well in both simulations and real datasets, and are shown to have better power than existing methods. BioMed Central 2014-04-14 /pmc/articles/PMC4000433/ /pubmed/24731138 http://dx.doi.org/10.1186/1471-2105-15-108 Text en Copyright © 2014 Li and Ghosh; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. |
spellingShingle | Methodology Article Li, Yihan Ghosh, Debashis A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories |
title | A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories |
title_sort | two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4000433/ https://www.ncbi.nlm.nih.gov/pubmed/24731138 http://dx.doi.org/10.1186/1471-2105-15-108 |
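The description in this record outlines a two-step hierarchical procedure: hypothesis sets (for example, genes) are screened first, and individual hypotheses are then tested only within the selected sets. The record does not give the procedure's details, so the following is only a minimal sketch under stated assumptions: a Bonferroni screening p-value per set, Benjamini-Hochberg selection across sets, and a Bonferroni within-set test at the reduced level R*alpha/m, where R is the number of selected sets and m the total number of sets. The function names (`screening_pvalue`, `bh_select`, `two_step_test`) are hypothetical and are not taken from the article.

```python
from typing import Dict, List, Sequence


def screening_pvalue(pvals: Sequence[float]) -> float:
    """Bonferroni screening p-value for one hypothesis set (assumed choice)."""
    return min(1.0, len(pvals) * min(pvals))


def bh_select(pvals: Sequence[float], alpha: float) -> List[int]:
    """Benjamini-Hochberg step-up: indices of p-values rejected at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its BH threshold.
        if pvals[idx] <= rank * alpha / m:
            cutoff = rank
    return sorted(order[:cutoff])


def two_step_test(
    pvalue_sets: Sequence[Sequence[float]], alpha: float = 0.05
) -> Dict[int, List[int]]:
    """Map each selected set index to the hypotheses rejected inside it."""
    m = len(pvalue_sets)
    # Step 1: screen the sets with BH applied to the screening p-values.
    selected = bh_select([screening_pvalue(p) for p in pvalue_sets], alpha)
    # Step 2: test within selected sets at the reduced level R * alpha / m.
    level = len(selected) * alpha / m
    rejections: Dict[int, List[int]] = {}
    for g in selected:
        pvals = pvalue_sets[g]
        threshold = level / len(pvals)  # Bonferroni split within the set
        rejections[g] = [j for j, p in enumerate(pvals) if p <= threshold]
    return rejections


if __name__ == "__main__":
    # Hypothetical p-values: 4 genes, 3 category comparisons each.
    example = [
        [0.001, 0.2, 0.8],
        [0.3, 0.6, 0.9],
        [0.004, 0.003, 0.5],
        [0.04, 0.7, 0.9],
    ]
    print(two_step_test(example, alpha=0.05))
```

Pairing Bonferroni screening with a Bonferroni within-set test means that every selected set has at least one rejected hypothesis in this sketch. Directional decisions for two-sided alternatives, which the description associates with mdFDR control, are not modeled here.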