A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories

BACKGROUND: In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categ...

Bibliographic Details
Main authors: Li, Yihan; Ghosh, Debashis
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4000433/
https://www.ncbi.nlm.nih.gov/pubmed/24731138
http://dx.doi.org/10.1186/1471-2105-15-108
author Li, Yihan
Ghosh, Debashis
collection PubMed
description BACKGROUND: In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypotheses, which can be applied to a wide range of problems.
RESULTS: We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study, to identify 17β-estradiol-sensitive genes in breast cancer cells that are induced at low concentrations.
CONCLUSIONS: The framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity. The framework is easy to use and adaptable to various practical settings that frequently occur in large-scale experiments. Procedures generated from the framework are shown to maintain control of the OFDR and mdFDR, quantities that are especially relevant in the case of multiple hypothesis set testing. The procedures work well in both simulations and real datasets, and are shown to have better power than existing methods.
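The abstract names a two-step hierarchical procedure but does not spell out its details, so the following is only a minimal sketch of how a screen-then-test scheme of this general kind might look. It is not taken from the article. The function names (`benjamini_hochberg`, `two_step_test`), the Bonferroni-style screening p-value, and the Bonferroni correction within each selected set are all assumptions made for illustration; the article's own procedures may use different screening statistics and within-set tests.

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: List[float], alpha: float) -> List[bool]:
    """Return one rejection flag per p-value using the BH step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def two_step_test(sets: Dict[str, List[float]],
                  alpha: float = 0.05) -> Dict[str, List[bool]]:
    """Hypothetical two-step scheme over non-empty sets of p-values.

    Step 1 screens whole sets; step 2 tests the individual hypotheses
    inside each selected set at a level scaled by the number of sets
    that passed the screen.
    """
    names = list(sets)
    m = len(names)
    # Step 1 (assumed): Bonferroni-combined screening p-value per set,
    # followed by BH across the m sets.
    screen = [min(1.0, len(sets[n]) * min(sets[n])) for n in names]
    selected = benjamini_hochberg(screen, alpha)
    r = sum(selected)
    # Step 2 (assumed): Bonferroni within each selected set at level r*alpha/m.
    results: Dict[str, List[bool]] = {}
    for name, keep in zip(names, selected):
        if not keep:
            continue
        level = r * alpha / m
        results[name] = [p <= level / len(sets[name]) for p in sets[name]]
    return results


if __name__ == "__main__":
    # Made-up p-values: one list per gene, one entry per category comparison.
    example = {
        "geneA": [0.0001, 0.2, 0.5],
        "geneB": [0.3, 0.6, 0.9],
        "geneC": [0.004, 0.001, 0.7],
    }
    print(two_step_test(example))
```

In this sketch each key stands for one hypothesis set (for example, one gene) and each list holds the p-values of the hypotheses in that set (for example, comparisons between adjacent dose levels). Sets that fail the screen are omitted from the result, which mirrors the idea of controlling false discoveries at the set level first.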
format Online
Article
Text
id pubmed-4000433
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4000433 2014-05-08 A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories Li, Yihan Ghosh, Debashis BMC Bioinformatics Methodology Article BioMed Central 2014-04-14 /pmc/articles/PMC4000433/ /pubmed/24731138 http://dx.doi.org/10.1186/1471-2105-15-108 Text en Copyright © 2014 Li and Ghosh; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories
topic Methodology Article