Gene-set distance analysis (GSDA): a powerful tool for gene-set association analysis
BACKGROUND: Identifying sets of related genes (gene sets) that are empirically associated with a treatment or phenotype often yields valuable biological insights. Several methods effectively identify gene sets in which individual genes have simple monotonic relationships with categorical, quantitati...
Main Authors: | Cao, Xueyuan; Pounds, Stan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8059024/ https://www.ncbi.nlm.nih.gov/pubmed/33882829 http://dx.doi.org/10.1186/s12859-021-04110-x |
_version_ | 1783681127591968768 |
---|---|
author | Cao, Xueyuan Pounds, Stan |
author_facet | Cao, Xueyuan Pounds, Stan |
author_sort | Cao, Xueyuan |
collection | PubMed |
description | BACKGROUND: Identifying sets of related genes (gene sets) that are empirically associated with a treatment or phenotype often yields valuable biological insights. Several methods effectively identify gene sets in which individual genes have simple monotonic relationships with categorical, quantitative, or censored event-time variables. Some distance-based methods, such as distance correlations, may detect complex non-monotone associations of a gene-set with a quantitative variable that elude other methods. However, the distance correlations have yet to be generalized to associate gene-sets with categorical and censored event-time endpoints. Also, there is a need to determine which genes empirically drive the significance of an association of a gene set with an endpoint. RESULTS: We develop gene-set distance analysis (GSDA) by generalizing distance correlations to evaluate the association of a gene set with categorical and censored event-time variables. We also develop a backward elimination procedure to identify a subset of genes that empirically drive significant associations. In simulation studies, GSDA more effectively identified complex non-monotone gene-set associations than did six other published methods. In the analysis of a pediatric acute myeloid leukemia (AML) data set, GSDA was the only method to discover that event-free survival (EFS) was associated with the 56-gene AML pathway gene-set, narrow that result down to 5 genes, and confirm the association of those 5 genes with EFS in a separate validation cohort. These results indicate that GSDA effectively identifies and characterizes complex non-monotonic gene-set associations that are missed by other methods. CONCLUSION: GSDA is a powerful and flexible method to detect gene-set association with categorical, quantitative, or censored event-time variables, especially to detect complex non-monotonic gene-set associations. Available at https://CRAN.R-project.org/package=GSDA. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04110-x. |
format | Online Article Text |
id | pubmed-8059024 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8059024 2021-04-21 Gene-set distance analysis (GSDA): a powerful tool for gene-set association analysis Cao, Xueyuan Pounds, Stan BMC Bioinformatics Methodology Article BioMed Central 2021-04-21 /pmc/articles/PMC8059024/ /pubmed/33882829 http://dx.doi.org/10.1186/s12859-021-04110-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Gene-set distance analysis (GSDA): a powerful tool for gene-set association analysis |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8059024/ https://www.ncbi.nlm.nih.gov/pubmed/33882829 http://dx.doi.org/10.1186/s12859-021-04110-x |
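
The abstract notes that distance correlations can detect non-monotone associations that other measures miss. The sketch below illustrates that idea only. It is a plain NumPy implementation of the standard sample distance correlation, not the GSDA package's code, and it does not cover the categorical or censored event-time generalizations described in the article. The function names `_centered_distances` and `distance_correlation` and the toy data are assumptions made for this illustration.

```python
import numpy as np


def _centered_distances(a):
    """Double-centered Euclidean distance matrix of the rows of `a`.

    `a` is either a 1-D vector (one value per sample) or a 2-D array of
    shape (n_samples, n_features), e.g. samples x genes in a gene set.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between x and y (values in [0, 1])."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()      # squared sample distance covariance
    dvar2_x = (A * A).mean()    # squared sample distance variance of x
    dvar2_y = (B * B).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Toy data (hypothetical): y depends on x through a symmetric,
# non-monotone relationship, so the Pearson correlation is about zero.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

pearson = float(np.corrcoef(x, y)[0, 1])
dcor = distance_correlation(x, y)
print(f"Pearson correlation:  {pearson:.3f}")
print(f"Distance correlation: {dcor:.3f}")
```

In this toy case the Pearson correlation is essentially zero while the distance correlation is clearly positive. Because the helper accepts a samples-by-features array, the same call can take a multi-gene expression matrix as `x`. Assessing significance would still require something like a permutation test, which is not shown here.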