Gene-set distance analysis (GSDA): a powerful tool for gene-set association analysis

BACKGROUND: Identifying sets of related genes (gene sets) that are empirically associated with a treatment or phenotype often yields valuable biological insights. Several methods effectively identify gene sets in which individual genes have simple monotonic relationships with categorical, quantitati...

Bibliographic Details
Main authors: Cao, Xueyuan; Pounds, Stan
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8059024/
https://www.ncbi.nlm.nih.gov/pubmed/33882829
http://dx.doi.org/10.1186/s12859-021-04110-x
author Cao, Xueyuan
Pounds, Stan
collection PubMed
description BACKGROUND: Identifying sets of related genes (gene sets) that are empirically associated with a treatment or phenotype often yields valuable biological insights. Several methods effectively identify gene sets in which individual genes have simple monotonic relationships with categorical, quantitative, or censored event-time variables. Some distance-based methods, such as distance correlations, may detect complex non-monotone associations of a gene-set with a quantitative variable that elude other methods. However, the distance correlations have yet to be generalized to associate gene-sets with categorical and censored event-time endpoints. Also, there is a need to determine which genes empirically drive the significance of an association of a gene set with an endpoint. RESULTS: We develop gene-set distance analysis (GSDA) by generalizing distance correlations to evaluate the association of a gene set with categorical and censored event-time variables. We also develop a backward elimination procedure to identify a subset of genes that empirically drive significant associations. In simulation studies, GSDA more effectively identified complex non-monotone gene-set associations than did six other published methods. In the analysis of a pediatric acute myeloid leukemia (AML) data set, GSDA was the only method to discover that event-free survival (EFS) was associated with the 56-gene AML pathway gene-set, narrow that result down to 5 genes, and confirm the association of those 5 genes with EFS in a separate validation cohort. These results indicate that GSDA effectively identifies and characterizes complex non-monotonic gene-set associations that are missed by other methods. CONCLUSION: GSDA is a powerful and flexible method to detect gene-set association with categorical, quantitative, or censored event-time variables, especially to detect complex non-monotonic gene-set associations. Available at https://CRAN.R-project.org/package=GSDA. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04110-x.
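The description above says that distance correlations can detect non-monotone associations between a gene set and a quantitative variable. As an illustration only, here is a minimal Python sketch of the standard sample distance correlation with a permutation p-value. It is not the code of the GSDA R package, and it does not cover the categorical or censored event-time generalizations the authors describe. The function names (`distance_correlation`, `permutation_pvalue`) and the samples-by-genes data layout are assumptions made for this example.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distances between samples, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a vector as n samples of one feature
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x (n x p) and y (n or n x q)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()       # squared sample distance covariance
    dvar_x = (a * a).mean()      # squared sample distance variance of x
    dvar_y = (b * b).mean()      # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0               # a constant input carries no association
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation test: shuffle y across samples to build a null distribution."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = distance_correlation(x, y)
    n = y.shape[0]
    exceed = sum(
        distance_correlation(x, y[rng.permutation(n)]) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: the endpoint depends on the first "gene" only
    # through its square, so the Pearson correlation is close to zero.
    x = np.linspace(-1.0, 1.0, 101)
    genes = np.column_stack([x, np.sin(3 * x)])  # 101 samples x 2 genes
    endpoint = x ** 2
    print("Pearson r:", np.corrcoef(x, endpoint)[0, 1])
    print("dCor, p:", permutation_pvalue(genes, endpoint, n_perm=199))
```

In the toy data the endpoint is a symmetric, non-monotone function of the first gene. This is the kind of relationship that a correlation coefficient built for monotone trends can miss, while a distance-based statistic remains sensitive to it.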
format Online
Article
Text
id pubmed-8059024
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8059024 2021-04-21 Gene-set distance analysis (GSDA): a powerful tool for gene-set association analysis Cao, Xueyuan Pounds, Stan BMC Bioinformatics Methodology Article BioMed Central 2021-04-21 /pmc/articles/PMC8059024/ /pubmed/33882829 http://dx.doi.org/10.1186/s12859-021-04110-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Gene-set distance analysis (GSDA): a powerful tool for gene-set association analysis
topic Methodology Article