A comparison of curated gene sets versus transcriptomics-derived gene signatures for detecting pathway activation in immune cells


Bibliographic Details
Main authors: Liu, Bin; Lindner, Patrick; Jirmo, Adan Chari; Maus, Ulrich; Illig, Thomas; DeLuca, David S.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6986093/
https://www.ncbi.nlm.nih.gov/pubmed/31992182
http://dx.doi.org/10.1186/s12859-020-3366-4
collection PubMed
description BACKGROUND: Despite the significant contribution of transcriptomics to the fields of biological and biomedical research, interpreting long lists of significantly differentially expressed genes remains a challenging step in the analysis process. Gene set enrichment analysis is a standard approach for summarizing differentially expressed genes into pathways or other gene groupings. Here, we explore an alternative to using gene sets from curated databases. We examine deriving custom gene sets, which may be relevant to a given experiment, from reference data sets of previous transcriptomics studies. We call these data-derived gene sets “gene signatures” for the biological process tested in the previous study. We focus on the feasibility of this approach in analyzing immune-related processes, which are complex in nature but play an important role in medical research.
RESULTS: We evaluate several statistical approaches to detecting the activity of a gene signature in a target data set. We compare the performance of the data-derived gene signature approach with comparable GO term gene sets across all of the statistical tests. A total of 61 differential expression comparisons generated from 26 transcriptome experiments were included in the analysis. These experiments covered eight immunological processes in eight types of leukocytes. The data-derived signatures detected the presence of immunological processes in the test data with modest accuracy (AUC = 0.67). The performance for GO and literature-based gene sets was worse (AUC = 0.59). Both approaches were plagued by poor specificity.
CONCLUSIONS: When investigators seek to test specific hypotheses, the data-derived signature approach can perform as well as, if not better than, standard gene-set-based approaches for immunological signatures. Furthermore, data-derived signatures can be generated in cases where well-defined gene sets are lacking from pathway databases, and they offer the opportunity to define signatures in a cell-type-specific manner. However, neither the data-derived signatures nor standard gene sets can be demonstrated to reliably provide negative predictions for negative cases. We conclude that the data-derived signature approach is a useful and sometimes necessary tool, but analysts should be wary of false positives.
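The abstract does not say which statistical tests were evaluated, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypergeometric over-representation test (one common way to score a gene set against a list of differentially expressed genes) and a rank-based AUC for judging how well such scores separate positive from negative cases. The function names and all numbers are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_pvalue(universe_size, signature_size, de_size, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    X is the number of signature genes found among de_size differentially
    expressed genes drawn from a universe of universe_size genes, of which
    signature_size belong to the signature.
    """
    max_overlap = min(signature_size, de_size)
    total = comb(universe_size, de_size)
    tail = sum(
        comb(signature_size, k) * comb(universe_size - signature_size, de_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy inputs: a 50-gene signature, 200 DE genes out of
# 10,000 measured genes, 8 of which fall in the signature.
p = hypergeom_pvalue(10_000, 50, 200, 8)

# Hypothetical activity scores for four comparisons, two of which truly
# involve the process (label 1) and two of which do not (label 0).
toy_auc = auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

In this framing, an AUC near 0.5 means the scores barely distinguish comparisons in which the process is active from those in which it is not, which is the scale on which the reported values of 0.67 and 0.59 should be read.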
id pubmed-6986093
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6986093 2020-01-30. BMC Bioinformatics. Methodology Article. BioMed Central, 2020-01-28. /pmc/articles/PMC6986093/ /pubmed/31992182 http://dx.doi.org/10.1186/s12859-020-3366-4 Text, en. © The Author(s) 2020. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article