Comparison of gene set scoring methods for reproducible evaluation of multiple tuberculosis gene signatures

RATIONALE: Many blood-based transcriptional gene signatures for tuberculosis (TB) have been developed with potential use to diagnose disease, predict risk of progression from infection to disease, and monitor TB treatment outcomes. However, an unresolved issue is whether gene set enrichment analysis...

Bibliographic Details
Main Authors:	Wang, Xutao, VanValkenberg, Arthur, Odom-Mabey, Aubrey R., Ellner, Jerrold J., Hochberg, Natasha S., Salgame, Padmini, Patil, Prasad, Johnson, W. Evan
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882404/ https://www.ncbi.nlm.nih.gov/pubmed/36711818 http://dx.doi.org/10.1101/2023.01.19.520627

author	Wang, Xutao VanValkenberg, Arthur Odom-Mabey, Aubrey R. Ellner, Jerrold J. Hochberg, Natasha S. Salgame, Padmini Patil, Prasad Johnson, W. Evan
author_sort	Wang, Xutao
collection	PubMed
description	RATIONALE: Many blood-based transcriptional gene signatures for tuberculosis (TB) have been developed with potential use to diagnose disease, predict risk of progression from infection to disease, and monitor TB treatment outcomes. However, an unresolved issue is whether gene set enrichment analysis (GSEA) of the signature transcripts alone is sufficient for prediction and differentiation, or whether it is necessary to use the original statistical model created when the signature was derived. Intra-method comparison is complicated by the unavailability of original training data, missing details about the original trained model, and inadequate publicly available software tools or source code implementing models. To facilitate these signatures’ replicability and appropriate utilization in TB research, comprehensive comparisons between gene set scoring methods with cross-data validation of original model implementations are needed. OBJECTIVES: We compared the performance of 19 TB gene signatures across 24 transcriptomic datasets using both rebuilt original models and gene set scoring methods to evaluate whether gene set scoring is a reasonable proxy for the performance of the original trained model. We have provided an open-access software implementation of the original models for all 19 signatures for future use. METHODS: We considered existing gene set scoring and machine learning methods, including ssGSEA, GSVA, PLAGE, Singscore, and Zscore, as alternative approaches to profile gene signature performance. The sample-size-weighted mean area under the curve (AUC) value was computed to measure each signature’s performance across datasets. Correlation analysis and Wilcoxon paired tests were used to compare the performance of enrichment methods with that of the original models.
MEASUREMENT AND MAIN RESULTS: For many signatures, the predictions from gene set scoring methods were highly correlated and statistically equivalent to the results given by the original diagnostic models. PLAGE outperformed all other gene scoring methods. In some cases, PLAGE outperformed the original models when considering signatures’ weighted mean AUC values and the AUC results within individual studies. CONCLUSION: Gene set enrichment scoring of existing blood-based biomarker gene sets can distinguish patients with active TB disease from latent TB infection and other clinical conditions with equivalent or improved accuracy compared to the original methods and models. These data justify using gene set scoring methods of published TB gene signatures for predicting TB risk and treatment outcomes, especially when original models are difficult to apply or implement.
format	Online Article Text
id	pubmed-9882404
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Cold Spring Harbor Laboratory
record_format	MEDLINE/PubMed
spelling	pubmed-9882404 2023-01-28 bioRxiv Article Cold Spring Harbor Laboratory 2023-01-30 /pmc/articles/PMC9882404/ /pubmed/36711818 http://dx.doi.org/10.1101/2023.01.19.520627 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title	Comparison of gene set scoring methods for reproducible evaluation of multiple tuberculosis gene signatures
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882404/ https://www.ncbi.nlm.nih.gov/pubmed/36711818 http://dx.doi.org/10.1101/2023.01.19.520627
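
The METHODS summary in the description mentions Zscore gene set scoring and a sample-size-weighted mean AUC across datasets. The following is a minimal Python sketch of those two ideas, not the authors' implementation. It assumes the common Zscore definition (per-gene z-scores summed over the signature and divided by the square root of the number of genes) and an AUC computed as the fraction of correctly ordered case/control pairs. All function names, gene names, and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_gene_set_scores(expr, signature):
    """Score each sample for one gene signature.

    expr: dict mapping gene name -> list of expression values (one per sample).
    signature: list of gene names; genes absent from expr are ignored.
    Assumed definition: sum of per-gene z-scores / sqrt(number of genes used).
    """
    genes = [g for g in signature if g in expr]
    if not genes:
        raise ValueError("no signature genes found in expression data")
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    used = 0
    for g in genes:
        values = expr[g]
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # a constant gene carries no information; skip it
        used += 1
        for i, v in enumerate(values):
            totals[i] += (v - mu) / sd
    if used == 0:
        raise ValueError("all signature genes are constant")
    return [t / sqrt(used) for t in totals]


def auc(scores, labels):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_mean_auc(aucs, sample_sizes):
    """Mean AUC across datasets, each weighted by its number of samples."""
    return sum(a * n for a, n in zip(aucs, sample_sizes)) / sum(sample_sizes)


if __name__ == "__main__":
    # Hypothetical toy dataset: 2 genes, 4 samples (1 = active TB, 0 = control).
    toy_expr = {"GENE_A": [5.0, 6.0, 8.0, 9.0], "GENE_B": [2.0, 2.5, 4.0, 4.5]}
    toy_labels = [0, 0, 1, 1]
    toy_scores = zscore_gene_set_scores(toy_expr, ["GENE_A", "GENE_B"])
    print("toy AUC:", auc(toy_scores, toy_labels))
    # Hypothetical per-dataset AUCs and sample sizes.
    print("weighted mean AUC:", weighted_mean_auc([0.9, 0.7], [30, 10]))
```

Weighting by sample size keeps small datasets from dominating the summary, which is presumably why the abstract reports a weighted rather than a plain mean.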
