StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens

BACKGROUND: The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and utilizing their potential in anticancer vaccine development. In this context, TTCAs are highly promising. Meanwhile, experimental technologies for discovering and c...

Bibliographic Details
Main Authors:	Charoenkwan, Phasit; Schaduangrat, Nalini; Shoombuatong, Watshara
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386778/ https://www.ncbi.nlm.nih.gov/pubmed/37507654 http://dx.doi.org/10.1186/s12859-023-05421-x

author	Charoenkwan, Phasit; Schaduangrat, Nalini; Shoombuatong, Watshara
author_facet	Charoenkwan, Phasit; Schaduangrat, Nalini; Shoombuatong, Watshara
author_sort	Charoenkwan, Phasit
collection	PubMed
description	BACKGROUND: The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and utilizing their potential in anticancer vaccine development. In this context, TTCAs are highly promising. Meanwhile, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need to develop a robust model that can achieve higher rates of accuracy and precision. RESULTS: In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. Firstly, we constructed 156 different baseline models by using 12 different feature encoding schemes and 13 popular ML algorithms. Secondly, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on the feature selection strategy and then used for the construction of our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods in terms of the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient of 0.866. CONCLUSIONS: In summary, the proposed stacking ensemble learning-based framework of StackTTCA could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server (http://2pmlab.camt.cmu.ac.th/StackTTCA) to maximize user convenience for high-throughput screening of novel TTCAs. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05421-x.
format	Online Article Text
id	pubmed-10386778
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10386778 2023-07-30 StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens Charoenkwan, Phasit; Schaduangrat, Nalini; Shoombuatong, Watshara BMC Bioinformatics Research BACKGROUND: The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and utilizing their potential in anticancer vaccine development. In this context, TTCAs are highly promising. Meanwhile, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need to develop a robust model that can achieve higher rates of accuracy and precision. RESULTS: In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. Firstly, we constructed 156 different baseline models by using 12 different feature encoding schemes and 13 popular ML algorithms. Secondly, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on the feature selection strategy and then used for the construction of our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods in terms of the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient of 0.866. CONCLUSIONS: In summary, the proposed stacking ensemble learning-based framework of StackTTCA could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server (http://2pmlab.camt.cmu.ac.th/StackTTCA) to maximize user convenience for high-throughput screening of novel TTCAs.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05421-x. BioMed Central 2023-07-28 /pmc/articles/PMC10386778/ /pubmed/37507654 http://dx.doi.org/10.1186/s12859-023-05421-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Charoenkwan, Phasit Schaduangrat, Nalini Shoombuatong, Watshara StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens
title	StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386778/ https://www.ncbi.nlm.nih.gov/pubmed/37507654 http://dx.doi.org/10.1186/s12859-023-05421-x
work_keys_str_mv	AT charoenkwanphasit stackttcaastackingensemblelearningbasedframeworkforaccurateandhighthroughputidentificationoftumortcellantigens AT schaduangratnalini stackttcaastackingensemblelearningbasedframeworkforaccurateandhighthroughputidentificationoftumortcellantigens AT shoombuatongwatshara stackttcaastackingensemblelearningbasedframeworkforaccurateandhighthroughputidentificationoftumortcellantigens
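
The abstract in the description field reports 156 baseline models (12 feature encoding schemes combined with 13 ML algorithms) and evaluates the final model by accuracy and the Matthews correlation coefficient (MCC). The following is a minimal sketch of that arithmetic only, not the authors' implementation: the encoding and algorithm names are hypothetical placeholders, since the record gives only the counts, and the confusion-matrix counts in the usage note are illustrative rather than taken from the paper.

```python
from itertools import product
from math import sqrt

# Hypothetical placeholder names: the record states only the counts (12 and 13).
encodings = [f"encoding_{i}" for i in range(1, 13)]
algorithms = [f"algorithm_{j}" for j in range(1, 14)]

# One baseline model per (encoding, algorithm) pair.
baseline_models = list(product(encodings, algorithms))


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Here `len(baseline_models)` gives the 12 × 13 pairings, and `accuracy(45, 45, 5, 5)` and `mcc(45, 45, 5, 5)` evaluate both metrics on an illustrative balanced test set of 100 peptides.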
