Novel Ensemble Feature Selection Approach and Application in Repertoire Sequencing Data

Bibliographic Details
Main authors: He, Tao; Baik, Jason Min; Kato, Chiemi; Yang, Hai; Fan, Zenghua; Cham, Jason; Zhang, Li
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9086194/
https://www.ncbi.nlm.nih.gov/pubmed/35559031
http://dx.doi.org/10.3389/fgene.2022.821832
author He, Tao
Baik, Jason Min
Kato, Chiemi
Yang, Hai
Fan, Zenghua
Cham, Jason
Zhang, Li
collection PubMed
description The T and B cell repertoires make up the adaptive immune system and are mainly generated through somatic V(D)J gene recombination. Thus, VJ gene usage may be a potential prognostic or predictive biomarker. However, analysis of the adaptive immune system is challenging due to the heterogeneity of the clonotypes that make up the repertoire. To address the heterogeneity of the T and B cell repertoire, we proposed a novel ensemble feature selection approach and a customized statistical learning algorithm focusing on VJ gene usage. We applied the proposed approach to T cell receptor sequences from recovered COVID-19 patients and healthy donors, as well as from a group of lung cancer patients who received immunotherapy. Our approach identified distinct VJ genes used in the recovered COVID-19 patients compared with the healthy donors, and VJ genes associated with clinical response in the lung cancer patients. Simulation studies show that the ensemble feature selection approach outperformed other state-of-the-art feature selection methods in both efficiency and accuracy. It consistently yielded higher stability and sensitivity with lower false discovery rates. When integrated with different classification methods, the ensemble feature selection approach had the best prediction accuracy. In conclusion, the proposed approach and integration procedure are an effective feature selection technique that aids in correctly classifying different subtypes, helping to characterize the signatures in the adaptive immune response associated with disease or treatment and thereby improve treatment strategies.
format Online
Article
Text
id pubmed-9086194
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9086194 2022-05-11 Novel Ensemble Feature Selection Approach and Application in Repertoire Sequencing Data. He, Tao; Baik, Jason Min; Kato, Chiemi; Yang, Hai; Fan, Zenghua; Cham, Jason; Zhang, Li. Front Genet. Genetics. Frontiers Media S.A. 2022-04-26. /pmc/articles/PMC9086194/ /pubmed/35559031 http://dx.doi.org/10.3389/fgene.2022.821832 Text en. Copyright © 2022 He, Baik, Kato, Yang, Fan, Cham and Zhang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Novel Ensemble Feature Selection Approach and Application in Repertoire Sequencing Data
topic Genetics