A comparative study of multiple instance learning methods for cancer detection using T-cell receptor sequences

As a branch of machine learning, multiple instance learning (MIL) learns from a collection of labeled bags, each containing a set of instances. The learning process is weakly supervised due to ambiguous instance labels. Since its emergence, MIL has been applied to solve various problems including co...

Bibliographic Details
Main authors:	Xiong, Danyi; Zhang, Ze; Wang, Tao; Wang, Xinlei
Format:	Online Article Text
Language:	English
Published:	Research Network of Computational and Structural Biotechnology 2021
Subjects:	Review
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8192570/ https://www.ncbi.nlm.nih.gov/pubmed/34141144 http://dx.doi.org/10.1016/j.csbj.2021.05.038

_version_	1783706075769339904
author	Xiong, Danyi; Zhang, Ze; Wang, Tao; Wang, Xinlei
author_sort	Xiong, Danyi
collection	PubMed
description	As a branch of machine learning, multiple instance learning (MIL) learns from a collection of labeled bags, each containing a set of instances. The learning process is weakly supervised due to ambiguous instance labels. Since its emergence, MIL has been applied to solve various problems including content-based image retrieval, object tracking/detection, and computer-aided diagnosis. In biomedical research, the use of MIL has focused on medical image analysis and molecule activity prediction. We review and apply 16 methods to investigate the applicability of MIL to a novel biomedical application, cancer detection using T-cell receptor (TCR) sequences. This important application can be a viable approach for large-scale cancer screening, as TCRs can be easily profiled from a subject’s peripheral blood. We consider two feasible data-generating mechanisms, and for the purpose of performance evaluation, we simulate data under each mechanism, varying potentially important factors to mimic realistic situations. We also apply the methods to sequencing data of ten cancer types from The Cancer Genome Atlas, as an early proof of concept for distinguishing tumor patients from healthy individuals via TCR sequencing of peripheral blood. We find that, provided an appropriate MIL method is used, satisfactory performance, with an area under the receiver operating characteristic curve above 80%, can be achieved for five of the ten cancers. Based on our numerical results, we make suggestions about selecting a proper method and avoiding any method with poor performance. We further point out directions for future research and identify a pressing need for new MIL methodologies for improved performance (for some cancer types) and more explainable outcomes.
format	Online Article Text
id	pubmed-8192570
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Research Network of Computational and Structural Biotechnology
record_format	MEDLINE/PubMed
spelling	pubmed-8192570 2021-06-16 Comput Struct Biotechnol J Review Research Network of Computational and Structural Biotechnology 2021-05-24 /pmc/articles/PMC8192570/ /pubmed/34141144 http://dx.doi.org/10.1016/j.csbj.2021.05.038 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title	A comparative study of multiple instance learning methods for cancer detection using T-cell receptor sequences
topic	Review
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8192570/ https://www.ncbi.nlm.nih.gov/pubmed/34141144 http://dx.doi.org/10.1016/j.csbj.2021.05.038
