
CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation

Recent research has shown that visual–text pretrained models perform well in traditional vision tasks. CLIP, as the most influential work, has garnered significant attention from researchers. Thanks to its excellent visual representation capabilities, many recent studies have used CLIP for pixel-lev...


Bibliographic Details
Main authors:	Guo, Shi-Cheng; Liu, Shang-Kun; Wang, Jing-Yu; Zheng, Wei-Min; Jiang, Cheng-Yu
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10529322/ https://www.ncbi.nlm.nih.gov/pubmed/37761652 http://dx.doi.org/10.3390/e25091353

author	Guo, Shi-Cheng Liu, Shang-Kun Wang, Jing-Yu Zheng, Wei-Min Jiang, Cheng-Yu
author_sort	Guo, Shi-Cheng
collection	PubMed
description	Recent research has shown that visual–text pretrained models perform well in traditional vision tasks. CLIP, as the most influential work, has garnered significant attention from researchers. Thanks to its excellent visual representation capabilities, many recent studies have used CLIP for pixel-level tasks. We explore the potential of CLIP for few-shot segmentation. The current mainstream approach is to use support and query features to generate class prototypes and then match the prototype features against image features. We propose a new method that uses CLIP to extract text features for a specific class. These text features are then used as training samples in the model’s training process. The addition of text features enables the model to extract features that contain richer semantic information, making it easier to capture latent class information. To better match the query image features, we also propose a new prototype generation method that incorporates multi-modal fusion features of text and images. Adaptive query prototypes are generated by combining foreground and background information from the images with the multi-modal support prototype, which allows better matching of image features and improves segmentation accuracy. We provide a new perspective on the task of few-shot segmentation in multi-modal scenarios. Experiments demonstrate that our proposed method achieves excellent results on two common datasets, PASCAL-$5^i$ and COCO-$20^i$.
format	Online Article Text
id	pubmed-10529322
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10529322 2023-09-28 CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation Guo, Shi-Cheng Liu, Shang-Kun Wang, Jing-Yu Zheng, Wei-Min Jiang, Cheng-Yu Entropy (Basel) Article MDPI 2023-09-18 /pmc/articles/PMC10529322/ /pubmed/37761652 http://dx.doi.org/10.3390/e25091353 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10529322/ https://www.ncbi.nlm.nih.gov/pubmed/37761652 http://dx.doi.org/10.3390/e25091353
work_keys_str_mv	AT guoshicheng clipdrivenprototypenetworkforfewshotsemanticsegmentation AT liushangkun clipdrivenprototypenetworkforfewshotsemanticsegmentation AT wangjingyu clipdrivenprototypenetworkforfewshotsemanticsegmentation AT zhengweimin clipdrivenprototypenetworkforfewshotsemanticsegmentation AT jiangchengyu clipdrivenprototypenetworkforfewshotsemanticsegmentation
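
The abstract describes the mainstream prototype approach only in words: build class prototypes from support features, then match them against query image features. As a rough illustration of that generic idea, the sketch below uses masked average pooling and cosine similarity on a toy feature map. It is not the authors' network: the function names (`masked_average_pooling`, `cosine`, `segment`) and the toy data are assumptions made for this example, and the CLIP text features, multi-modal fusion, and adaptive query prototypes described in the abstract are not reproduced.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where the mask is 1.

    features: H x W grid (list of rows) of C-dimensional feature vectors.
    mask:     H x W grid of 0/1 labels.
    """
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                total = [t + v for t, v in zip(total, vec)]
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [t / count for t in total]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def segment(query_features, fg_proto, bg_proto):
    """Label a query pixel 1 when it is closer to the foreground prototype."""
    return [
        [1 if cosine(vec, fg_proto) > cosine(vec, bg_proto) else 0 for vec in row]
        for row in query_features
    ]


# Toy 2 x 2 support feature map with 2-channel features (hypothetical data).
support_features = [[[1.0, 0.0], [0.9, 0.1]],
                    [[0.0, 1.0], [0.1, 0.9]]]
support_mask = [[1, 1],
                [0, 0]]
background_mask = [[1 - m for m in row] for row in support_mask]

# One prototype per region of the support image.
fg = masked_average_pooling(support_features, support_mask)
bg = masked_average_pooling(support_features, background_mask)

# Match every query pixel against the two prototypes.
query_features = [[[0.8, 0.2], [0.2, 0.8]],
                  [[0.1, 0.9], [0.7, 0.3]]]
prediction = segment(query_features, fg, bg)
print(prediction)
```

In a real few-shot segmentation model the feature grids would come from a learned backbone rather than hand-written lists, and the matching step is usually followed by a trained decoder.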


Similar Items