Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities

At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text, while preserving text-irrelevant regions. Although there has been extensive research to impro...

Bibliographic Details
Main Authors:	Watanabe, Yuto; Togo, Ren; Maeda, Keisuke; Ogawa, Takahiro; Haseyama, Miki
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10675000/ https://www.ncbi.nlm.nih.gov/pubmed/38005673 http://dx.doi.org/10.3390/s23229287

_version_	1785149771207409664
author	Watanabe, Yuto Togo, Ren Maeda, Keisuke Ogawa, Takahiro Haseyama, Miki
author_facet	Watanabe, Yuto Togo, Ren Maeda, Keisuke Ogawa, Takahiro Haseyama, Miki
author_sort	Watanabe, Yuto
collection	PubMed
description	At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text, while preserving text-irrelevant regions. Although there has been extensive research to improve the versatility and performance of text-guided image manipulation, research on its performance evaluation is inadequate. This study proposes Manipulation Direction (MD), a logical and robust metric, which evaluates the performance of text-guided image manipulation by focusing on changes between image and text modalities. Specifically, we define MD as the consistency of changes between images and texts occurring before and after manipulation. By using MD to evaluate the performance of text-guided image manipulation, we can comprehensively evaluate how an image has changed before and after the image manipulation and whether this change agrees with the text. Extensive experiments on Multi-Modal-CelebA-HQ and Caltech-UCSD Birds confirmed that there was an impressive correlation between our calculated MD scores and subjective scores for the manipulated images compared to the existing metrics.
format	Online Article Text
id	pubmed-10675000
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-106750002023-11-20 Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities Watanabe, Yuto Togo, Ren Maeda, Keisuke Ogawa, Takahiro Haseyama, Miki Sensors (Basel) Article At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text, while preserving text-irrelevant regions. Although there has been extensive research to improve the versatility and performance of text-guided image manipulation, research on its performance evaluation is inadequate. This study proposes Manipulation Direction (MD), a logical and robust metric, which evaluates the performance of text-guided image manipulation by focusing on changes between image and text modalities. Specifically, we define MD as the consistency of changes between images and texts occurring before and after manipulation. By using MD to evaluate the performance of text-guided image manipulation, we can comprehensively evaluate how an image has changed before and after the image manipulation and whether this change agrees with the text. Extensive experiments on Multi-Modal-CelebA-HQ and Caltech-UCSD Birds confirmed that there was an impressive correlation between our calculated MD scores and subjective scores for the manipulated images compared to the existing metrics. MDPI 2023-11-20 /pmc/articles/PMC10675000/ /pubmed/38005673 http://dx.doi.org/10.3390/s23229287 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Watanabe, Yuto Togo, Ren Maeda, Keisuke Ogawa, Takahiro Haseyama, Miki Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities
title	Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities
title_full	Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities
title_fullStr	Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities
title_full_unstemmed	Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities
title_short	Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities
title_sort	manipulation direction: evaluating text-guided image manipulation based on similarity between changes in image and text modalities
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10675000/ https://www.ncbi.nlm.nih.gov/pubmed/38005673 http://dx.doi.org/10.3390/s23229287

Similar Items