Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities

Bibliographic Details
Main Authors: Watanabe, Yuto; Togo, Ren; Maeda, Keisuke; Ogawa, Takahiro; Haseyama, Miki
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10675000/
https://www.ncbi.nlm.nih.gov/pubmed/38005673
http://dx.doi.org/10.3390/s23229287
_version_ 1785149771207409664
author Watanabe, Yuto
Togo, Ren
Maeda, Keisuke
Ogawa, Takahiro
Haseyama, Miki
author_facet Watanabe, Yuto
Togo, Ren
Maeda, Keisuke
Ogawa, Takahiro
Haseyama, Miki
author_sort Watanabe, Yuto
collection PubMed
description At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text, while preserving text-irrelevant regions. Although there has been extensive research to improve the versatility and performance of text-guided image manipulation, research on its performance evaluation is inadequate. This study proposes Manipulation Direction (MD), a logical and robust metric, which evaluates the performance of text-guided image manipulation by focusing on changes between image and text modalities. Specifically, we define MD as the consistency of changes between images and texts occurring before and after manipulation. By using MD to evaluate the performance of text-guided image manipulation, we can comprehensively evaluate how an image has changed before and after the image manipulation and whether this change agrees with the text. Extensive experiments on Multi-Modal-CelebA-HQ and Caltech-UCSD Birds confirmed that there was an impressive correlation between our calculated MD scores and subjective scores for the manipulated images compared to the existing metrics.
format Online
Article
Text
id pubmed-10675000
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-106750002023-11-20 Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities Watanabe, Yuto Togo, Ren Maeda, Keisuke Ogawa, Takahiro Haseyama, Miki Sensors (Basel) Article At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text, while preserving text-irrelevant regions. Although there has been extensive research to improve the versatility and performance of text-guided image manipulation, research on its performance evaluation is inadequate. This study proposes Manipulation Direction (MD), a logical and robust metric, which evaluates the performance of text-guided image manipulation by focusing on changes between image and text modalities. Specifically, we define MD as the consistency of changes between images and texts occurring before and after manipulation. By using MD to evaluate the performance of text-guided image manipulation, we can comprehensively evaluate how an image has changed before and after the image manipulation and whether this change agrees with the text. Extensive experiments on Multi-Modal-CelebA-HQ and Caltech-UCSD Birds confirmed that there was an impressive correlation between our calculated MD scores and subjective scores for the manipulated images compared to the existing metrics. MDPI 2023-11-20 /pmc/articles/PMC10675000/ /pubmed/38005673 http://dx.doi.org/10.3390/s23229287 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10675000/
https://www.ncbi.nlm.nih.gov/pubmed/38005673
http://dx.doi.org/10.3390/s23229287
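
Illustrative note on the abstract: the record describes MD only as "the consistency of changes between images and texts occurring before and after manipulation" and gives no formula. The sketch below is one possible reading of that sentence, not the authors' definition. It assumes that the images and texts before and after manipulation have already been embedded as vectors in one shared space by some image-text encoder (not specified in this record), and it scores consistency as the cosine similarity between the image change and the text change. The function name direction_consistency and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def _difference(after: Sequence[float], before: Sequence[float]) -> list[float]:
    """Element-wise change from the 'before' vector to the 'after' vector."""
    if len(after) != len(before):
        raise ValueError("embeddings must have the same length")
    return [a - b for a, b in zip(after, before)]


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; undefined (raises) if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("no change in one modality: direction is undefined")
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def direction_consistency(
    image_before: Sequence[float],
    image_after: Sequence[float],
    text_before: Sequence[float],
    text_after: Sequence[float],
) -> float:
    """Hypothetical score in [-1, 1]: how well the image change aligns with the text change.

    Assumption: all four inputs are embeddings in the same shared space.
    """
    image_change = _difference(image_after, image_before)
    text_change = _difference(text_after, text_before)
    return _cosine(image_change, text_change)


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, made up for illustration only.
    aligned = direction_consistency([1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 2, 1])
    opposed = direction_consistency([1, 0, 0], [1, -1, 0], [0, 0, 1], [0, 2, 1])
    print(aligned)  # 1.0: the image moved in the same direction as the text
    print(opposed)  # -1.0: the image moved in the opposite direction
```

Under this reading, a score near 1 would mean the edit moved the image in the direction the text change asks for, while a score near 0 or below would mean the image changed in an unrelated or opposite direction. Whether the published metric is computed this way should be checked against the article itself.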