Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities
At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text, while preserving text-irrelevant regions. Although there has been extensive research to improve the versatility and performance of text-guided image manipulation, research on its performance evaluation is inadequate. …
Main Authors: Watanabe, Yuto; Togo, Ren; Maeda, Keisuke; Ogawa, Takahiro; Haseyama, Miki
Format: Online Article Text
Language: English
Published: MDPI, 2023
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10675000/ https://www.ncbi.nlm.nih.gov/pubmed/38005673 http://dx.doi.org/10.3390/s23229287
_version_ | 1785149771207409664 |
author | Watanabe, Yuto; Togo, Ren; Maeda, Keisuke; Ogawa, Takahiro; Haseyama, Miki
author_facet | Watanabe, Yuto; Togo, Ren; Maeda, Keisuke; Ogawa, Takahiro; Haseyama, Miki
author_sort | Watanabe, Yuto |
collection | PubMed |
description | At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text, while preserving text-irrelevant regions. Although there has been extensive research to improve the versatility and performance of text-guided image manipulation, research on its performance evaluation is inadequate. This study proposes Manipulation Direction (MD), a logical and robust metric, which evaluates the performance of text-guided image manipulation by focusing on changes between image and text modalities. Specifically, we define MD as the consistency of changes between images and texts occurring before and after manipulation. By using MD to evaluate the performance of text-guided image manipulation, we can comprehensively evaluate how an image has changed before and after the image manipulation and whether this change agrees with the text. Extensive experiments on Multi-Modal-CelebA-HQ and Caltech-UCSD Birds confirmed that there was an impressive correlation between our calculated MD scores and subjective scores for the manipulated images compared to the existing metrics. |
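The description defines MD conceptually as the consistency between the change in the image and the change in the text before and after manipulation. As a rough illustration only (not the paper's exact formulation), the sketch below assumes both modalities are embedded in a joint vision-language space (OpenAI's CLIP is used here as a stand-in) and scores consistency as the cosine similarity between the two difference vectors; the function names and the choice of CLIP are assumptions, not details taken from the article.

```python
# Hypothetical sketch of a directional-similarity score in the spirit of MD,
# assuming a joint image-text embedding space (OpenAI CLIP via the `clip` package).
# The paper's actual MD formulation may differ.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed_image(path):
    """Encode an image file into the joint embedding space."""
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        return model.encode_image(image).float()

def embed_text(text):
    """Encode a caption into the joint embedding space."""
    tokens = clip.tokenize([text]).to(device)
    with torch.no_grad():
        return model.encode_text(tokens).float()

def manipulation_direction_score(img_before, img_after, text_before, text_after):
    # Change in the image modality: difference of image embeddings.
    d_img = embed_image(img_after) - embed_image(img_before)
    # Change in the text modality: difference of text embeddings.
    d_txt = embed_text(text_after) - embed_text(text_before)
    # Consistency of the two changes, measured as cosine similarity.
    return torch.nn.functional.cosine_similarity(d_img, d_txt).item()
```

Under this reading, a score near 1 would indicate that the way the image changed agrees with the way the text changed, while a score near 0 or below would indicate a manipulation that ignores or contradicts the text edit.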
format | Online Article Text |
id | pubmed-10675000 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10675000 2023-11-20 Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities Watanabe, Yuto Togo, Ren Maeda, Keisuke Ogawa, Takahiro Haseyama, Miki Sensors (Basel) Article At present, text-guided image manipulation is a notable subject of study in the vision and language field. Given an image and text as inputs, these methods aim to manipulate the image according to the text, while preserving text-irrelevant regions. Although there has been extensive research to improve the versatility and performance of text-guided image manipulation, research on its performance evaluation is inadequate. This study proposes Manipulation Direction (MD), a logical and robust metric, which evaluates the performance of text-guided image manipulation by focusing on changes between image and text modalities. Specifically, we define MD as the consistency of changes between images and texts occurring before and after manipulation. By using MD to evaluate the performance of text-guided image manipulation, we can comprehensively evaluate how an image has changed before and after the image manipulation and whether this change agrees with the text. Extensive experiments on Multi-Modal-CelebA-HQ and Caltech-UCSD Birds confirmed that there was an impressive correlation between our calculated MD scores and subjective scores for the manipulated images compared to the existing metrics. MDPI 2023-11-20 /pmc/articles/PMC10675000/ /pubmed/38005673 http://dx.doi.org/10.3390/s23229287 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Watanabe, Yuto Togo, Ren Maeda, Keisuke Ogawa, Takahiro Haseyama, Miki Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities |
title | Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities |
title_full | Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities |
title_fullStr | Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities |
title_full_unstemmed | Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities |
title_short | Manipulation Direction: Evaluating Text-Guided Image Manipulation Based on Similarity between Changes in Image and Text Modalities |
title_sort | manipulation direction: evaluating text-guided image manipulation based on similarity between changes in image and text modalities |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10675000/ https://www.ncbi.nlm.nih.gov/pubmed/38005673 http://dx.doi.org/10.3390/s23229287 |
work_keys_str_mv | AT watanabeyuto manipulationdirectionevaluatingtextguidedimagemanipulationbasedonsimilaritybetweenchangesinimageandtextmodalities AT togoren manipulationdirectionevaluatingtextguidedimagemanipulationbasedonsimilaritybetweenchangesinimageandtextmodalities AT maedakeisuke manipulationdirectionevaluatingtextguidedimagemanipulationbasedonsimilaritybetweenchangesinimageandtextmodalities AT ogawatakahiro manipulationdirectionevaluatingtextguidedimagemanipulationbasedonsimilaritybetweenchangesinimageandtextmodalities AT haseyamamiki manipulationdirectionevaluatingtextguidedimagemanipulationbasedonsimilaritybetweenchangesinimageandtextmodalities |