
Learning a spatial-temporal texture transformer network for video inpainting

Bibliographic Details
Main Authors: Ma, Pengsen, Xue, Tao
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9606320/
https://www.ncbi.nlm.nih.gov/pubmed/36310632
http://dx.doi.org/10.3389/fnbot.2022.1002453
_version_ 1784818270725996544
author Ma, Pengsen
Xue, Tao
author_facet Ma, Pengsen
Xue, Tao
author_sort Ma, Pengsen
collection PubMed
description We study video inpainting, which aims to recover realistic textures in damaged frames. Recent progress has been made by taking other frames as references so that relevant textures can be transferred to the damaged frames. However, existing video inpainting approaches neglect the model's ability to extract information and reconstruct content, so the textures that should be transferred cannot be reconstructed accurately. In this paper, we propose a novel and effective spatial-temporal texture transformer network (STTTN) for video inpainting. STTTN consists of six closely related modules optimized for video inpainting: a feature similarity measure for more accurate frame pre-repair, an encoder with strong information-extraction ability, an embedding module for finding correlations, coarse low-frequency feature transfer, refined high-frequency feature transfer, and a decoder with accurate content-reconstruction ability. Such a design encourages joint feature learning across the input and reference frames. To demonstrate the effectiveness and advantages of the proposed model, we conduct comprehensive ablation studies and qualitative and quantitative experiments on multiple datasets, using standard stationary masks and more realistic moving-object masks. The excellent experimental results demonstrate the authenticity and reliability of STTTN.
format Online
Article
Text
id pubmed-9606320
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-96063202022-10-28 Learning a spatial-temporal texture transformer network for video inpainting Ma, Pengsen Xue, Tao Front Neurorobot Neuroscience We study video inpainting, which aims to recover realistic textures in damaged frames. Recent progress has been made by taking other frames as references so that relevant textures can be transferred to the damaged frames. However, existing video inpainting approaches neglect the model's ability to extract information and reconstruct content, so the textures that should be transferred cannot be reconstructed accurately. In this paper, we propose a novel and effective spatial-temporal texture transformer network (STTTN) for video inpainting. STTTN consists of six closely related modules optimized for video inpainting: a feature similarity measure for more accurate frame pre-repair, an encoder with strong information-extraction ability, an embedding module for finding correlations, coarse low-frequency feature transfer, refined high-frequency feature transfer, and a decoder with accurate content-reconstruction ability. Such a design encourages joint feature learning across the input and reference frames. To demonstrate the effectiveness and advantages of the proposed model, we conduct comprehensive ablation studies and qualitative and quantitative experiments on multiple datasets, using standard stationary masks and more realistic moving-object masks. The excellent experimental results demonstrate the authenticity and reliability of STTTN. Frontiers Media S.A. 2022-10-13 /pmc/articles/PMC9606320/ /pubmed/36310632 http://dx.doi.org/10.3389/fnbot.2022.1002453 Text en Copyright © 2022 Ma and Xue. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Ma, Pengsen
Xue, Tao
Learning a spatial-temporal texture transformer network for video inpainting
title Learning a spatial-temporal texture transformer network for video inpainting
title_full Learning a spatial-temporal texture transformer network for video inpainting
title_fullStr Learning a spatial-temporal texture transformer network for video inpainting
title_full_unstemmed Learning a spatial-temporal texture transformer network for video inpainting
title_short Learning a spatial-temporal texture transformer network for video inpainting
title_sort learning a spatial-temporal texture transformer network for video inpainting
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9606320/
https://www.ncbi.nlm.nih.gov/pubmed/36310632
http://dx.doi.org/10.3389/fnbot.2022.1002453
work_keys_str_mv AT mapengsen learningaspatialtemporaltexturetransformernetworkforvideoinpainting
AT xuetao learningaspatialtemporaltexturetransformernetworkforvideoinpainting
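
Note on the architecture described in the abstract above: the abstract names six modules (feature similarity measure, encoder, embedding/correlation, coarse low-frequency transfer, refined high-frequency transfer, decoder), but this record carries no code. The following is a minimal PyTorch sketch of how such a pipeline could be wired together. Every class name, layer size, and the use of scaled dot-product attention for the two transfer steps are assumptions made only for illustration; none of it is taken from the authors' implementation.

# Illustrative sketch only. This record contains no code, so the class names, layer
# sizes, and the scaled dot-product attention used for the two transfer steps are
# assumptions made to visualize the six-module pipeline described in the abstract;
# they are not the authors' implementation.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Feature extractor applied to both damaged and reference frames (assumed conv stack)."""

    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class TextureTransfer(nn.Module):
    """Embedding/correlation step modeled here as scaled dot-product attention:
    each damaged-frame location attends to reference-frame locations and copies
    the best-matching reference textures."""

    def forward(self, query, key, value):
        b, c, h, w = query.shape
        q = query.flatten(2)                                              # B x C x HW
        k = key.flatten(2)
        attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)    # B x HW x HW
        out = value.flatten(2) @ attn.transpose(1, 2)                     # B x C x HW
        return out.view(b, c, h, w)


class STTTNSketch(nn.Module):
    """Minimal composition of the modules named in the abstract: encoder,
    embedding/correlation (inside TextureTransfer), coarse low-frequency transfer,
    refined high-frequency transfer, and decoder. The feature similarity measure
    used for frame pre-repair is not modeled in this sketch."""

    def __init__(self, feat_ch=64):
        super().__init__()
        self.encoder = Encoder(feat_ch=feat_ch)
        self.coarse_transfer = TextureTransfer()       # low-frequency pass
        self.refine_transfer = TextureTransfer()       # high-frequency pass
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_ch * 2, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(feat_ch, 3, 3, padding=1),
        )

    def forward(self, damaged, reference):
        q = self.encoder(damaged)                      # features of the frame to repair
        k = self.encoder(reference)                    # features of the reference frame
        coarse = self.coarse_transfer(q, k, k)         # transfer coarse textures
        refined = self.refine_transfer(coarse, k, k)   # refine with a second pass
        return self.decoder(torch.cat([coarse, refined], dim=1))


if __name__ == "__main__":
    model = STTTNSketch()
    damaged = torch.randn(1, 3, 64, 64)
    reference = torch.randn(1, 3, 64, 64)
    print(model(damaged, reference).shape)             # torch.Size([1, 3, 64, 64])

In the paper's actual model the two transfer passes would operate on different frequency bands or feature scales, and the feature similarity measure would drive frame pre-repair; this sketch collapses those details into a single-scale demonstration.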