Learning a spatial-temporal texture transformer network for video inpainting
We study video inpainting, which aims to recover realistic textures from damaged frames. Recent progress has been made by taking other frames as references so that relevant textures can be transferred to damaged frames. However, existing video inpainting approaches neglect the ability of the model t...
Main Authors: | Ma, Pengsen, Xue, Tao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Neuroscience |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9606320/ https://www.ncbi.nlm.nih.gov/pubmed/36310632 http://dx.doi.org/10.3389/fnbot.2022.1002453 |
_version_ | 1784818270725996544 |
---|---|
author | Ma, Pengsen Xue, Tao |
author_facet | Ma, Pengsen Xue, Tao |
author_sort | Ma, Pengsen |
collection | PubMed |
description | We study video inpainting, which aims to recover realistic textures in damaged frames. Recent progress has been made by taking other frames as references so that relevant textures can be transferred to damaged frames. However, existing video inpainting approaches neglect the model's ability to extract information and reconstruct content, so the textures that should be transferred cannot be reconstructed accurately. In this paper, we propose a novel and effective spatial-temporal texture transformer network (STTTN) for video inpainting. STTTN consists of six closely related modules optimized for the video inpainting task: a feature similarity measure for more accurate frame pre-repair, an encoder with strong information-extraction ability, an embedding module for finding correlations, a coarse low-frequency feature transfer module, a refined high-frequency feature transfer module, and a decoder with accurate content-reconstruction ability. This design encourages joint feature learning across the input and reference frames. To demonstrate the effectiveness and superiority of the proposed model, we conduct comprehensive ablation studies and qualitative and quantitative experiments on multiple datasets, using both standard stationary masks and more realistic moving-object masks. The excellent experimental results demonstrate the authenticity and reliability of STTTN. |
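The abstract above outlines a six-module, reference-based transfer pipeline. The record does not include the authors' implementation, so the following is only a minimal, hypothetical PyTorch sketch of how such a pipeline could be wired together; the module names, layer choices, and the cosine-similarity attention used for the embedding step are assumptions for illustration, not the published STTTN architecture.

```python
# Hypothetical sketch of a reference-based texture-transfer inpainting pipeline
# (all internals are assumptions; this is not the authors' STTTN code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class STTTNSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Encoder standing in for the "strong information extraction" module.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Coarse (low-frequency) and refined (high-frequency) transfer heads.
        self.coarse_transfer = nn.Conv2d(channels * 2, channels, 3, padding=1)
        self.refine_transfer = nn.Conv2d(channels * 2, channels, 3, padding=1)
        # Decoder standing in for the "content reconstruction" module.
        self.decoder = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def embedding_attention(self, q, k, v):
        # Embedding step: cosine-similarity attention between damaged-frame queries
        # and reference-frame keys, used to index reference values (textures).
        b, c, h, w = q.shape
        qn = F.normalize(q.flatten(2), dim=1)                  # (B, C, HW)
        kn = F.normalize(k.flatten(2), dim=1)                  # (B, C, HW)
        attn = torch.softmax(qn.transpose(1, 2) @ kn, dim=-1)  # (B, HW, HW)
        out = attn @ v.flatten(2).transpose(1, 2)              # (B, HW, C)
        return out.transpose(1, 2).reshape(b, c, h, w)

    def forward(self, damaged, reference):
        # The "feature similarity measure / pre-repair" is folded into the
        # attention step in this simplified sketch.
        q = self.encoder(damaged)
        k = self.encoder(reference)
        transferred = self.embedding_attention(q, k, k)
        coarse = self.coarse_transfer(torch.cat([q, transferred], dim=1))
        refined = self.refine_transfer(torch.cat([coarse, transferred], dim=1))
        return self.decoder(refined)

# Toy usage: inpaint one damaged frame from one reference frame.
model = STTTNSketch()
out = model(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

The sketch only illustrates the data flow implied by the abstract: encode both frames, match them in an embedding space, transfer reference features at a coarse and then a refined level, and decode the result.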
format | Online Article Text |
id | pubmed-9606320 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9606320 2022-10-28 Learning a spatial-temporal texture transformer network for video inpainting Ma, Pengsen Xue, Tao Front Neurorobot Neuroscience We study video inpainting, which aims to recover realistic textures in damaged frames. Recent progress has been made by taking other frames as references so that relevant textures can be transferred to damaged frames. However, existing video inpainting approaches neglect the model's ability to extract information and reconstruct content, so the textures that should be transferred cannot be reconstructed accurately. In this paper, we propose a novel and effective spatial-temporal texture transformer network (STTTN) for video inpainting. STTTN consists of six closely related modules optimized for the video inpainting task: a feature similarity measure for more accurate frame pre-repair, an encoder with strong information-extraction ability, an embedding module for finding correlations, a coarse low-frequency feature transfer module, a refined high-frequency feature transfer module, and a decoder with accurate content-reconstruction ability. This design encourages joint feature learning across the input and reference frames. To demonstrate the effectiveness and superiority of the proposed model, we conduct comprehensive ablation studies and qualitative and quantitative experiments on multiple datasets, using both standard stationary masks and more realistic moving-object masks. The excellent experimental results demonstrate the authenticity and reliability of STTTN. Frontiers Media S.A. 2022-10-13 /pmc/articles/PMC9606320/ /pubmed/36310632 http://dx.doi.org/10.3389/fnbot.2022.1002453 Text en Copyright © 2022 Ma and Xue. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Neuroscience Ma, Pengsen Xue, Tao Learning a spatial-temporal texture transformer network for video inpainting |
title | Learning a spatial-temporal texture transformer network for video inpainting |
title_full | Learning a spatial-temporal texture transformer network for video inpainting |
title_fullStr | Learning a spatial-temporal texture transformer network for video inpainting |
title_full_unstemmed | Learning a spatial-temporal texture transformer network for video inpainting |
title_short | Learning a spatial-temporal texture transformer network for video inpainting |
title_sort | learning a spatial-temporal texture transformer network for video inpainting |
topic | Neuroscience |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9606320/ https://www.ncbi.nlm.nih.gov/pubmed/36310632 http://dx.doi.org/10.3389/fnbot.2022.1002453 |
work_keys_str_mv | AT mapengsen learningaspatialtemporaltexturetransformernetworkforvideoinpainting AT xuetao learningaspatialtemporaltexturetransformernetworkforvideoinpainting |