Heuristic Attention Representation Learning for Self-Supervised Pretraining

Recently, self-supervised learning methods have proven powerful and efficient at learning robust representations by maximizing the similarity between different augmented views of an image in an embedding vector space. The main challenge, however, lies in generating different views with random crop...

Bibliographic Details
Main Authors: Tran, Van Nhiem, Liu, Shen-Hsuan, Li, Yung-Hui, Wang, Jia-Ching
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9320898/
https://www.ncbi.nlm.nih.gov/pubmed/35890847
http://dx.doi.org/10.3390/s22145169
_version_ 1784755905686929408
author Tran, Van Nhiem
Liu, Shen-Hsuan
Li, Yung-Hui
Wang, Jia-Ching
author_facet Tran, Van Nhiem
Liu, Shen-Hsuan
Li, Yung-Hui
Wang, Jia-Ching
author_sort Tran, Van Nhiem
collection PubMed
description Recently, self-supervised learning methods have proven powerful and efficient at learning robust representations by maximizing the similarity between different augmented views of an image in an embedding vector space. The main challenge, however, lies in generating views by random cropping: the semantic content may differ across views, so naively maximizing the similarity objective can be inappropriate. We tackle this problem by introducing Heuristic Attention Representation Learning (HARL). This self-supervised framework relies on a joint embedding architecture in which two neural networks are trained to produce similar embeddings for different augmented views of the same image. HARL incorporates prior visual object-level attention by generating a heuristic mask proposal for each training image and maximizes the similarity of object-level embeddings in vector space, rather than of whole-image representations as in previous work. As a result, HARL extracts high-quality semantic representations from each training sample and outperforms existing self-supervised baselines on several downstream tasks. In addition, we provide efficient techniques based on conventional computer vision and deep learning methods for generating heuristic mask proposals on natural-image datasets. HARL achieves a +1.3% improvement on the ImageNet semi-supervised learning benchmark and a +0.9% improvement in AP(50) on the COCO object detection task over the previous state-of-the-art method BYOL. Our code implementation is available for both the TensorFlow and PyTorch frameworks.
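
The object-level objective summarized in the description above can be illustrated with a short PyTorch sketch. It is purely illustrative, not the authors' implementation: the function names (mask_pool, object_level_loss) are hypothetical. The idea shown is that backbone features of an augmented view are pooled over the heuristic binary mask, and the cosine similarity between the resulting object-level embeddings of the online and target branches is maximized, BYOL-style.

import torch
import torch.nn.functional as F

def mask_pool(feature_map, mask):
    # feature_map: (B, C, H, W) backbone output; mask: (B, 1, H, W) heuristic
    # binary mask resized to the feature resolution.
    mask = mask.float()
    weights = mask / mask.sum(dim=(2, 3), keepdim=True).clamp(min=1e-6)
    return (feature_map * weights).sum(dim=(2, 3))  # (B, C) object-level embedding

def object_level_loss(online_pred, target_proj):
    # Negative cosine similarity between the online prediction for one view and
    # the stop-gradient target projection of the other view (BYOL-style).
    online_pred = F.normalize(online_pred, dim=-1)
    target_proj = F.normalize(target_proj.detach(), dim=-1)
    return 2.0 - 2.0 * (online_pred * target_proj).sum(dim=-1).mean()

In practice such a loss would be symmetrized over the two augmented views, as in BYOL, with the mask-pooled object-level embeddings taking the place of global average-pooled image embeddings.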
format Online
Article
Text
id pubmed-9320898
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9320898 2022-07-27 Heuristic Attention Representation Learning for Self-Supervised Pretraining Tran, Van Nhiem Liu, Shen-Hsuan Li, Yung-Hui Wang, Jia-Ching Sensors (Basel) Article Recently, self-supervised learning methods have proven powerful and efficient at learning robust representations by maximizing the similarity between different augmented views of an image in an embedding vector space. The main challenge, however, lies in generating views by random cropping: the semantic content may differ across views, so naively maximizing the similarity objective can be inappropriate. We tackle this problem by introducing Heuristic Attention Representation Learning (HARL). This self-supervised framework relies on a joint embedding architecture in which two neural networks are trained to produce similar embeddings for different augmented views of the same image. HARL incorporates prior visual object-level attention by generating a heuristic mask proposal for each training image and maximizes the similarity of object-level embeddings in vector space, rather than of whole-image representations as in previous work. As a result, HARL extracts high-quality semantic representations from each training sample and outperforms existing self-supervised baselines on several downstream tasks. In addition, we provide efficient techniques based on conventional computer vision and deep learning methods for generating heuristic mask proposals on natural-image datasets. HARL achieves a +1.3% improvement on the ImageNet semi-supervised learning benchmark and a +0.9% improvement in AP(50) on the COCO object detection task over the previous state-of-the-art method BYOL. Our code implementation is available for both the TensorFlow and PyTorch frameworks. MDPI 2022-07-10 /pmc/articles/PMC9320898/ /pubmed/35890847 http://dx.doi.org/10.3390/s22145169 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
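
The abstract also mentions heuristic mask proposals produced with conventional computer vision techniques. As a hedged illustration only, not the authors' pipeline, one such proposal can be sketched with OpenCV static saliency followed by Otsu thresholding and a morphological clean-up; the function name heuristic_mask is hypothetical, and the cv2.saliency module requires opencv-contrib-python.

import cv2
import numpy as np

def heuristic_mask(image_bgr):
    # Compute a static saliency map and binarize it into a rough foreground mask.
    saliency = cv2.saliency.StaticSaliencyFineGrained_create()
    ok, sal_map = saliency.computeSaliency(image_bgr)         # float map in [0, 1]
    if not ok:
        return np.ones(image_bgr.shape[:2], dtype=np.uint8)   # fall back to the whole image
    sal_u8 = (sal_map * 255).astype("uint8")
    _, mask = cv2.threshold(sal_u8, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    kernel = np.ones((7, 7), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)    # close small holes
    return (mask // 255).astype(np.uint8)                     # binary {0, 1} mask

A mask like this would then be downsampled to the backbone's feature resolution before the object-level pooling sketched earlier.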
spellingShingle Article
Tran, Van Nhiem
Liu, Shen-Hsuan
Li, Yung-Hui
Wang, Jia-Ching
Heuristic Attention Representation Learning for Self-Supervised Pretraining
title Heuristic Attention Representation Learning for Self-Supervised Pretraining
title_full Heuristic Attention Representation Learning for Self-Supervised Pretraining
title_fullStr Heuristic Attention Representation Learning for Self-Supervised Pretraining
title_full_unstemmed Heuristic Attention Representation Learning for Self-Supervised Pretraining
title_short Heuristic Attention Representation Learning for Self-Supervised Pretraining
title_sort heuristic attention representation learning for self-supervised pretraining
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9320898/
https://www.ncbi.nlm.nih.gov/pubmed/35890847
http://dx.doi.org/10.3390/s22145169
work_keys_str_mv AT tranvannhiem heuristicattentionrepresentationlearningforselfsupervisedpretraining
AT liushenhsuan heuristicattentionrepresentationlearningforselfsupervisedpretraining
AT liyunghui heuristicattentionrepresentationlearningforselfsupervisedpretraining
AT wangjiaching heuristicattentionrepresentationlearningforselfsupervisedpretraining