
Self-supervised learning for remote sensing scene classification under the few shot scenario


Bibliographic Details

Main Authors: Alosaimi, Najd, Alhichri, Haikel, Bazi, Yakoub, Ben Youssef, Belgacem, Alajlan, Naif
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9829684/
https://www.ncbi.nlm.nih.gov/pubmed/36624136
http://dx.doi.org/10.1038/s41598-022-27313-5
_version_ 1784867511601201152
author Alosaimi, Najd
Alhichri, Haikel
Bazi, Yakoub
Ben Youssef, Belgacem
Alajlan, Naif
author_facet Alosaimi, Najd
Alhichri, Haikel
Bazi, Yakoub
Ben Youssef, Belgacem
Alajlan, Naif
author_sort Alosaimi, Najd
collection PubMed
description Scene classification is a crucial research problem in remote sensing (RS) that has recently attracted many researchers. It poses several challenges, such as the complexity of remote sensing scenes, overlapping classes (a scene may contain objects that belong to other classes), and the difficulty of obtaining sufficient labeled scenes. Deep learning (DL) solutions, and in particular convolutional neural networks (CNNs), are now the state of the art in RS scene classification; however, CNN models need huge amounts of annotated data, which can be costly and time-consuming to obtain. On the other hand, it is relatively easy to acquire large amounts of unlabeled images. Recently, Self-Supervised Learning (SSL) has been proposed as a method that can learn from unlabeled images, potentially reducing the need for labeling. In this work, we propose a deep SSL method, called RS-FewShotSSL, for RS scene classification under the few-shot scenario, where we have only a few (fewer than 20) labeled scenes per class. Under this scenario, typical DL solutions that fine-tune CNN models pre-trained on the ImageNet dataset fail dramatically. In the SSL paradigm, a DL model is pre-trained from scratch during the pretext task using the large amounts of unlabeled scenes. Then, during the main, or so-called downstream, task, the model is fine-tuned on the labeled scenes. Our proposed RS-FewShotSSL solution is composed of an online network and a target network, both using the EfficientNet-B3 CNN model as a feature encoder backbone. During the pretext task, RS-FewShotSSL learns discriminative features from the unlabeled images using cross-view contrastive learning. Different views are generated from each image using geometric transformations and passed to the online and target networks. Then, the whole model is optimized by minimizing the cross-view distance between the online and target networks.
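The cross-view objective described above, a BYOL-style distance between the online network's predictions and the target network's projections for two views of the same scene, can be sketched in a few lines. This is a minimal NumPy illustration of the loss term only, not the authors' implementation; the array names, batch size, and embedding dimension are placeholders.

```python
import numpy as np

def cross_view_loss(online_pred, target_proj):
    """Cross-view distance: 2 - 2 * cosine similarity between
    L2-normalized online predictions and target projections,
    averaged over the batch (the BYOL-style regression loss)."""
    p = online_pred / np.linalg.norm(online_pred, axis=1, keepdims=True)
    z = target_proj / np.linalg.norm(target_proj, axis=1, keepdims=True)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=1)))

# Embeddings of two augmented views of the same batch of scenes,
# produced by the online and target encoders (random stand-ins here).
rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 128))
view_b = rng.normal(size=(8, 128))

# Symmetrized loss: each view is fed to both networks in turn.
loss = cross_view_loss(view_a, view_b) + cross_view_loss(view_b, view_a)
print(loss)
```

Minimizing this distance pulls the two networks' representations of the same scene together; in BYOL-style training the target network is typically updated as an exponential moving average of the online network rather than by gradient descent.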
To address the problem of the limited computation resources available to us, our proposed method uses a novel DL architecture that can be trained using both high-resolution and low-resolution images. During the pretext task, RS-FewShotSSL is trained using low-resolution images, thereby allowing for larger batch sizes, which significantly boosts the performance of the proposed pipeline on the task of RS classification. In the downstream task, the target network is discarded, and the online network is fine-tuned using the few labeled shots, or scenes. Here, we use smaller batches of both high-resolution and low-resolution images. This architecture allows RS-FewShotSSL to benefit from both large batch sizes and full image sizes, thereby learning effectively from the large amounts of unlabeled data. We tested RS-FewShotSSL on three public RS datasets, and it demonstrated significant improvement over other state-of-the-art methods such as SimCLR, MoCo, BYOL, and IDSSL.
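The resolution/batch-size trade-off motivating this two-stage design can be illustrated with rough memory arithmetic: activation memory scales approximately with the number of pixels per image, so halving each spatial dimension permits roughly four times the batch size under a fixed budget. All constants below (budget, per-pixel cost, overhead factor) are hypothetical, chosen only to show the scaling.

```python
def max_batch_size(mem_budget_bytes, height, width,
                   bytes_per_pixel=12, overhead=4.0):
    """Rough upper bound on batch size under a fixed activation budget.
    Memory per image is modeled as H * W * bytes_per_pixel * overhead;
    bytes_per_pixel and overhead are illustrative, not measured."""
    per_image = height * width * bytes_per_pixel * overhead
    return int(mem_budget_bytes // per_image)

budget = 8 * 1024**3  # a hypothetical 8 GiB activation budget

full_res = max_batch_size(budget, 300, 300)  # EfficientNet-B3 native input
low_res = max_batch_size(budget, 150, 150)   # half-resolution pretext images

print(full_res, low_res)
```

Under this model the low-resolution pretext stage fits about four times as many scenes per batch, which is what makes large-batch contrastive pre-training feasible before the high-resolution fine-tuning step.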
format Online
Article
Text
id pubmed-9829684
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9829684 2023-01-11 Self-supervised learning for remote sensing scene classification under the few shot scenario Alosaimi, Najd Alhichri, Haikel Bazi, Yakoub Ben Youssef, Belgacem Alajlan, Naif Sci Rep Article Nature Publishing Group UK 2023-01-09 /pmc/articles/PMC9829684/ /pubmed/36624136 http://dx.doi.org/10.1038/s41598-022-27313-5 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Alosaimi, Najd
Alhichri, Haikel
Bazi, Yakoub
Ben Youssef, Belgacem
Alajlan, Naif
Self-supervised learning for remote sensing scene classification under the few shot scenario
title Self-supervised learning for remote sensing scene classification under the few shot scenario
title_full Self-supervised learning for remote sensing scene classification under the few shot scenario
title_fullStr Self-supervised learning for remote sensing scene classification under the few shot scenario
title_full_unstemmed Self-supervised learning for remote sensing scene classification under the few shot scenario
title_short Self-supervised learning for remote sensing scene classification under the few shot scenario
title_sort self-supervised learning for remote sensing scene classification under the few shot scenario
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9829684/
https://www.ncbi.nlm.nih.gov/pubmed/36624136
http://dx.doi.org/10.1038/s41598-022-27313-5
work_keys_str_mv AT alosaiminajd selfsupervisedlearningforremotesensingsceneclassificationunderthefewshotscenario
AT alhichrihaikel selfsupervisedlearningforremotesensingsceneclassificationunderthefewshotscenario
AT baziyakoub selfsupervisedlearningforremotesensingsceneclassificationunderthefewshotscenario
AT benyoussefbelgacem selfsupervisedlearningforremotesensingsceneclassificationunderthefewshotscenario
AT alajlannaif selfsupervisedlearningforremotesensingsceneclassificationunderthefewshotscenario