
SIFT-CNN: When Convolutional Neural Networks Meet Dense SIFT Descriptors for Image and Sequence Classification

Bibliographic Details

Main Authors: Tsourounis, Dimitrios, Kastaniotis, Dimitris, Theoharatos, Christos, Kazantzidis, Andreas, Economou, George
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604913/
https://www.ncbi.nlm.nih.gov/pubmed/36286349
http://dx.doi.org/10.3390/jimaging8100256
author Tsourounis, Dimitrios
Kastaniotis, Dimitris
Theoharatos, Christos
Kazantzidis, Andreas
Economou, George
collection PubMed
description Despite the long-standing success of hand-crafted features in computer vision, they have nowadays been replaced by end-to-end learnable features extracted with deep convolutional neural networks (CNNs). While CNNs can learn robust features directly from image pixels, they require large amounts of samples and extensive augmentation. In contrast, hand-crafted features such as SIFT exhibit several attractive properties, including local rotation invariance. In this work, a novel scheme that combines the strengths of SIFT descriptors with CNNs, named SIFT-CNN, is presented. Given a single-channel image, one SIFT descriptor is computed for every pixel, so every pixel is represented as an M-dimensional histogram and the image becomes an M-channel SIFT image whose original spatial size is preserved. A CNN is then trained on these M-channel inputs, operating directly on the multiscale SIFT images with regular convolutions. Because the SIFT image retains the spatial relations between the histograms of the SIFT descriptors, the CNN is guided to learn features from local gradient information that might otherwise be neglected. In this manner, SIFT-CNN implicitly acquires a local rotation invariance property, which is desirable for problems where local areas within an image can be rotated without changing the overall class of the image, such as indirect immunofluorescence (IIF) cell image classification, ground-based all-sky cloud-image classification, and human lip-reading. Results on popular datasets for these three problems indicate that the proposed SIFT-CNN improves performance and surpasses the corresponding CNNs trained directly on pixel values in various challenging tasks, thanks to its robustness to local rotations. These findings highlight the importance of the input image representation for the overall efficiency of a data-driven system.
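The description above fully specifies the SIFT-image construction (one descriptor per pixel, stacked into an M-channel image of the original spatial size) and the downstream CNN, so a minimal sketch is given below for concreteness. It assumes OpenCV's SIFT (M = 128) for the per-pixel descriptors and a small PyTorch CNN; the patch size, the file name cell.png, the network layout, and the class count are illustrative assumptions, not the authors' configuration.

import cv2
import numpy as np
import torch
import torch.nn as nn

def sift_image(gray: np.ndarray, patch_size: float = 16.0) -> np.ndarray:
    # One SIFT descriptor per pixel, stacked into an M-channel image
    # (M = 128 for standard SIFT); the spatial size is preserved.
    h, w = gray.shape
    sift = cv2.SIFT_create()
    kps = [cv2.KeyPoint(float(x), float(y), patch_size)
           for y in range(h) for x in range(w)]
    _, desc = sift.compute(gray, kps)        # shape: (h * w, 128)
    assert desc.shape[0] == h * w            # all keypoints were kept
    return desc.reshape(h, w, 128).astype(np.float32)

class SiftCNN(nn.Module):
    # Placeholder CNN over 128-channel SIFT images; any backbone whose
    # first convolution accepts 128 input channels would do.
    def __init__(self, num_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

gray = cv2.imread("cell.png", cv2.IMREAD_GRAYSCALE)       # hypothetical input
x = torch.from_numpy(sift_image(gray)).permute(2, 0, 1)   # (128, H, W)
logits = SiftCNN()(x.unsqueeze(0))                        # (1, num_classes)

Note that computing a descriptor at every pixel is expensive for large images; the sketch favors clarity over speed.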
format Online
Article
Text
id pubmed-9604913
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9604913 2022-10-27 MDPI 2022-09-21 /pmc/articles/PMC9604913/ /pubmed/36286349 http://dx.doi.org/10.3390/jimaging8100256 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title SIFT-CNN: When Convolutional Neural Networks Meet Dense SIFT Descriptors for Image and Sequence Classification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604913/
https://www.ncbi.nlm.nih.gov/pubmed/36286349
http://dx.doi.org/10.3390/jimaging8100256