An accurate generation of image captions for blind people using extended convolutional atom neural network
Recently, progress in image understanding and AIC (Automatic Image Captioning) has attracted many researchers to apply AI (Artificial Intelligence) models to assist blind people. AIC integrates principles from both computer vision and NLP (Natural Language Processing) to generate a...
Main Authors: Tiwary, Tejal; Mahapatra, Rajendra Prasad
Format: Online Article Text
Language: English
Published: Springer US, 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9283099/ https://www.ncbi.nlm.nih.gov/pubmed/35855372 http://dx.doi.org/10.1007/s11042-022-13443-5
_version_ | 1784747261162422272 |
author | Tiwary, Tejal; Mahapatra, Rajendra Prasad |
author_facet | Tiwary, Tejal; Mahapatra, Rajendra Prasad |
author_sort | Tiwary, Tejal |
collection | PubMed |
description | Recently, progress in image understanding and AIC (Automatic Image Captioning) has attracted many researchers to apply AI (Artificial Intelligence) models to assist blind people. AIC integrates principles from both computer vision and NLP (Natural Language Processing) to generate automatic language descriptions of an observed image. This work presents a new deep-learning-based assistive technology that helps blind people distinguish food items in online grocery shopping. The proposed AIC model involves the following steps: data collection, non-captioned image selection, extraction of appearance and texture features, and generation of automatic image captions. Initially, the data is collected from two public sources, and the selection of non-captioned images is performed using ARO (Adaptive Rain Optimization). Next, the appearance feature is extracted using the SDM (Spatial Derivative and Multi-scale) approach, and WPLBP (Weighted Patch Local Binary Pattern) is used to extract texture features. Finally, the captions are automatically generated using ECANN (Extended Convolutional Atom Neural Network). The ECANN model combines CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) architectures in a caption-reuse system that selects the most accurate caption. The loss in the ECANN architecture is minimized using the AAS (Adaptive Atom Search) optimization algorithm. The implementation tool is Python, and the datasets used for the analysis are two grocery datasets (Freiburg Groceries and Grocery Store Dataset). The proposed ECANN model achieved 99.46% accuracy on the Grocery Store Dataset and 99.32% accuracy on the Freiburg Groceries dataset. The performance of the proposed ECANN model is compared with existing models to verify its superiority over prior work. |
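The pipeline in the abstract (select non-captioned images, extract appearance/texture features, then decode a caption) can be sketched in outline as below. This is a minimal toy illustration, not the authors' ECANN implementation: every function body here is a stand-in assumption, with the real ARO selection, SDM/WPLBP feature extractors, and CNN+LSTM decoder replaced by trivial placeholders that only show how the stages chain together.

```python
# Toy sketch of the captioning pipeline stages described in the abstract.
# All names and logic are illustrative assumptions, not the ECANN method.

def select_non_captioned(images):
    """Stand-in for ARO-based selection: keep images that lack a caption."""
    return [img for img in images if img.get("caption") is None]

def extract_features(img):
    """Stand-in for SDM appearance + WPLBP texture feature extraction."""
    return {"appearance": img["pixels_mean"], "texture": img["pixels_var"]}

def generate_caption(features, vocab):
    """Stand-in for the CNN+LSTM decoder: map a feature score to a word."""
    idx = int(features["appearance"] * len(vocab)) % len(vocab)
    return f"a photo of {vocab[idx]}"

# Tiny fake dataset standing in for the grocery images.
images = [
    {"id": 1, "caption": "fresh apples", "pixels_mean": 0.2, "pixels_var": 0.01},
    {"id": 2, "caption": None, "pixels_mean": 0.7, "pixels_var": 0.05},
]
vocab = ["apples", "bread", "milk", "pasta"]

# Run the three stages end to end on the uncaptioned images.
for img in select_non_captioned(images):
    feats = extract_features(img)
    img["caption"] = generate_caption(feats, vocab)

print(images[1]["caption"])  # → a photo of milk
```

In the paper's actual system, the decoder stage would be a trained CNN encoder feeding an LSTM language model, with AAS optimization minimizing the training loss; the sketch only fixes the data flow between the stages.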
format | Online Article Text |
id | pubmed-9283099 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-92830992022-07-15 An accurate generation of image captions for blind people using extended convolutional atom neural network Tiwary, Tejal; Mahapatra, Rajendra Prasad Multimed Tools Appl Article Springer US 2022-07-15 2023 /pmc/articles/PMC9283099/ /pubmed/35855372 http://dx.doi.org/10.1007/s11042-022-13443-5 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Tiwary, Tejal; Mahapatra, Rajendra Prasad An accurate generation of image captions for blind people using extended convolutional atom neural network |
title | An accurate generation of image captions for blind people using extended convolutional atom neural network |
title_full | An accurate generation of image captions for blind people using extended convolutional atom neural network |
title_fullStr | An accurate generation of image captions for blind people using extended convolutional atom neural network |
title_full_unstemmed | An accurate generation of image captions for blind people using extended convolutional atom neural network |
title_short | An accurate generation of image captions for blind people using extended convolutional atom neural network |
title_sort | accurate generation of image captions for blind people using extended convolutional atom neural network |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9283099/ https://www.ncbi.nlm.nih.gov/pubmed/35855372 http://dx.doi.org/10.1007/s11042-022-13443-5 |
work_keys_str_mv | AT tiwarytejal anaccurategenerationofimagecaptionsforblindpeopleusingextendedconvolutionalatomneuralnetwork AT mahapatrarajendraprasad anaccurategenerationofimagecaptionsforblindpeopleusingextendedconvolutionalatomneuralnetwork AT tiwarytejal accurategenerationofimagecaptionsforblindpeopleusingextendedconvolutionalatomneuralnetwork AT mahapatrarajendraprasad accurategenerationofimagecaptionsforblindpeopleusingextendedconvolutionalatomneuralnetwork |