An accurate generation of image captions for blind people using extended convolutional atom neural network
Recently, progress in image understanding and AIC (Automatic Image Captioning) has attracted many researchers to apply AI (Artificial Intelligence) models to assist blind people. AIC integrates principles from both computer vision and NLP (Natural Language Processing) to generate a...
Main Authors: Tiwary, Tejal; Mahapatra, Rajendra Prasad
Format: Online Article Text
Language: English
Published: Springer US, 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9283099/ https://www.ncbi.nlm.nih.gov/pubmed/35855372 http://dx.doi.org/10.1007/s11042-022-13443-5
_version_ | 1784747261162422272 |
author | Tiwary, Tejal; Mahapatra, Rajendra Prasad |
author_facet | Tiwary, Tejal; Mahapatra, Rajendra Prasad |
author_sort | Tiwary, Tejal |
collection | PubMed |
description | Recently, progress in image understanding and AIC (Automatic Image Captioning) has attracted many researchers to apply AI (Artificial Intelligence) models to assist blind people. AIC integrates principles from both computer vision and NLP (Natural Language Processing) to generate automatic language descriptions of an observed image. This work presents a new deep-learning-based assistive technology that helps blind people distinguish food items in online grocery shopping. The proposed AIC model involves the following steps: data collection, non-captioned image selection, extraction of appearance and texture features, and generation of automatic image captions. Initially, the data is collected from two public sources, and the selection of non-captioned images is performed using ARO (Adaptive Rain Optimization). Next, the appearance feature is extracted using the SDM (Spatial Derivative and Multi-scale) approach, and WPLBP (Weighted Patch Local Binary Pattern) is used to extract texture features. Finally, the captions are automatically generated using ECANN (Extended Convolutional Atom Neural Network). The ECANN model combines CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) architectures in a caption-reuse system that selects the most accurate caption. The loss in the ECANN architecture is minimized using the AAS (Adaptive Atom Search) optimization algorithm. The implementation tool is Python, and the datasets used for the analysis are two grocery datasets (Freiburg Groceries and Grocery Store Dataset). The proposed ECANN model achieved 99.46% accuracy on the Grocery Store Dataset and 99.32% accuracy on the Freiburg Groceries dataset. The performance of the proposed ECANN model is compared with existing models to verify its superiority over prior work. |
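The pipeline in the abstract (select non-captioned images, extract appearance/texture features, then decode a caption) can be sketched in outline as below. This is a minimal toy illustration, not the authors' ECANN implementation: every function body here is a stand-in assumption, with the real ARO selection, SDM/WPLBP feature extractors, and CNN+LSTM decoder replaced by trivial placeholders that only show how the stages chain together.

```python
# Toy sketch of the captioning pipeline stages described in the abstract.
# All names and logic are illustrative assumptions, not the ECANN method.

def select_non_captioned(images):
    """Stand-in for ARO-based selection: keep images that lack a caption."""
    return [img for img in images if img.get("caption") is None]

def extract_features(img):
    """Stand-in for SDM appearance + WPLBP texture feature extraction."""
    return {"appearance": img["pixels_mean"], "texture": img["pixels_var"]}

def generate_caption(features, vocab):
    """Stand-in for the CNN+LSTM decoder: map a feature score to a word."""
    idx = int(features["appearance"] * len(vocab)) % len(vocab)
    return f"a photo of {vocab[idx]}"

# Tiny fake dataset standing in for the grocery images.
images = [
    {"id": 1, "caption": "fresh apples", "pixels_mean": 0.2, "pixels_var": 0.01},
    {"id": 2, "caption": None, "pixels_mean": 0.7, "pixels_var": 0.05},
]
vocab = ["apples", "bread", "milk", "pasta"]

# Run the three stages end to end on the uncaptioned images.
for img in select_non_captioned(images):
    feats = extract_features(img)
    img["caption"] = generate_caption(feats, vocab)

print(images[1]["caption"])  # → a photo of milk
```

In the paper's actual system, the decoder stage would be a trained CNN encoder feeding an LSTM language model, with AAS optimization minimizing the training loss; the sketch only fixes the data flow between the stages.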
format | Online Article Text |
id | pubmed-9283099 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-92830992022-07-15 An accurate generation of image captions for blind people using extended convolutional atom neural network Tiwary, Tejal; Mahapatra, Rajendra Prasad Multimed Tools Appl Article Springer US 2022-07-15 2023 /pmc/articles/PMC9283099/ /pubmed/35855372 http://dx.doi.org/10.1007/s11042-022-13443-5 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Tiwary, Tejal; Mahapatra, Rajendra Prasad An accurate generation of image captions for blind people using extended convolutional atom neural network |
title | An accurate generation of image captions for blind people using extended convolutional atom neural network |
title_full | An accurate generation of image captions for blind people using extended convolutional atom neural network |
title_fullStr | An accurate generation of image captions for blind people using extended convolutional atom neural network |
title_full_unstemmed | An accurate generation of image captions for blind people using extended convolutional atom neural network |
title_short | An accurate generation of image captions for blind people using extended convolutional atom neural network |
title_sort | accurate generation of image captions for blind people using extended convolutional atom neural network |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9283099/ https://www.ncbi.nlm.nih.gov/pubmed/35855372 http://dx.doi.org/10.1007/s11042-022-13443-5 |
work_keys_str_mv | AT tiwarytejal anaccurategenerationofimagecaptionsforblindpeopleusingextendedconvolutionalatomneuralnetwork AT mahapatrarajendraprasad anaccurategenerationofimagecaptionsforblindpeopleusingextendedconvolutionalatomneuralnetwork AT tiwarytejal accurategenerationofimagecaptionsforblindpeopleusingextendedconvolutionalatomneuralnetwork AT mahapatrarajendraprasad accurategenerationofimagecaptionsforblindpeopleusingextendedconvolutionalatomneuralnetwork |