
Pear Recognition in an Orchard from 3D Stereo Camera Datasets to Develop a Fruit Picking Mechanism Using Mask R-CNN


Bibliographic Details
Main Authors: Pan, Siyu, Ahamed, Tofael
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9185418/
https://www.ncbi.nlm.nih.gov/pubmed/35684807
http://dx.doi.org/10.3390/s22114187
_version_ 1784724719022374912
author Pan, Siyu
Ahamed, Tofael
author_facet Pan, Siyu
Ahamed, Tofael
author_sort Pan, Siyu
collection PubMed
description In orchard fruit picking systems for pears, the challenge is to identify the full shape of the soft fruit to avoid injuries while using robotic or automatic picking systems. Advancements in computer vision have brought the potential to train for different shapes and sizes of fruit using deep learning algorithms. In this research, a fruit recognition method for robotic systems was developed to identify pears in a complex orchard environment using a 3D stereo camera combined with Mask Region-based Convolutional Neural Network (Mask R-CNN) deep learning technology to obtain targets. This experiment used 9054 RGBA images (3018 original images and 6036 augmented images) to create a dataset divided into training, validation, and testing sets. Furthermore, we collected the dataset under different lighting conditions at different times, namely high-light (9–10 am) and low-light (6–7 pm) conditions (JST, Tokyo time, August 2021, summertime), to prepare training, validation, and test datasets at a ratio of 6:3:1. All the images were taken with a 3D stereo camera that provides PERFORMANCE, QUALITY, and ULTRA depth modes; we used the PERFORMANCE mode to capture the images for the datasets. The camera on the left generated the depth images and the camera on the right generated the original images. In this research, we also compared the performance of two R-CNN variants (Mask R-CNN and Faster R-CNN); the mean Average Precision (mAP) of Mask R-CNN and Faster R-CNN was compared on the same datasets with the same split ratio. Each epoch in Mask R-CNN was set to 500 steps, with 80 epochs in total, and Faster R-CNN was trained for 40,000 steps. For pear recognition, Mask R-CNN achieved an mAP of 95.22% on the validation set and 99.45% on the testing set, whereas Faster R-CNN achieved mAPs of 87.9% on the validation set and 87.52% on the testing set. The two models, trained on the same dataset, performed differently on densely clustered pears and on individual pears: Mask R-CNN outperformed Faster R-CNN when pears were densely clustered in the complex orchard. Therefore, the 3D stereo camera-based dataset combined with the Mask R-CNN vision algorithm achieved high accuracy in detecting individual pears among clustered pears in a complex orchard environment.
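As a rough illustration of the 6:3:1 dataset split described in the abstract, the sketch below divides a folder of images into training, validation, and test subsets. This is not the authors' pipeline; the directory name, PNG file extension, and random seed are assumptions for illustration only.

```python
# Minimal sketch (not the authors' code): split an image folder into
# training/validation/test subsets at the 6:3:1 ratio from the abstract.
import random
from pathlib import Path

def split_dataset(image_dir, ratios=(0.6, 0.3, 0.1), seed=42):
    """Shuffle image paths and split them into train/val/test lists."""
    images = sorted(Path(image_dir).glob("*.png"))  # RGBA frames; extension assumed
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * ratios[0])
    n_val = int(len(images) * ratios[1])
    return {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }

if __name__ == "__main__":
    splits = split_dataset("pear_images/")  # hypothetical directory
    for name, files in splits.items():
        print(name, len(files))
```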
format Online
Article
Text
id pubmed-9185418
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9185418 2022-06-11 Pear Recognition in an Orchard from 3D Stereo Camera Datasets to Develop a Fruit Picking Mechanism Using Mask R-CNN Pan, Siyu Ahamed, Tofael Sensors (Basel) Article In orchard fruit picking systems for pears, the challenge is to identify the full shape of the soft fruit to avoid injuries while using robotic or automatic picking systems. Advancements in computer vision have brought the potential to train for different shapes and sizes of fruit using deep learning algorithms. In this research, a fruit recognition method for robotic systems was developed to identify pears in a complex orchard environment using a 3D stereo camera combined with Mask Region-based Convolutional Neural Network (Mask R-CNN) deep learning technology to obtain targets. This experiment used 9054 RGBA images (3018 original images and 6036 augmented images) to create a dataset divided into training, validation, and testing sets. Furthermore, we collected the dataset under different lighting conditions at different times, namely high-light (9–10 am) and low-light (6–7 pm) conditions (JST, Tokyo time, August 2021, summertime), to prepare training, validation, and test datasets at a ratio of 6:3:1. All the images were taken with a 3D stereo camera that provides PERFORMANCE, QUALITY, and ULTRA depth modes; we used the PERFORMANCE mode to capture the images for the datasets. The camera on the left generated the depth images and the camera on the right generated the original images. In this research, we also compared the performance of two R-CNN variants (Mask R-CNN and Faster R-CNN); the mean Average Precision (mAP) of Mask R-CNN and Faster R-CNN was compared on the same datasets with the same split ratio. Each epoch in Mask R-CNN was set to 500 steps, with 80 epochs in total, and Faster R-CNN was trained for 40,000 steps. For pear recognition, Mask R-CNN achieved an mAP of 95.22% on the validation set and 99.45% on the testing set, whereas Faster R-CNN achieved mAPs of 87.9% on the validation set and 87.52% on the testing set. The two models, trained on the same dataset, performed differently on densely clustered pears and on individual pears: Mask R-CNN outperformed Faster R-CNN when pears were densely clustered in the complex orchard. Therefore, the 3D stereo camera-based dataset combined with the Mask R-CNN vision algorithm achieved high accuracy in detecting individual pears among clustered pears in a complex orchard environment. MDPI 2022-05-31 /pmc/articles/PMC9185418/ /pubmed/35684807 http://dx.doi.org/10.3390/s22114187 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
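The training schedule given in the abstract (500 steps per epoch for 80 epochs of Mask R-CNN) could be expressed as a configuration like the one sketched below, assuming the open-source Matterport Mask R-CNN implementation. The paper does not state which implementation was used, so the package, class names, and every hyperparameter other than the step and epoch counts are assumptions rather than the authors' setup.

```python
# Hedged sketch: Mask R-CNN training configuration with 500 steps per epoch
# for 80 epochs, assuming the Matterport implementation (mrcnn package).
from mrcnn.config import Config
from mrcnn import model as modellib

class PearConfig(Config):
    """Illustrative config; only the step and epoch counts come from the abstract."""
    NAME = "pear"
    NUM_CLASSES = 1 + 1        # background + pear (assumed single fruit class)
    STEPS_PER_EPOCH = 500      # 500 training steps per epoch, per the abstract
    GPU_COUNT = 1              # assumed hardware settings
    IMAGES_PER_GPU = 1

config = PearConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")

# train_set and val_set would be mrcnn.utils.Dataset subclasses built from the
# 6:3:1 split; they are placeholders here, so the call is left commented out.
# model.train(train_set, val_set,
#             learning_rate=config.LEARNING_RATE,
#             epochs=80,             # 80 epochs in total, per the abstract
#             layers="heads")
```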
spellingShingle Article
Pan, Siyu
Ahamed, Tofael
Pear Recognition in an Orchard from 3D Stereo Camera Datasets to Develop a Fruit Picking Mechanism Using Mask R-CNN
title Pear Recognition in an Orchard from 3D Stereo Camera Datasets to Develop a Fruit Picking Mechanism Using Mask R-CNN
title_full Pear Recognition in an Orchard from 3D Stereo Camera Datasets to Develop a Fruit Picking Mechanism Using Mask R-CNN
title_fullStr Pear Recognition in an Orchard from 3D Stereo Camera Datasets to Develop a Fruit Picking Mechanism Using Mask R-CNN
title_full_unstemmed Pear Recognition in an Orchard from 3D Stereo Camera Datasets to Develop a Fruit Picking Mechanism Using Mask R-CNN
title_short Pear Recognition in an Orchard from 3D Stereo Camera Datasets to Develop a Fruit Picking Mechanism Using Mask R-CNN
title_sort pear recognition in an orchard from 3d stereo camera datasets to develop a fruit picking mechanism using mask r-cnn
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9185418/
https://www.ncbi.nlm.nih.gov/pubmed/35684807
http://dx.doi.org/10.3390/s22114187
work_keys_str_mv AT pansiyu pearrecognitioninanorchardfrom3dstereocameradatasetstodevelopafruitpickingmechanismusingmaskrcnn
AT ahamedtofael pearrecognitioninanorchardfrom3dstereocameradatasetstodevelopafruitpickingmechanismusingmaskrcnn