
iVision HHID: Handwritten hyperspectral images dataset for benchmarking hyperspectral imaging-based document forensic analysis

This article presents a dataset of hyperspectral images of handwriting samples collected from 54 individuals. The purpose of the presented dataset is to further explore the use of hyperspectral imaging in document image analysis and to benchmark the performance of forensic analysis methods for hyper...


Bibliographic Details
Main authors: Islam, Ammad Ul; Khan, Muhammad Jaleed; Asad, Muhammad; Khan, Haris Ahmad; Khurshid, Khurram
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8873541/
https://www.ncbi.nlm.nih.gov/pubmed/35242944
http://dx.doi.org/10.1016/j.dib.2022.107964
author Islam, Ammad Ul
Khan, Muhammad Jaleed
Asad, Muhammad
Khan, Haris Ahmad
Khurshid, Khurram
collection PubMed
description This article presents a dataset of hyperspectral images of handwriting samples collected from 54 individuals. The purpose of the dataset is to further explore the use of hyperspectral imaging in document image analysis and to benchmark the performance of forensic analysis methods for hyperspectral document images. Each hyperspectral cube in the dataset has a spatial resolution of 512 × 650 pixels and contains 149 spectral channels in the spectral range of 478–901 nm. The individuals have different personalities, and each has their own writing pattern. The age and gender of each individual are recorded. Each subject wrote twenty-eight sentences, each approximately 9 words or 33 characters long, using 12 different varieties of blue pens from different brands, along with all letters of the English alphabet in upper and lower case and the digits 0 to 9. Previous methods use synthetic mixed samples created by joining different parts of images from the UWA WIHSI dataset. In this dataset, each document consists of real mixed samples written with different pens and by different writers, with a variety of mixing ratios of inks and writers for forensic analysis. Standard 70 gsm A4 pages manufactured by the “AA” company were used for data collection. The handwritten notes written by each subject with different pens are annotated in rectangular boxes. This dataset can be used for several tasks related to hyperspectral document image analysis and document forensic analysis, including handwritten optical character recognition, ink mismatch detection, writer identification at the sentence, word, and character level, handwriting-based gender classification, handwriting-based age prediction, handwritten word segmentation, and word generation.
This dataset was designed and collected by the research team at the Artificial Intelligence and Computer Vision Lab (iVision), Institute of Space Technology, Pakistan, and the hyperspectral images were acquired through imaging spectroscopy in the visible wavelength range at Wageningen University & Research, the Netherlands.
format Online
Article
Text
id pubmed-8873541
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8873541 2022-03-02 iVision HHID: Handwritten hyperspectral images dataset for benchmarking hyperspectral imaging-based document forensic analysis Islam, Ammad Ul Khan, Muhammad Jaleed Asad, Muhammad Khan, Haris Ahmad Khurshid, Khurram Data Brief Data Article Elsevier 2022-02-16 /pmc/articles/PMC8873541/ /pubmed/35242944 http://dx.doi.org/10.1016/j.dib.2022.107964 Text en © 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title iVision HHID: Handwritten hyperspectral images dataset for benchmarking hyperspectral imaging-based document forensic analysis
topic Data Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8873541/
https://www.ncbi.nlm.nih.gov/pubmed/35242944
http://dx.doi.org/10.1016/j.dib.2022.107964