
Speed Up of Volumetric Non-Local Transform-Domain Filter Utilising HPC Architecture


Bibliographic Details
Main authors: Strakos, Petr; Jaros, Milan; Riha, Lubomir; Kozubek, Tomas
Format: Online Article Text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10672137/
https://www.ncbi.nlm.nih.gov/pubmed/37998101
http://dx.doi.org/10.3390/jimaging9110254
Collection: PubMed
Description: This paper presents a parallel implementation of a non-local transform-domain filter (BM4D). The effectiveness of the parallel implementation is demonstrated by denoising image series from computed tomography (CT) and magnetic resonance imaging (MRI). The filter is based on grouping and filtering similar data within the image. Owing to the high degree of similarity and data redundancy, the filter can provide even better denoising quality than the currently widespread approaches based on deep learning (DL). In BM4D, cubes of voxels called patches are the basic image elements for filtering. Because patches consist of voxels rather than pixels, the area that must be searched for similar patches is large. This, together with the multi-dimensional transformations involved, makes the computation time of the filter exceptionally long. The original implementation of BM4D is single-threaded only. We provide a parallel version of the filter that supports multi-core and many-core processors and scales on such versatile hardware resources, typical of high-performance computing clusters, even when they are used concurrently for the task. Our algorithm uses hybrid parallelisation that combines Open Multi-Processing (OpenMP) and the Message Passing Interface (MPI) and provides up to a 283× speedup, a 99.65% reduction in processing time compared with the sequential version of the algorithm. In terms of denoising quality, the method performs considerably better than recent DL methods on data types those methods have not been trained on.
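The grouping step described above can be illustrated with a small sketch. It is not the authors' code: it is written in Python, assumes the volume is a plain nested list indexed `vol[z][y][x]`, and uses hypothetical names (`patch_distance`, `match_block`, `group_all`) and parameters (`p` for the patch side, `step`, `radius` for the search-window half-width, `max_group`, `tau` for the distance threshold, `workers`). It shows only block matching: for each reference cube it searches a local window for cubes whose normalised squared distance is at most a threshold and keeps the closest ones. The subsequent multi-dimensional transform, shrinkage and aggregation are omitted. Because each reference patch is matched independently, the work can be split across workers; here a thread pool merely stands in for that decomposition (the paper uses OpenMP and MPI), and Python threads will not give real speedups because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import product

# Hypothetical types: a volume is indexed vol[z][y][x]; a patch is identified
# by the (z, y, x) coordinates of its lowest corner.
Volume = list[list[list[float]]]
Coord = tuple[int, int, int]


def patch_distance(vol: Volume, a: Coord, b: Coord, p: int) -> float:
    """Normalised squared distance between the p*p*p cubes with corners a and b."""
    (az, ay, ax), (bz, by, bx) = a, b
    total = 0.0
    for dz, dy, dx in product(range(p), repeat=3):
        d = vol[az + dz][ay + dy][ax + dx] - vol[bz + dz][by + dy][bx + dx]
        total += d * d
    return total / p ** 3


def match_block(vol: Volume, ref: Coord, p: int, radius: int,
                max_group: int, tau: float) -> list[Coord]:
    """Group up to max_group patches within `radius` of ref whose distance is <= tau."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    rz, ry, rx = ref
    found = []
    # Exhaustive search over every patch corner inside the 3-D search window.
    for z in range(max(0, rz - radius), min(nz - p, rz + radius) + 1):
        for y in range(max(0, ry - radius), min(ny - p, ry + radius) + 1):
            for x in range(max(0, rx - radius), min(nx - p, rx + radius) + 1):
                d = patch_distance(vol, ref, (z, y, x), p)
                if d <= tau:
                    found.append((d, (z, y, x)))
    found.sort()  # most similar first; ties broken by coordinates
    return [c for _, c in found[:max_group]]


def group_all(vol: Volume, p: int = 2, step: int = 1, radius: int = 2,
              max_group: int = 4, tau: float = 0.1,
              workers: int = 4) -> dict[Coord, list[Coord]]:
    """Block-match every reference patch; each reference is an independent task."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    refs = list(product(range(0, nz - p + 1, step),
                        range(0, ny - p + 1, step),
                        range(0, nx - p + 1, step)))
    task = partial(match_block, vol, p=p, radius=radius,
                   max_group=max_group, tau=tau)
    # Threads only illustrate the task split; they share the GIL.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        groups = list(pool.map(task, refs))
    return dict(zip(refs, groups))


if __name__ == "__main__":
    # 4x4x4 toy volume: voxels with x < 2 are 0.0, the rest are 1.0.
    vol = [[[0.0 if x < 2 else 1.0 for x in range(4)]
            for _ in range(4)] for _ in range(4)]
    groups = group_all(vol, workers=2)
    print(groups[(0, 0, 0)])  # [(0, 0, 0), (0, 1, 0), (0, 2, 0), (1, 0, 0)]
```

In the toy volume, the all-zero reference cube at the origin is grouped only with other all-zero cubes, which is the redundancy the filter exploits. As a check on the figures quoted above, a 283× speedup corresponds to a reduction in processing time of 1 − 1/283 ≈ 0.9965, i.e. 99.65%.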
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Imaging, published 2023-11-20. © 2023 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).