
Speed Up of Volumetric Non-Local Transform-Domain Filter Utilising HPC Architecture

This paper presents a parallel implementation of a non-local transform-domain filter (BM4D). The effectiveness of the parallel implementation is demonstrated by denoising image series from computed tomography (CT) and magnetic resonance imaging (MRI). The basic idea of the filter is based on groupin...


Bibliographic Details
Main Authors:	Strakos, Petr; Jaros, Milan; Riha, Lubomir; Kozubek, Tomas
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10672137/ https://www.ncbi.nlm.nih.gov/pubmed/37998101 http://dx.doi.org/10.3390/jimaging9110254

author	Strakos, Petr; Jaros, Milan; Riha, Lubomir; Kozubek, Tomas
collection	PubMed
description	This paper presents a parallel implementation of a non-local transform-domain filter (BM4D). The effectiveness of the parallel implementation is demonstrated by denoising image series from computed tomography (CT) and magnetic resonance imaging (MRI). The basic idea of the filter is based on grouping and filtering similar data within the image. Due to the high level of similarity and data redundancy, the filter can provide even better denoising quality than current extensively used approaches based on deep learning (DL). In BM4D, cubes of voxels named patches are the essential image elements for filtering. Using voxels instead of pixels means that the area for searching similar patches is large. Because of this and the application of multi-dimensional transformations, the computation time of the filter is exceptionally long. The original implementation of BM4D is only single-threaded. We provide a parallel version of the filter that supports multi-core and many-core processors and scales on such versatile hardware resources, typical for high-performance computing clusters, even if they are concurrently used for the task. Our algorithm uses hybrid parallelisation that combines open multi-processing (OpenMP) and message passing interface (MPI) technologies and provides up to 283× speedup, which is a 99.65% reduction in processing time compared to the sequential version of the algorithm. In denoising quality, the method performs considerably better than recent DL methods on the data type that these methods have yet to be trained on.
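The two performance figures quoted in the description are related by simple arithmetic: a speedup factor s corresponds to a runtime reduction of 1 - 1/s. The short Python sketch below checks that relationship for the reported 283× speedup. The function name `time_reduction_percent` is a hypothetical helper introduced here for illustration and is not part of the paper or its software.

```python
def time_reduction_percent(speedup: float) -> float:
    """Return the percentage of runtime saved for a given speedup factor.

    Assumes speedup = t_sequential / t_parallel, so the saved fraction
    is 1 - 1/speedup. Hypothetical helper, not taken from the paper.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) * 100.0


# Speedup value quoted in the description above.
reduction = time_reduction_percent(283.0)
print(f"{reduction:.2f}%")  # prints 99.65%
```

Under this definition, the quoted 283× speedup and the quoted 99.65% reduction in processing time are consistent with each other.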
format	Online Article Text
id	pubmed-10672137
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-10672137 2023-11-20 J Imaging Article MDPI /pmc/articles/PMC10672137/ /pubmed/37998101 http://dx.doi.org/10.3390/jimaging9110254 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Speed Up of Volumetric Non-Local Transform-Domain Filter Utilising HPC Architecture
