Speed Up of Volumetric Non-Local Transform-Domain Filter Utilising HPC Architecture
This paper presents a parallel implementation of a non-local transform-domain filter (BM4D). The effectiveness of the parallel implementation is demonstrated by denoising image series from computed tomography (CT) and magnetic resonance imaging (MRI). The basic idea of the filter is based on groupin...
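The grouping step named in the abstract can be illustrated with a minimal sketch of 3D block matching in Python. The sketch makes several assumptions: a NumPy volume, cubic patches of side `patch`, an exhaustive search over a cubic window of half-width `search`, and squared Euclidean distance as the similarity measure. The function names `match_patches` and `build_group` and all parameter defaults are hypothetical and are not taken from the BM4D implementation described in the paper. The transform-domain filtering that BM4D then applies to each group is not shown.

```python
import numpy as np


def match_patches(volume, ref, patch=4, search=5, k=8):
    """Hypothetical 3D block matching: return the corners of the k cubic
    patches closest (squared L2 distance) to the patch whose corner is `ref`,
    searching exhaustively in a cubic window of half-width `search`."""
    z0, y0, x0 = ref
    ref_block = volume[z0:z0 + patch, y0:y0 + patch, x0:x0 + patch]
    # Largest valid corner index along each axis keeps the patch inside the volume.
    zmax, ymax, xmax = (s - patch for s in volume.shape)
    candidates = []
    for z in range(max(0, z0 - search), min(zmax, z0 + search) + 1):
        for y in range(max(0, y0 - search), min(ymax, y0 + search) + 1):
            for x in range(max(0, x0 - search), min(xmax, x0 + search) + 1):
                block = volume[z:z + patch, y:y + patch, x:x + patch]
                dist = float(np.sum((block - ref_block) ** 2))
                candidates.append((dist, (z, y, x)))
    # Stable sort by distance only; the reference patch itself has distance 0.
    candidates.sort(key=lambda c: c[0])
    return [pos for _, pos in candidates[:k]]


def build_group(volume, corners, patch=4):
    """Stack the matched cubes into a 4D group of shape (k, patch, patch, patch)."""
    return np.stack([volume[z:z + patch, y:y + patch, x:x + patch]
                     for z, y, x in corners])
```

Each reference position is matched independently of the others, which is what makes a loop over reference patches amenable to parallel execution in general.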
Main authors: | Strakos, Petr; Jaros, Milan; Riha, Lubomir; Kozubek, Tomas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10672137/ https://www.ncbi.nlm.nih.gov/pubmed/37998101 http://dx.doi.org/10.3390/jimaging9110254 |
_version_ | 1785149494933848064 |
---|---|
author | Strakos, Petr; Jaros, Milan; Riha, Lubomir; Kozubek, Tomas |
collection | PubMed |
description | This paper presents a parallel implementation of a non-local transform-domain filter (BM4D). The effectiveness of the parallel implementation is demonstrated by denoising image series from computed tomography (CT) and magnetic resonance imaging (MRI). The filter works by grouping and filtering similar data within the image. Owing to the high level of similarity and redundancy in the data, the filter can deliver even better denoising quality than the currently widely used approaches based on deep learning (DL). In BM4D, cubes of voxels called patches are the basic image elements for filtering. Working with voxels instead of pixels makes the region searched for similar patches large. Because of this, and because multi-dimensional transformations are applied, the computation time of the filter is exceptionally long. The original implementation of BM4D is single-threaded only. We provide a parallel version of the filter that supports multi-core and many-core processors and scales on such versatile hardware resources, typical of high-performance computing clusters, even when they are used concurrently for the task. Our algorithm uses hybrid parallelisation that combines Open Multi-Processing (OpenMP) and the Message Passing Interface (MPI) and provides up to 283× speedup, a 99.65% reduction in processing time compared to the sequential version of the algorithm. In denoising quality, the method performs considerably better than recent DL methods on a data type on which these methods have not yet been trained. |
format | Online Article Text |
id | pubmed-10672137 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10672137, 2023-11-20. Speed Up of Volumetric Non-Local Transform-Domain Filter Utilising HPC Architecture. Strakos, Petr; Jaros, Milan; Riha, Lubomir; Kozubek, Tomas. J Imaging. Article. MDPI, 2023-11-20. /pmc/articles/PMC10672137/ /pubmed/37998101 http://dx.doi.org/10.3390/jimaging9110254. Text, en. © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Speed Up of Volumetric Non-Local Transform-Domain Filter Utilising HPC Architecture |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10672137/ https://www.ncbi.nlm.nih.gov/pubmed/37998101 http://dx.doi.org/10.3390/jimaging9110254 |
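The two figures reported in the description are consistent with each other: a speedup factor S reduces processing time to 1/S of the sequential time, so the saving is 1 - 1/S, and 1 - 1/283 ≈ 0.9965. The Python sketch below checks that arithmetic and adds a hypothetical two-level static split of reference patches, first across MPI ranks and then across threads within each rank. This record does not describe how the authors distribute the work, so `split_range`, `hybrid_assignment`, and the contiguous-block scheme are assumptions for illustration only, with plain Python standing in for actual MPI and OpenMP calls.

```python
def time_reduction(speedup):
    """Fraction of sequential processing time saved at speedup S: 1 - 1/S."""
    return 1.0 - 1.0 / speedup


def split_range(n, parts, index):
    """Contiguous half-open block [start, stop) of n items given to part `index`.
    The first n % parts parts receive one extra item each."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def hybrid_assignment(n_refs, n_ranks, n_threads):
    """Hypothetical two-level static decomposition: split n_refs reference
    patches across MPI-style ranks, then each rank's share across threads.
    Returns {(rank, thread): (start, stop)} in global patch indices."""
    plan = {}
    for r in range(n_ranks):
        r_start, r_stop = split_range(n_refs, n_ranks, r)
        for t in range(n_threads):
            t_start, t_stop = split_range(r_stop - r_start, n_threads, t)
            plan[(r, t)] = (r_start + t_start, r_start + t_stop)
    return plan


print(f"{time_reduction(283):.2%}")  # 99.65%
```

A static contiguous split is only the simplest choice; with uneven per-patch cost, or on nodes shared with other jobs, a dynamic schedule may balance load better.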