A Parallel Nonlocal Means Algorithm for Remote Sensing Image Denoising on an Intel Xeon Phi Platform
The nonlocal means (NLM) algorithm is among the most effective image denoising algorithms because of its superior ability to retain the texture details of an image, and it is widely used in remote sensing (RS) image preprocessing. However, the time complexity of the algorithm is very high because of its nonlocal search for similar pixels. As a result, the NLM algorithm cannot satisfy the near-real-time requirements of some specific applications. To resolve this issue, this paper designs and implements a parallel NLM algorithm for Intel Xeon Phi hardware, which is based on Intel's Many Integrated Core (MIC) architecture. The parallel algorithm achieved satisfactory speedup, but the speedup showed a step-like distribution across different image sizes. This result was not expected from the theoretical analysis, which predicted that the speedup should be independent of the input data set size. To address this problem, the parallel algorithm was further optimized by adding pretreatment steps and reducing the number of nested loops executed on the MIC. Finally, experiments with the standard and optimized versions were carried out on RS images of different sizes. Several conclusions can be drawn from the experimental results: 1) the standard parallel algorithm can obtain better speedup with only one MIC card, and 2) the optimized parallel algorithm can completely eliminate the step-like distribution of the speedup and can also accelerate RS image processing significantly.
Parallel computing, MIC, remote sensing, image processing, OpenMP