Architecture Design for Efficient Non-uniform Fast Fourier Transform (NuFFT) Implementation on FPGA
Thesis, posted on 27.10.2017 by Alex Iacobucci
The pivotal role of Fourier analysis in discrete-time signal processing is firmly established. Consequently, many algorithms and techniques have been developed for the efficient computation of the forward and inverse Discrete Fourier Transform, with the Fast Fourier Transform (FFT) foremost among them. Most of these algorithms are designed for the common case of data sampled on regular rectangular grids. In several relevant real-life applications, however, the input signal is not sampled uniformly, raising the problem of computing a Fourier transform that is both accurate and computationally efficient. The algorithm that addresses this challenge is usually referred to as the Non-uniform Fast Fourier Transform (NuFFT).

A critical application involving irregular sampling is Magnetic Resonance Imaging (MRI). MRI has quickly won a place among the most widely used medical diagnostic techniques, owing to its low impact on the human body and its accuracy in distinguishing abnormal tissue from normal tissue. Its drawbacks include long acquisition times and the unwanted artifacts that can appear in the reconstructed image. To mitigate both of these disadvantages, non-Cartesian acquisition trajectories have been designed and are now often employed in actual acquisitions. While these trajectories offer shorter scan times and reduced aliasing, their samples are not regularly spaced, which again raises the challenge of efficiently computing a NuFFT.

Given the relevance of the problem, this thesis investigates an improved implementation of the NuFFT by exploiting the promise of FPGA-based acceleration. The goal of this thesis is to provide an effective implementation using OpenCL, a recent higher-level language, to solve the problem in the context of MR imaging.
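The gridding approach that underlies most NuFFT implementations can be sketched in one dimension: each nonuniform sample is convolved onto an oversampled Cartesian grid with a window function, a standard FFT is applied to the grid, and the window's Fourier transform is divided out (deapodization). The following minimal Python/NumPy illustration uses a Gaussian window with the Greengard–Lee parameter choice; this is an assumption for illustration only, as the thesis targets OpenCL on FPGA and MRI reconstructions commonly use other windows such as Kaiser–Bessel:

```python
import numpy as np

def nufft1d_type1(x, c, M, R=2, Msp=12):
    """Type-1 NUFFT via Gaussian gridding (Greengard-Lee parameters).

    Approximates F[k] = sum_j c[j] * exp(-1j * k * x[j])
    for k = -M/2 .. M/2 - 1, with points x[j] in [0, 2*pi).
    R is the oversampling factor, Msp the kernel half-width in grid points.
    """
    Mr = R * M                                   # oversampled grid size
    tau = np.pi * Msp / (M**2 * R * (R - 0.5))   # Gaussian width heuristic
    h = 2 * np.pi / Mr                           # fine-grid spacing

    # Spread (convolve) each nonuniform sample onto 2*Msp + 1 grid points,
    # wrapping around the grid to respect periodicity.
    ftau = np.zeros(Mr, dtype=complex)
    for xj, cj in zip(x, c):
        m = int(round(xj / h)) + np.arange(-Msp, Msp + 1)
        np.add.at(ftau, m % Mr, cj * np.exp(-(xj - m * h) ** 2 / (4 * tau)))

    # Uniform FFT on the fine grid; keep only the M central modes.
    Ftau = np.fft.fftshift(np.fft.fft(ftau)) / Mr
    k = np.arange(-(M // 2), M // 2)
    Fk = Ftau[Mr // 2 - M // 2 : Mr // 2 + M // 2]

    # Deapodize: divide out the Fourier transform of the Gaussian window.
    return np.sqrt(np.pi / tau) * np.exp(tau * k**2) * Fk
```

With R = 2 and Msp = 12, this matches a direct evaluation of the sums to near machine precision, while replacing the O(N·M) direct cost with O(N·Msp) spreading plus one O(Mr log Mr) FFT.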
The objective is to prove that, by combining OpenCL and FPGAs, it is possible to achieve results valuable from both an academic and a commercial standpoint, while taking advantage of the simpler design process provided by OpenCL. This work focuses on proving that, by following the guidelines of OpenCL programming and exploiting the advances in FPGA technology, it is possible to build solutions that are competitive with the academic state of the art. To begin with, the host code was optimized to achieve high parallelism and to overcome the bottlenecks induced by disk operations. Diverse approaches were designed and synthesized to exploit known acquisition trajectories in order to boost performance. By analyzing the interpolation kernels and their impact on the SSIM of the reconstruction, optimal values were identified for the convolution window size and the oversampling factor, yielding a satisfactory trade-off between accuracy and computation time. Furthermore, this work shows that while tuning the convolution parameters can significantly impact both performance and accuracy, a proper understanding of how the reconstruction depends on the convolution kernel makes it possible to achieve the desired throughput and accuracy targets, therefore making the proposed architecture suitable for a wide range of applications.
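The window-size/oversampling trade-off described above can be illustrated numerically: widening the spreading window (at a fixed oversampling factor) reduces the approximation error, at the cost of more interpolation work per sample. The sweep below is a hypothetical 1D stand-in for the thesis's parameter study, again using Gaussian gridding; where the thesis measures accuracy via image-domain SSIM, the relative spectral error against a direct evaluation serves here as a simpler proxy:

```python
import numpy as np

def gridding_error(M, R, Msp, n_samples=200, seed=1):
    """Relative max error of Gaussian-gridding NUFFT vs. direct evaluation.

    M: number of output modes, R: oversampling factor,
    Msp: spreading-window half-width in fine-grid points.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 2 * np.pi, n_samples)     # nonuniform sample locations
    c = rng.normal(size=n_samples) + 1j * rng.normal(size=n_samples)

    Mr = R * M
    tau = np.pi * Msp / (M**2 * R * (R - 0.5))   # Gaussian width heuristic
    h = 2 * np.pi / Mr
    ftau = np.zeros(Mr, dtype=complex)
    for xj, cj in zip(x, c):                     # spread onto the fine grid
        m = int(round(xj / h)) + np.arange(-Msp, Msp + 1)
        np.add.at(ftau, m % Mr, cj * np.exp(-(xj - m * h) ** 2 / (4 * tau)))

    k = np.arange(-(M // 2), M // 2)
    Fk = np.fft.fftshift(np.fft.fft(ftau))[Mr // 2 - M // 2 : Mr // 2 + M // 2] / Mr
    F = np.sqrt(np.pi / tau) * np.exp(tau * k**2) * Fk   # deapodized result

    F_direct = np.exp(-1j * k[:, None] * x[None, :]) @ c
    return np.abs(F - F_direct).max() / np.abs(F_direct).max()

# Wider windows buy accuracy at the price of more work per sample.
for Msp in (2, 4, 8, 12):
    print(f"half-width {Msp:2d}: relative error {gridding_error(64, 2, Msp):.2e}")
```

In practice, as the abstract notes, the window size and oversampling factor are chosen jointly: once the error curve for a given kernel is known, the cheapest parameter pair meeting the target accuracy can be selected.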