High Performance Embedded Solutions for Memory and Compute Intense Applications
Thesis posted on 2016-07-01, 00:00, authored by Umer I. Cheema
In recent years, application-specific architectures have gained popularity for implementing embedded solutions to many real-time and compute-intensive tasks, mainly because of the performance and power limitations of systems based on high-end processors. Because they deliver superior performance at lower energy cost, such targeted solutions have found use in a number of domains, including mobile devices, big-data processing in data centers, computer network security, medical imaging and the Internet of Things (IoT). Given this potential across a diverse range of applications, this thesis targets the development of efficient solutions for various memory- and compute-intensive applications. We specifically target FPGAs as the computing platform because of their power efficiency and the enormous design space that their inherent flexibility opens up. The hardware solutions proposed in this thesis have shown improved performance at reduced power compared with existing solutions. First, we propose a memory-optimized and power-efficient architecture to accelerate the memory- and compute-intensive re-gridding process in the Non-uniform Fast Fourier Transform (NuFFT). NuFFT is widely used for image reconstruction in applications such as Magnetic Resonance Imaging (MRI) and Synthetic Aperture Radar (SAR). The re-gridding step is known to be the most time-consuming task in the computation of NuFFT, accounting for 88–90% of the computation time. Second, we propose a hardware-efficient architecture for Gauss-Jordan based matrix inversion that minimizes the floating-point computation resources compared with existing solutions. Matrix inversion is a central task in many real-time applications, including Compressive Sensing based image reconstruction, cryptography and MIMO-OFDM systems. Third, we propose a novel architecture for computing the Burrows-Wheeler Transform (BWT) that makes better use of FPGA resources to achieve high performance.
BWT, initially developed for data compression, has since been used in many real-time applications such as compressed string matching, computer vision, test data compression and channel coding. Finally, we propose a highly pipelined and scalable architecture for median filtering. The median filter and its variants are widely used for noise suppression in image processing.
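
For readers unfamiliar with the Burrows-Wheeler Transform mentioned above, the following is a minimal software sketch of the textbook definition (sort all rotations of the input and keep the last column). It is not the FPGA architecture proposed in the thesis, which the abstract does not describe in detail. The function names `bwt` and `inverse_bwt` and the `$` end-of-string sentinel are assumptions made for this illustration only.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler Transform using sorted rotations.

    Illustrative reference only: real implementations (in software or
    hardware) avoid materializing every rotation.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Build every cyclic rotation of the sentinel-terminated string and sort them.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepend the last column to each partial row, then sort the rows.
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the row that ends with the unique sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)
    print(inverse_bwt(transformed))
```

The transform tends to group identical characters together, which is why it was originally attractive as a pre-processing step for compression. The sorting of rotations dominates the cost in this naive form.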