Efficient, Mixed-Precision In-Memory Deep Learning at the Edge
Thesis, posted on 2022-12-01, authored by Shamma Nasrin
Deep neural networks (DNNs) have shown remarkable prediction accuracy in many practical applications. DNNs in these applications typically utilize thousands to millions of parameters (i.e., weights) and are trained over a huge number of example patterns. Operating over such a large parametric space, carefully orchestrated over multiple abstraction levels (i.e., hidden layers), gives DNNs superior generalization and learning capacity, but it also imposes critical inference constraints, especially for real-time and/or low-power applications. This thesis proposes approaches for the low-energy implementation of DNN
accelerators for edge applications. We propose a co-design approach for compute-in-memory DNN inference. We use multiplication-free function approximators based on the ℓ1 norm, along with a co-adapted processing array and compute flow. Using this approach, we overcome many deficiencies in the current art of in-SRAM DNN processing, such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high-precision analog-to-digital converters (ADCs), limited support for multi-bit-precision weights, and limited vector-scale parallelism. Our co-adapted implementation seamlessly extends to multi-bit-precision weights, requires no DACs, and scales readily to higher vector-scale parallelism.
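As a concrete illustration of the multiplication-free idea, the sketch below approximates a dot product using only sign, absolute-value, and addition operations. It assumes the sign-and-magnitude form sign(x)·|w| + sign(w)·|x| commonly used for multiplication-free operators; the thesis's exact operator and compute flow may differ.

```python
import numpy as np

def mf_op(x, w):
    """Multiplication-free surrogate for the elementwise product x*w.

    Replaces each multiply with sign/magnitude terms that require only
    comparisons, negations, and additions (hardware-friendly):
        x (+) w = sign(x)*|w| + sign(w)*|x|
    The result correlates with x*w without using a multiplier.
    """
    return np.sign(x) * np.abs(w) + np.sign(w) * np.abs(x)

def mf_dot(x, w):
    """Multiplication-free approximation of the dot product <x, w>."""
    return float(np.sum(mf_op(x, w)))

# Compare against the exact dot product on random vectors.
rng = np.random.default_rng(0)
x, w = rng.standard_normal(64), rng.standard_normal(64)
print(float(x @ w), mf_dot(x, w))
```

The appeal for in-memory hardware is that the sign and magnitude terms map onto comparisons and additions, avoiding per-column multipliers.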
We also propose an SRAM-immersed successive-approximation ADC (SA-ADC) that exploits the parasitic capacitance of the SRAM array's bit lines as its capacitive DAC. Since the dominant area overhead in an SA-ADC comes from its capacitive DAC, exploiting the intrinsic parasitics of the SRAM array allows a low-area implementation of the within-SRAM SA-ADC.
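To make the conversion mechanism concrete, the following behavioral sketch shows the generic successive-approximation loop: one bit is resolved per cycle by comparing the input against a capacitive-DAC threshold. The function and variable names are illustrative only; in the proposed design, the DAC levels would come from the SRAM bit-line parasitic capacitance rather than a dedicated capacitor bank.

```python
def sar_adc(v_in, v_ref, n_bits=8):
    """Behavioral model of a successive-approximation (SAR) ADC.

    Each cycle tentatively sets one bit (MSB first), compares the
    resulting DAC voltage against the input, and keeps the bit only
    if the DAC output does not overshoot the input.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)              # tentatively set this bit
        v_dac = v_ref * trial / (1 << n_bits)  # capacitive-DAC output level
        if v_in >= v_dac:                      # comparator decision
            code = trial                       # keep the bit
    return code

# Example: digitize 0.63 V against a 1.0 V reference at 8 bits.
print(sar_adc(0.63, 1.0))  # 161, i.e., 161/256 ≈ 0.629 V
```

Since the binary-weighted DAC levels dominate the SA-ADC's area, reusing the bit-line parasitics to generate them is what keeps the within-SRAM ADC small.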
The thesis also explores automation algorithms for searching for energy-optimized neural architectures.
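As a generic illustration of such automation (not the thesis's specific algorithm), an energy-aware architecture search can be framed as maximizing accuracy minus an energy penalty. The search space, proxies, and penalty weight below are hypothetical placeholders.

```python
import random

# Hypothetical search space over depth, width, and weight precision;
# the names and ranges are illustrative, not taken from the thesis.
SEARCH_SPACE = {
    "depth": [4, 6, 8, 12],
    "width": [32, 64, 128],
    "weight_bits": [1, 2, 4, 8],
}

def estimate_energy(arch):
    """Toy energy proxy: roughly MAC count scaled by bit precision."""
    return arch["depth"] * arch["width"] ** 2 * arch["weight_bits"]

def estimate_accuracy(arch):
    """Toy accuracy proxy; a real search would train and validate."""
    capacity = arch["depth"] * arch["width"] * arch["weight_bits"]
    return 1.0 - 1.0 / (1.0 + 1e-4 * capacity)

def search(n_trials=500, energy_weight=2e-8, seed=0):
    """Random search maximizing accuracy minus an energy penalty."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = estimate_accuracy(arch) - energy_weight * estimate_energy(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

print(search())
```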