Posted on 2019-08-01 by Srikanth Ramakrishna
Over the past few years, demand has grown for low-power, high-performance machine learning, particularly for large-scale deployment of neural network systems on hardware platforms, driven by the large volumes of data involved and the growing complexity of network architectures. Frequent off-chip memory accesses during multiply-and-accumulate (MAC) operations are the main bottleneck to achieving high power efficiency.
Many near-data processing techniques have been proposed, but they fall short in terms of accuracy. In this work, we propose an in-SRAM neural network to address this challenge. The idea involves designing an SRAM bank architecture with compute units infused within the memory cells, capable of running inference on the fully connected layers of a Convolutional Neural Network for an image classification task. Each SRAM cell receives a portion of the discrete weight matrix and performs in-memory computations with the corresponding layer activations. By leveraging this on-chip memory framework, combined with the minimal access time of SRAM cells, high power efficiency can be obtained at proportionally high speeds.
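As a rough functional picture of this computation, the following NumPy sketch models a bank in which each cell holds a column slice of a ternary weight matrix and accumulates its partial product locally. The cell granularity CELL_COLS and the helper names (sram_cell_mac, in_sram_fc) are illustrative assumptions, not details of the actual hardware design.

```python
import numpy as np

# Functional model of the in-SRAM fully connected layer described above.
# The weight matrix is partitioned column-wise across SRAM cells; each
# cell performs its partial multiply-accumulate in place, and the partial
# sums are reduced without leaving the memory bank.

CELL_COLS = 16  # weight-matrix columns stored per SRAM cell (assumed)

def sram_cell_mac(w_slice, x_slice):
    """Partial MAC performed inside one SRAM cell."""
    return w_slice @ x_slice

def in_sram_fc(weights, activations):
    """Fully connected layer evaluated as a sum of per-cell partial MACs."""
    out = np.zeros(weights.shape[0])
    for start in range(0, weights.shape[1], CELL_COLS):
        w_slice = weights[:, start:start + CELL_COLS]
        x_slice = activations[start:start + CELL_COLS]
        out += sram_cell_mac(w_slice, x_slice)
    return out

# Example: a 10-class classifier head over 64 activations, ternary weights.
rng = np.random.default_rng(0)
W = rng.choice([-1, 0, 1], size=(10, 64))
x = rng.standard_normal(64)
logits = in_sram_fc(W, x)
```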
To further improve performance, we also propose support optimization of the weight parameters. In this approach, rather than constraining the weights to binary/ternary values, we provide flexibility by mapping the weights of an SRAM cell to a trainable real parameter that is unique to that cell but varies across the architecture. This results in weight matrices that form a middle ground between entirely full-precision and entirely discrete, with no compromise on performance.
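To make the support-optimization idea concrete, here is a minimal sketch in which one cell's fixed discrete pattern is scaled by its own trainable real parameter alpha. The squared-error loss, the variable names, and the gradient update are assumptions chosen for illustration, not the actual training recipe.

```python
import numpy as np

# Support optimization, sketched for a single cell: the discrete weight
# pattern D in {-1, 0, +1} stays fixed, while the cell's real-valued
# support alpha is learned. Effective weights are alpha * D.

rng = np.random.default_rng(0)
D = rng.choice([-1, 0, 1], size=(10, 64))   # fixed discrete pattern of one cell
alpha = 1.0                                  # trainable support, unique per cell
x = rng.standard_normal(64)
y_true = rng.standard_normal(10)

def forward(alpha, D, x):
    """Cell output with effective weights alpha * D."""
    return alpha * (D @ x)

# Gradient descent on alpha for L = 0.5 * ||forward - y_true||^2,
# with the discrete pattern D held fixed:
lr = 1e-3
for _ in range(100):
    err = forward(alpha, D, x) - y_true
    grad = err @ (D @ x)        # dL/dalpha
    alpha -= lr * grad
```

Because only the scalar support per cell is trained, the stored weights remain discrete and cheap to hold in SRAM, while the learned alpha recovers much of the expressiveness of full-precision weights.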