Posted on 2017-11-01 by Alessandro Pappalardo
Quantization has proven to be a powerful technique for reducing the memory footprint of Convolutional Neural Networks at inference time without sacrificing their accuracy. This is especially useful in the context of embedded and mobile devices, where memory resources come at a premium. Efforts have been invested in quantizing both weights and activations down to binary values, since computing convolutions between binary sequences requires only bit-wise operations, but maintaining acceptable levels of accuracy has proven hard. A quantized representation wider than binary is therefore needed, while a reduced computational footprint remains desirable. To this end, researchers have expanded bit-wise kernels to non-binary quantized convolution. We explore a different approach to this problem, based on the application of Number Theoretic Transforms as fast algorithms for computing convolutions in quantized Convolutional Neural Networks.
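
To illustrate the idea referred to above, the following is a minimal sketch (not taken from the thesis) of exact circular convolution via a Number Theoretic Transform, i.e. a DFT carried out in modular integer arithmetic rather than over the complex numbers. The specific parameters are illustrative assumptions: the prime modulus p = 257, the transform length n = 8, and the choice of 4 as a primitive 8th root of unity mod 257.

```python
# Illustrative sketch: circular convolution of quantized integer sequences
# via a Number Theoretic Transform (NTT). Parameters below are assumptions
# chosen for clarity, not values from the thesis.

P = 257      # prime modulus (257 = 2^8 + 1)
N = 8        # transform length, divides P - 1
ROOT = 4     # 4 is a primitive 8th root of unity mod 257, since 4^4 = 256 = -1 (mod 257)

def ntt(a, root, p):
    """Naive O(n^2) forward transform: A[k] = sum_j a[j] * root^(j*k) mod p."""
    n = len(a)
    return [sum(a[j] * pow(root, j * k, p) for j in range(n)) % p for k in range(n)]

def intt(A, root, p):
    """Inverse transform: apply the forward transform with the inverse root,
    then scale by n^{-1} mod p."""
    n = len(A)
    inv_root = pow(root, p - 2, p)   # modular inverse via Fermat's little theorem
    inv_n = pow(n, p - 2, p)
    return [(x * inv_n) % p for x in ntt(A, inv_root, p)]

def circular_conv(x, h, root=ROOT, p=P):
    """Circular convolution via pointwise multiplication in the NTT domain.
    Exact in integer arithmetic: no floating-point rounding is involved."""
    X, H = ntt(x, root, p), ntt(h, root, p)
    Y = [(xk * hk) % p for xk, hk in zip(X, H)]
    return intt(Y, root, p)

if __name__ == "__main__":
    # Small non-negative integers, standing in for quantized activations/weights,
    # zero-padded so the circular convolution equals the linear one.
    x = [1, 2, 3, 0, 0, 0, 0, 0]
    h = [1, 1, 0, 0, 0, 0, 0, 0]
    print(circular_conv(x, h))  # [1, 3, 5, 3, 0, 0, 0, 0]
```

Because all intermediate results stay within modular integer arithmetic, the output is exact as long as the true convolution values remain below the modulus, which is the property that makes such transforms attractive for low-precision, quantized operands.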