NNPIM: A Processing In-Memory Architecture for Neural Network Acceleration
Neural networks (NNs) have shown great ability
to process emerging applications such as speech recognition,
language recognition, image classification, video segmentation,
and gaming. It is therefore important to make NNs efficient.
Although attempts have been made to reduce NNs' computation
cost, the data movement between memory and processing cores is
the main bottleneck for NNs' energy consumption and execution
time. This makes the implementation of NNs significantly slower
on traditional CPU/GPU cores. In this paper, we propose a
novel processing in-memory architecture, called NNPIM, that
significantly accelerates the neural network inference phase inside
the memory. First, we design a crossbar memory architecture
that supports fast addition, multiplication, and search operations
inside the memory. Second, we introduce simple optimization
techniques that significantly improve NNs' performance and
reduce the overall energy consumption. We also map all NN
functionalities using parallel in-memory components. To further
improve the efficiency, our design supports weight sharing to
reduce the number of computations in memory and consequently
speed up NNPIM computation. We compare the efficiency of
our proposed NNPIM with a GPU and state-of-the-art PIM
architectures. Our evaluation shows that our design can achieve
131.5x higher energy efficiency and is 48.2x faster compared
to the NVIDIA GTX 1080 GPU architecture. Compared to state-of-the-art
neural network accelerators, NNPIM achieves on
average 3.6x higher energy efficiency and is 4.6x faster, while
providing the same classification accuracy.
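The weight-sharing idea mentioned in the abstract can be illustrated with a small sketch: cluster each layer's weights into a few shared values so that only those shared values need to be multiplied per input, and the remaining work becomes index lookups. The sketch below uses a simple 1-D k-means in NumPy; the function name share_weights, the cluster count, and the use of k-means are illustrative assumptions, not the paper's exact in-memory mechanism.

```python
# Conceptual sketch of weight sharing (illustrative only, not the NNPIM design):
# quantize a layer's weights to a small codebook of shared values so that at
# most num_clusters distinct multiplications are needed per input operand.
import numpy as np

def share_weights(weights, num_clusters=16, iters=20):
    """Quantize weights to num_clusters shared values via simple 1-D k-means."""
    flat = weights.ravel()
    # Initialize centroids evenly across the weight range.
    centroids = np.linspace(flat.min(), flat.max(), num_clusters)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned weights.
        for c in range(num_clusters):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, idx.reshape(weights.shape)

# Example: the layer now has at most 16 distinct weight values; products with
# these shared values can be computed once and reused by lookup.
w = np.random.randn(256, 128).astype(np.float32)
codebook, codes = share_weights(w, num_clusters=16)
w_shared = codebook[codes]        # reconstructed (quantized) weight matrix
print(np.unique(w_shared).size)   # <= 16 distinct weight values
```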
Keywords
Non-volatile memory, Processing in-Memory, Neural Networks