NNPIM: A Processing In-Memory Architecture for Neural Network Acceleration
Neural networks (NNs) have shown great ability to process emerging applications such as speech recognition, language recognition, image classification, video segmentation, and gaming, so making NNs efficient is important. Although attempts have been made to reduce NNs' computation cost, data movement between memory and processing cores remains the main bottleneck for NNs' energy consumption and execution time, making NN implementations significantly slower on traditional CPU/GPU cores. In this paper, we propose a novel processing in-memory architecture, called NNPIM, that significantly accelerates the neural network inference phase inside the memory. First, we design a crossbar memory architecture that supports fast addition, multiplication, and search operations inside the memory. Second, we introduce simple optimization techniques that significantly improve NNs' performance and reduce the overall energy consumption. We also map all NN functionalities onto parallel in-memory components. To further improve efficiency, our design supports weight sharing, which reduces the number of in-memory computations and consequently speeds up NNPIM computation. We compare the efficiency of the proposed NNPIM with a GPU and state-of-the-art PIM architectures. Our evaluation shows that our design achieves 131.5× higher energy efficiency and is 48.2× faster than the NVIDIA GTX 1080 GPU architecture. Compared to state-of-the-art neural network accelerators, NNPIM achieves on average 3.6× higher energy efficiency and is 4.6× faster, while providing the same classification accuracy.
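As a rough illustration of the weight-sharing idea mentioned above (a generic sketch, not the paper's exact clustering or in-memory mapping), the weights of a layer can be clustered into a small codebook of shared values so that only the codebook entries need to be multiplied in memory; the helper name `share_weights` and the use of 1-D k-means here are assumptions for illustration:

```python
import numpy as np

def share_weights(w, k=16, iters=20):
    """Cluster a weight vector into k shared values via simple 1-D k-means.

    Returns the codebook of shared values and, for each weight, the index
    of its assigned codebook entry.
    """
    centers = np.linspace(w.min(), w.max(), k)  # initial codebook
    for _ in range(iters):
        # assign each weight to its nearest center
        idx = np.abs(w[:, None] - centers[None, :]).argmin(axis=1)
        # move each center to the mean of its assigned weights
        for c in range(k):
            if np.any(idx == c):
                centers[c] = w[idx == c].mean()
    return centers, idx

rng = np.random.default_rng(0)
w = rng.normal(size=256)                # hypothetical layer weights
centers, idx = share_weights(w, k=16)
w_shared = centers[idx]                 # quantized weights

# With shared weights, an input element is multiplied by at most 16
# distinct values instead of 256, shrinking the in-memory work.
```

The trade-off is a small quantization error in `w_shared`, which weight-sharing schemes typically keep low enough to preserve classification accuracy.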
Non-volatile Memory, Processing In-Memory, Neural Networks