POLAR: A Pipelined/Overlapped FPGA-Based LSTM Accelerator
In this brief, a low resource utilization field-programmable gate array (FPGA)-based long short-term memory (LSTM) network architecture for accelerating the inference phase is presented. The architecture has low-power and high-speed features that are achieved through overlapping the timing of the operations and pipelining the datapath. Moreover, this architecture requires negligible internal memory size for storing the intermediate data leading to low resource utilization and simple routing, which provides lower interconnect delay (higher operating frequency). A designer may adjust the resource utilization (as well as the latency) of the proposed architecture readily at the registertransfer level (RTL) design by adjusting the amount of parallelization. This makes the process of mapping the architecture onto different types of FPGAs, subject to defined constraints, a simple one. The efficacy of the proposed architecture is assessed by implementing an LSTM network on different types of FPGAs. Compared with the recent works, the proposed architecture provides up to about 1.6Ã?, 43.6Ã?, 21.9Ã?, and 114.5Ã? improvements in frequency, power efficiency, GOP/s, and GOP/s/W, respectively. Finally, our proposed architecture operates at 17.64 GOP/s, which is 2.31Ã? faster than the best previously reported results.
Field-programmable gate array (FPGA), high speed, low power, low resource utilization, long short-term memory (LSTM) accelerator.