FPGA-Based Depth Separable Convolution Neural Network
To enable a convolutional neural network (CNN) to be deployed on a field-programmable gate array (FPGA), this study builds a lightweight CNN based on depthwise separable convolution, which reduces both the number of stored parameters and the amount of computation. We replace the standard convolution operation with a depthwise separable convolution operation and propose a hardware accelerator architecture that handles depthwise separable convolutions of different sizes, using parallelization to utilize hardware resources efficiently. This allows data to be reused, reducing the number of memory accesses. The accelerator achieves 588 frames per second and a throughput of 37.88M ops/sec at a 100 MHz clock.
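To illustrate why replacing a standard convolution with a depth separable (depthwise separable) one reduces parameters and computation, the following minimal sketch counts weights and multiply-accumulate operations (MACs) for a single layer under both schemes. It uses the usual cost model for this factorization: a K x K depthwise filter per input channel followed by a 1 x 1 pointwise convolution, with biases ignored. The function names and the layer dimensions in the example are illustrative assumptions and are not taken from this study's network.

```python
def standard_conv_cost(k, c_in, c_out, h_out, w_out):
    """Weights and MACs of a standard k x k convolution (bias ignored)."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is applied at each output position
    return params, macs


def depthwise_separable_cost(k, c_in, c_out, h_out, w_out):
    """Weights and MACs of a depthwise k x k plus pointwise 1 x 1 convolution."""
    depthwise_params = k * k * c_in   # one k x k filter per input channel
    pointwise_params = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise_params + pointwise_params
    macs = params * h_out * w_out
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 32 -> 64 channels, 16 x 16 output map.
    layer = dict(k=3, c_in=32, c_out=64, h_out=16, w_out=16)
    std_params, std_macs = standard_conv_cost(**layer)
    sep_params, sep_macs = depthwise_separable_cost(**layer)
    print(f"standard:  {std_params} params, {std_macs} MACs")
    print(f"separable: {sep_params} params, {sep_macs} MACs")
    print(f"ratio:     {sep_params / std_params:.4f}")
```

For this hypothetical layer the separable form needs 2,336 weights instead of 18,432, a ratio of about 0.127, which matches the general expression 1/C_out + 1/K^2 for the cost of the factorized layer relative to the standard one.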