BE/BTech & ME/MTech Final Year Projects for Computer Science | Information Technology | ECE Engineer | IEEE Projects Topics, PHD Projects Reports, Ideas and Download | Sai Info Solution | Nashik |Pune |Mumbai
director@saiinfo settings_phone02536644344 settings_phone+919270574718 +919096813348 settings_phone+917447889268
logo


SAI INFO SOLUTION


Diploma | BE |B.Tech |ME | M.Tech |PHD

Project Development and Training

Search Project by Domainwise


Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing


Scalable and Secure Big Data I

Class Agnostic Image Common Ob

3D Reconstruction in Canonical
Abstract


This paper presents Systolic-CNN, an OpenCLdefined scalable, run-time-flexible FPGA accelerator architecture, optimized for performing the low-latency, energy-efficient inference of various convolutional neural networks (CNNs) in the context of multi-tenancy cloud/edge computing. Systolic-CNN adopts a highly pipelined and parallelized 1-D systolic array architecture, which efficiently explores both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic- CNN is highly scalable and parameterized, which can be easily adapted by users to achieve 100% utilization of the coarsegrained computation resources (i.e., DSP blocks) for a given FPGA. In addition, Systolic-CNN is run-time-flexible, which can be time-shared, in the context of multi-tenancy cloud or edge computing, to accelerate a variety of CNN models at run time without the need of recompiling the FPGA kernel hardware nor reprogramming the FPGA. The experiment results based on an Intel Arria 10 GX FPGA Development board show that Systolic-CNN, when mapped with a single-precision data format, can achieve 100% utilization of the DSP block resource and an average inference latency of 10ms, 84ms, 1615ms, and 990ms per image for accelerating AlexNet, ResNet-50, RetinaNet, and Light-weight RetinaNet, respectively. The peak computational throughput is measured at 80-170 GFLOPS/s across the acceleration of different CNN models.

KeyWords
CNN Models,FPGA



Share
Share via WhatsApp
BE/BTech & ME/MTech Final Year Projects for Computer Science | Information Technology | ECE Engineer | IEEE Projects Topics, PHD Projects Reports, Ideas and Download | Sai Info Solution | Nashik |Pune |Mumbai
Call us : 09096813348 / 02536644344
Mail ID : developer.saiinfo@gmail.com
Skype ID : saiinfosolutionnashik