F-E3D: FPGA-based Acceleration of an Efficient 3D Convolutional Neural Network for Human Action Recognition

Abstract

Three-dimensional convolutional neural networks (3D CNNs) have demonstrated their outstanding classification accuracy for human action recognition (HAR). However, the large number of computations and parameters in 3D CNNs limits their deployability in real-life applications. To address this challenge, this paper adopts an algorithm-hardware co-design method by proposing an efficient 3D CNN building unit called 3D-1 bottleneck residual block (3D-1 BRB) at the algorithm level, and a corresponding FPGA-based hardware architecture called F-E3D at the hardware level. Based on 3D-1 BRB, a novel 3D CNN model called E3DNet is developed, which achieves nearly 37 times reduction in model size and 5% improvement in accuracy compared to standard 3D CNNs on the UCF101 dataset. Together with several hardware optimizations, including 3D fused BRB, online blocking and kernel reuse, the proposed F-E3D is nearly 13 times faster than a previous FPGA design for 3D CNNs, with performance and accuracy comparable to other state-of-the-art 3D CNN models on GPU platforms while requiring only 7% of their energy consumption.

Publication
2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP)