Accelerated Low Power AI System for Indian Sign Language Recognition

  • Jella Sandhya
  • Dr. Anitha Sheela Kancharla
Keywords: FPGA accelerator, Data quantization, Optimization, Xilinx Zynq UltraScale+ ZCU102

Abstract

Deep Convolutional Neural Network (CNN) based methods have become powerful tools for a wide variety of applications, particularly in Natural Language Processing and Computer Vision. Nevertheless, CNN-based methods are computationally expensive and resource-hungry, making them difficult to deploy on battery-operated devices such as smartphones, AR/VR glasses, and autonomous robots. Moreover, with the increasing complexity of deep learning models such as ResNet-50, there is a growing demand for efficient hardware accelerators to handle the computational workload. In this paper, we present the design and implementation of a neural network accelerator tailored for ResNet-50 on the ZCU102 platform using Field-Programmable Gate Arrays (FPGAs), which offer a customizable solution to this challenge. We systematically investigate the design choices and optimization strategies for deploying, on FPGA-based accelerators, a custom-built ResNet-50 network trained for Indian Sign Language translation of 76 gestures enacted and recorded in our labs for a doctor-patient interface. To enhance operational speed, we employ several techniques, including parallelism and pipelining, and leverage Depthwise Separable Convolution. Furthermore, we implement hierarchical memory allocation across different offsets using threads. Additionally, we apply weight and data quantization to improve operational speed while minimizing resource consumption, thus achieving low power consumption while maintaining acceptable inference accuracy. We evaluated our accelerated FPGA model against a CPU in terms of several performance metrics: frames per second (fps), memory allocation, and the LUTs, DSPs, and Block RAMs used. Our findings underscore the advantage of FPGA-based accelerators: we achieved a frame rate of 2.7 fps on the Xilinx Zynq UltraScale+ ZCU102 platform with int8 quantization, compared to 0.8 fps for single precision, while the CPU achieved only 0.6 fps. Notably, we observed an accuracy drop of only 1.37% with int8 quantization, and no accuracy change with single precision. Our implementation used 16 convolution threads and 4 fully-connected (FC) threads operating at 200 MHz for single precision, and 25 convolution threads and 16 FC threads operating at 250 MHz for int8.
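The two main optimizations named in the abstract, Depthwise Separable Convolution and int8 weight quantization, can be illustrated with a short sketch. This is not the paper's actual accelerator code: the layer dimensions are hypothetical (a ResNet-style 56×56 map with 64 input and output channels), and the quantization scheme shown is simple symmetric per-tensor scaling, which is one common int8 approach rather than the authors' confirmed method.

```python
import numpy as np

def conv_macs(h, w, cin, cout, k):
    """Multiply-accumulates for a standard k x k convolution over an h x w map."""
    return h * w * cin * cout * k * k

def dws_conv_macs(h, w, cin, cout, k):
    """MACs for a depthwise (k x k per channel) plus pointwise (1 x 1) pair."""
    depthwise = h * w * cin * k * k
    pointwise = h * w * cin * cout
    return depthwise + pointwise

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w_q = round(w / scale)."""
    scale = np.abs(weights).max() / 127.0
    w_q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return w_q, scale

def dequantize(w_q, scale):
    return w_q.astype(np.float32) * scale

# Hypothetical layer: 56x56 feature map, 64 -> 64 channels, 3x3 kernel.
std = conv_macs(56, 56, 64, 64, 3)
dws = dws_conv_macs(56, 56, 64, 64, 3)
print(f"standard: {std} MACs, separable: {dws} MACs, ratio: {std / dws:.1f}x")

# Round-trip a random weight tensor through int8 to gauge the error introduced.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64, 3, 3)).astype(np.float32)
w_q, s = quantize_int8(w)
err = np.abs(dequantize(w_q, s) - w).max()
print(f"max int8 round-trip error: {err:.4f}")
```

For this layer the separable form needs roughly an eighth of the MACs of the standard convolution, and the int8 round-trip error stays below half a quantization step, which is consistent with the small (1.37%) accuracy variation reported above.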

Published
2023-12-30
How to Cite
Sandhya, J. S., & Kancharla, D. A. S. (2023). Accelerated Low Power AI System for Indian Sign Language Recognition. Asian Journal For Convergence In Technology (AJCT) ISSN -2350-1146, 9(3), 103-108. https://doi.org/10.33130/AJCT.2023v09i03.017
