
A Study on the Application of TensorFlow Compression Techniques to Human Activity Recognition

Chiara Contoli; Emanuele Lattanzi (2023)

Abstract

In the human activity recognition (HAR) application domain, the use of deep learning (DL) algorithms for feature extraction and training delivers significant performance improvements over traditional machine learning (ML) algorithms. However, this comes at the expense of more complex and demanding models, making their deployment harder on the constrained devices traditionally involved in the HAR process. The efficiency of DL deployment thus remains to be explored. We thoroughly investigated the application of the TensorFlow Lite simple conversion, dynamic quantization, and full integer quantization compression techniques. We applied those techniques not only to convolutional neural networks (CNNs), but also to long short-term memory (LSTM) networks and to a combined CNN-LSTM architecture. We also considered two use case scenarios, namely cascading compression mode and stand-alone compression mode. This paper reports on the feasibility of deploying deep networks onto an ESP32 device, and on how TensorFlow compression techniques impact classification accuracy, energy consumption, and inference latency. Results show that in the cascading case it is not possible to carry out the performance characterization, whereas in the stand-alone case dynamic quantization is recommended because it yields a negligible loss of accuracy. In terms of power efficiency, both dynamic and full integer quantization provide high energy savings with respect to the uncompressed models: between 31% and 37% for CNN networks, and up to 45% for LSTM networks. In terms of inference latency, dynamic and full integer quantization provide comparable performance.
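The three TensorFlow Lite techniques the abstract names (simple conversion, dynamic-range quantization, and full integer quantization) can be sketched with the standard `TFLiteConverter` API. This is a minimal illustration, not the paper's code: the toy model, input shape, and calibration data below are placeholders standing in for the CNN/LSTM networks and HAR dataset actually used in the study.

```python
import numpy as np
import tensorflow as tf

# Placeholder model standing in for the paper's CNN/LSTM networks:
# e.g. windows of 128 samples from 6 inertial channels, 6 activity classes.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 6)),
    tf.keras.layers.Conv1D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(6, activation="softmax"),
])

# 1) Simple conversion: float32 weights and activations, no quantization.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_model = converter.convert()

# 2) Dynamic-range quantization: weights stored as int8,
#    activations still computed in floating point at inference time.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
dynamic_model = converter.convert()

# 3) Full integer quantization: a representative dataset calibrates
#    activation ranges so all tensors become int8.
def representative_data():
    for _ in range(10):
        yield [np.random.rand(1, 128, 6).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
int8_model = converter.convert()

print(len(float_model), len(dynamic_model), len(int8_model))
```

Each `convert()` call returns a FlatBuffer byte string; on a target such as the ESP32 this buffer would be compiled into firmware and executed with the TensorFlow Lite Micro interpreter.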
Files in this item:

File: FinalArticle.pdf
Access: open access
Type: published version
License: Creative Commons
Size: 1.05 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11576/2715491
Citations
  • PMC: not available
  • Scopus: 5
  • Web of Science: 3