Teaching LLMs to Move: Generation of Human Activity Sensor Traces with BERT

L. Bigelli (Software); A. Bogliolo (Methodology); L. Calisti (Methodology); C. Contoli (Methodology); N. Kania (Data Curation); E. Lattanzi (Conceptualization)
In press

Abstract

Recent advances in large language models (LLMs) have demonstrated their strong ability to generate coherent and contextually relevant text. However, their application to generating sequential sensor data, particularly for human activity recognition, remains underexplored. In this study, we present a new method for generating synthetic sensor traces using a customized BERT (Bidirectional Encoder Representations from Transformers), a transformer-based language model originally designed for natural language processing. BERT is trained from scratch on accelerometer data transcoded into structured sentences over a tokenized codebook that preserves temporal and spatial relationships. The trained model is then used to generate synthetic traces starting from a known context. We evaluated our method on benchmark datasets, showing that the generated sensor sequences are correctly recognized by state-of-the-art classification models, which suggests that the synthetic data preserve the statistical and kinematic properties of real-world activities. These findings indicate that an LLM trained in this way can be effectively used to generate sensor data, potentially helping to address the scarcity of labeled data that characterizes the human activity recognition domain.
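The abstract describes transcoding accelerometer readings into structured sentences over a tokenized codebook. The following is a minimal sketch of that kind of transcoding, not the paper's actual scheme: the bin count, value range, and token format (`x..` / `y..` / `z..`) here are illustrative assumptions.

```python
import numpy as np

def accel_to_tokens(samples, n_bins=32, lo=-2.0, hi=2.0):
    """Quantize 3-axis accelerometer samples (in g) into discrete codebook tokens.

    Each sample (x, y, z) becomes one token such as 'x16_y08_z16', so a window
    of consecutive samples can be read as a 'sentence' by a BERT-style model.
    NOTE: illustrative assumption only; the paper's codebook may differ.
    """
    samples = np.asarray(samples, dtype=float)
    # Clip to the assumed dynamic range, then bucket each axis into n_bins levels.
    clipped = np.clip(samples, lo, hi)
    bins = np.round((clipped - lo) / (hi - lo) * (n_bins - 1)).astype(int)
    return [f"x{bx:02d}_y{by:02d}_z{bz:02d}" for bx, by, bz in bins]

def window_to_sentence(samples, **kw):
    """Join the per-sample tokens of one window into a space-separated sentence,
    preserving temporal order within the window."""
    return " ".join(accel_to_tokens(samples, **kw))
```

Because each token jointly encodes all three axes of one time step, token order carries the temporal structure and the token vocabulary carries the spatial (per-axis amplitude) structure, which is the property the codebook is meant to preserve.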
Files for this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11576/2762134