Teaching LLMs to Move: Generation of Human Activity Sensor Traces with BERT

L. Bigelli (Software); A. Bogliolo (Methodology); L. Calisti (Methodology); C. Contoli (Methodology); N. Kania (Data Curation); E. Lattanzi (Conceptualization)
In press

Abstract

Recent advances in large language models (LLMs) have demonstrated their strong ability to generate coherent and contextually relevant text. However, their application to generating sequential sensor data, particularly for human activity recognition, remains underexplored. In this study, we present a new method for generating synthetic sensor traces using a customized BERT (Bidirectional Encoder Representations from Transformers), a transformer-based language model originally designed for natural language processing. BERT is trained from scratch on accelerometer data transcoded into structured sentences over a tokenized codebook that preserves temporal and spatial relationships. The trained model is then used to generate synthetic traces starting from a known context. We evaluated our method on benchmark datasets, showing that the generated sensor sequences are correctly recognized by state-of-the-art classification models, which suggests that the synthetic data preserve the statistical and kinematic properties of real-world activities. These findings indicate that an LLM trained in this way can be effectively used to generate sensor data, potentially helping to address the scarcity of labeled data that characterizes the human activity recognition domain.
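The abstract describes transcoding accelerometer readings into structured sentences over a tokenized codebook. The following is a minimal sketch of that kind of transcoding, not the paper's actual scheme: the bin count, value range, and token format (`x..` / `y..` / `z..`) here are illustrative assumptions.

```python
import numpy as np

def accel_to_tokens(samples, n_bins=32, lo=-2.0, hi=2.0):
    """Quantize 3-axis accelerometer samples (in g) into discrete codebook tokens.

    Each sample (x, y, z) becomes one token such as 'x16_y08_z16', so a window
    of consecutive samples can be read as a 'sentence' by a BERT-style model.
    NOTE: illustrative assumption only; the paper's codebook may differ.
    """
    samples = np.asarray(samples, dtype=float)
    # Clip to the assumed dynamic range, then bucket each axis into n_bins levels.
    clipped = np.clip(samples, lo, hi)
    bins = np.round((clipped - lo) / (hi - lo) * (n_bins - 1)).astype(int)
    return [f"x{bx:02d}_y{by:02d}_z{bz:02d}" for bx, by, bz in bins]

def window_to_sentence(samples, **kw):
    """Join the per-sample tokens of one window into a space-separated sentence,
    preserving temporal order within the window."""
    return " ".join(accel_to_tokens(samples, **kw))
```

Because each token jointly encodes all three axes of one time step, token order carries the temporal structure and the token vocabulary carries the spatial (per-axis amplitude) structure, which is the property the codebook is meant to preserve.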
Files for this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11576/2762134