Temporal Distance-based Self-Supervised Learning for Human Activity Recognition
L. Bigelli (Software); C. Contoli (Methodology); N. Kania (Software); E. Lattanzi (Conceptualization)
2025
Abstract
Self-supervised learning provides a scalable method for acquiring representations from unlabelled data, a capability that is essential in domains such as sensor-based Human Activity Recognition, where the labelling of signals can be prohibitively expensive. This paper introduces Temporal Distance-based Self-Supervised Learning (TD-SSL), a novel framework that exploits the temporal distance between accelerometer sequences to build contrastive data pairs, which are used to construct a pretrained backbone for human activity recognition models. The method is based on the hypothesis that, in a continuous sensor stream, temporally adjacent samples likely correspond to the same activity, while distant ones represent distinct actions. A study was conducted in which three distinct network architectures – convolutional, DenseNet-like, and multi-head attention neural networks – were evaluated on four publicly available datasets. The experimental evaluations demonstrate that TD-SSL consistently matches the performance of supervised benchmarks, thereby confirming learning transferability. Furthermore, in scenarios where labelled data is scarce, it surpasses the supervised approach by more than 10 percentage points, significantly reducing the need for manual annotation in sensor-based human activity recognition.
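The abstract's core idea – positive pairs from temporally adjacent windows, negative pairs from distant ones – can be illustrated with a minimal sketch. The function below is a hypothetical illustration, not the paper's actual implementation: the window length and the `near`/`far` thresholds are assumed parameters, and the real TD-SSL pair-mining strategy may differ.

```python
import numpy as np

def make_contrastive_pairs(stream, window=128, near=1, far=50, n_pairs=100, rng=None):
    """Build (anchor, other, label) triples from a continuous 1-D sensor stream.

    Hypothetical sketch of temporal-distance pair mining: windows whose start
    indices differ by at most `near` windows form a positive pair (label 1,
    likely the same activity); windows at least `far` windows apart form a
    negative pair (label 0, likely distinct activities).
    """
    rng = np.random.default_rng(rng)
    n_windows = len(stream) // window
    anchors, others, labels = [], [], []
    for _ in range(n_pairs):
        i = int(rng.integers(0, n_windows))
        if rng.random() < 0.5:
            # Positive pair: a temporally adjacent window.
            j = min(n_windows - 1, i + near)
            label = 1
        else:
            # Negative pair: a temporally distant window (wrap around the stream).
            j = (i + far) % n_windows
            label = 0
        anchors.append(stream[i * window:(i + 1) * window])
        others.append(stream[j * window:(j + 1) * window])
        labels.append(label)
    return np.stack(anchors), np.stack(others), np.asarray(labels)
```

In a contrastive pretraining setup such pairs would feed a backbone encoder trained to pull positives together and push negatives apart; the labelled downstream task then fine-tunes or probes that backbone.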


