
Temporal Distance-based Self-Supervised Learning for Human Activity Recognition

L. Bigelli (Software); C. Contoli (Methodology); N. Kania (Software); E. Lattanzi (Conceptualization)
2025

Abstract

Self-supervised learning provides a scalable method for acquiring representations from unlabelled data, a capability that is essential in domains such as sensor-based Human Activity Recognition, where the labelling of signals can be prohibitively expensive. This paper introduces Temporal Distance-based Self-Supervised Learning (TD-SSL), a novel framework that exploits the temporal distance between accelerometer sequences to build contrastive data pairs, which are then used to pretrain a backbone for human activity recognition models. The method is based on the hypothesis that, in a continuous sensor stream, temporally adjacent samples are likely to correspond to the same activity, while temporally distant ones are likely to represent distinct activities. A study was conducted in which three distinct network architectures – convolutional, DenseNet-like, and multi-head attention neural networks – were evaluated on four publicly available datasets. The experimental evaluations demonstrate that TD-SSL consistently matches the performance of supervised benchmarks, thereby confirming the transferability of the learned representations. Furthermore, in scenarios where labelled data is scarce, it surpasses the supervised approach by more than 10 percentage points, significantly reducing the need for manual annotation in sensor-based human activity recognition.
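The abstract describes the core idea of TD-SSL: windows cut from nearby positions in the sensor stream are treated as positive (same-activity) pairs, while windows far apart in time are treated as negative pairs. The sketch below illustrates one way such pairs could be sampled. It is not the authors' implementation: the function name, the window length, and the thresholds `near_max` and `far_min` are illustrative assumptions, since the paper's exact pair-construction procedure and hyperparameters are not given here.

```python
import numpy as np

def make_contrastive_pairs(stream, window_len, near_max, far_min, n_pairs, seed=0):
    """Sample (anchor, other, label) pairs from a continuous sensor stream.

    label 1 -> the two windows are temporally close  (assumed: same activity)
    label 0 -> the two windows are temporally distant (assumed: different activity)
    `far_min` must be much smaller than the stream length for sampling to succeed.
    """
    rng = np.random.default_rng(seed)
    last_start = len(stream) - window_len          # last valid window start index
    anchors, others, labels = [], [], []
    for _ in range(n_pairs):
        i = int(rng.integers(0, last_start))
        if rng.random() < 0.5:
            # positive pair: second window starts within `near_max` samples of the anchor
            j = min(i + int(rng.integers(1, near_max)), last_start - 1)
            label = 1
        else:
            # negative pair: resample until the second window is at least `far_min` away
            j = int(rng.integers(0, last_start))
            while abs(j - i) < far_min:
                j = int(rng.integers(0, last_start))
            label = 0
        anchors.append(stream[i:i + window_len])
        others.append(stream[j:j + window_len])
        labels.append(label)
    return np.stack(anchors), np.stack(others), np.asarray(labels)

# Example: a synthetic 3-axis accelerometer stream, roughly 30 minutes at 50 Hz
stream = np.random.randn(90_000, 3).astype(np.float32)
a, b, y = make_contrastive_pairs(stream, window_len=128,
                                 near_max=64, far_min=5_000, n_pairs=4096)
print(a.shape, b.shape, y.mean())   # (4096, 128, 3) (4096, 128, 3) ~0.5
```

In a TD-SSL-style pipeline, pairs produced this way would drive a contrastive objective for pretraining the backbone, which is then fine-tuned on whatever labelled activity data is available.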

Use this identifier to cite or link to this document: https://hdl.handle.net/11576/2762135