Open-source small language models for personal medical assistant chatbots

Sara Montagna
Conceptualization
2025

Abstract

Medical chatbots are becoming essential components of telemedicine applications, serving as tools that help patients self-manage their conditions. This trend is driven largely by advances in natural language processing based on pre-trained language models (LMs). However, integrating LMs into clinical environments raises reliability and privacy concerns. This study addresses these issues with a privacy-by-design architecture based on the fully local deployment of open-source LMs. Specifically, to mitigate any risk of information leakage, we evaluate the performance of open-source small language models (SLMs) that can be deployed on personal devices, such as smartphones or laptops, without stringent hardware requirements. We assess the effectiveness of this solution using hypertension management as a case study. Models are evaluated across various tasks, including intent recognition and empathetic conversation, using Gemini Pro 1.5 as a benchmark. The results indicate that, for certain tasks such as intent recognition, Gemini outperforms the other models. However, using the “large language model (LLM) as a judge” approach for semantic evaluation of response correctness, we found that several models align closely with the ground truth. In conclusion, this study highlights the potential of locally deployed SLMs as components of medical chatbots, while addressing critical concerns related to privacy and reliability.
Use this identifier to cite or link to this document: https://hdl.handle.net/11576/2749051