RAG-Enhanced Open SLMs for Hypertension Management Chatbots
Farahmand, Aqila;Ferretti, Stefano;Montagna, Sara
2025
Abstract
Chronic disease management requires continuous monitoring, lifestyle modification, and therapy adherence, and therefore constant support from healthcare professionals. Chatbots have proven to be a promising approach for engaging patients in managing their health condition at home and for offering continuous assistance, since they are readily available to answer questions. While large language models offer an impressive solution for chatbot implementation, third-party systems raise privacy concerns, and computational requirements limit small-scale deployment. We address these challenges by developing a chatbot for hypertensive patients based on open-source small language models (SLMs), specifically designed to run on resource-constrained personal devices and to provide assistance in question-answering (QA) tasks. To achieve conversational performance comparable to that of larger language models, we used retrieval-augmented generation (RAG) with a local knowledge base. Deploying the models locally ensures data privacy, while RAG helps achieve competitive accuracy at a computational cost low enough for end-user devices. We experimented with eight SLMs, two prompt configurations, and different RAG strategies (in both the embedding and retrieval components) to identify the most effective solution. The evaluation relies on both reference-based metrics and expert assessment. Our findings suggest that RAG-enhanced SLMs can improve response clarity and content accuracy. However, our results also indicate that newer SLMs such as Qwen3 perform strongly even without RAG, suggesting that complex retrieval mechanisms may become less necessary as model architectures rapidly evolve.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| s10916-025-02297-7.pdf | Open access | Publisher's version | Creative Commons | 1.98 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


