A Methodology Combining NLP, Machine Learning, and LLM for the Detection of Phishing Emails



Alessandro Aldini; Danilo Rosati

2026

Abstract

This work explores the combined use of various AI-based methods for detecting and classifying phishing emails. Specifically, we consider Random Forest, Support Vector Machine, and XGBoost machine learning (ML) algorithms, alongside natural language processing (NLP) techniques, and the large language model (LLM) Gemini. We combine all these methods in a pipeline and, through an empirical analysis based on a publicly available, balanced dataset, we compare the various techniques and emphasize the potential advantages and disagreements arising from the integration of ML and LLM.


Appears in types:

4.1 Conference Proceedings Contribution

Files in this item:

File	Size	Format
paper1.pdf (open access; type: publisher's version; license: Creative Commons)	1.33 MB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11576/2774071
