A Methodology Combining NLP, Machine Learning, and LLM for the Detection of Phishing Emails
Alessandro Aldini
2026
Abstract
This work explores the combined use of various AI-based methods for detecting and classifying phishing emails. Specifically, we consider the Random Forest, Support Vector Machine, and XGBoost machine learning (ML) algorithms, alongside natural language processing (NLP) techniques and the large language model (LLM) Gemini. We combine all these methods in a pipeline and, through an empirical analysis based on a publicly available, balanced dataset, we compare the various techniques and highlight the potential advantages and disagreements arising from the integration of ML and LLM.


