A Methodology Combining NLP, Machine Learning, and LLM for the Detection of Phishing Emails

Alessandro Aldini
2026

Abstract

This work explores the combined use of various AI-based methods for detecting and classifying phishing emails. Specifically, we consider the Random Forest, Support Vector Machine, and XGBoost machine learning (ML) algorithms, together with natural language processing (NLP) techniques and the large language model (LLM) Gemini. We combine all these methods in a pipeline and, through an empirical analysis based on a publicly available, balanced dataset, compare the various techniques and highlight both the potential advantages of integrating ML and LLM and the disagreements that arise between them.
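The abstract does not state how the ML and LLM verdicts are combined or how their disagreements are quantified. As a hedged illustration only, the following sketch assumes a simple majority vote over the three ML classifiers and compares the combined verdict with the LLM's labels using the raw disagreement rate and Cohen's kappa. The function names (`majority_vote`, `disagreement_rate`, `cohen_kappa`) and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List

# Labels: 1 = phishing, 0 = legitimate (assumed encoding).


def majority_vote(verdicts: Dict[str, List[int]]) -> List[int]:
    """Combine per-model labels by simple majority; ties count as phishing."""
    models = list(verdicts)
    n = len(verdicts[models[0]])
    combined = []
    for i in range(n):
        votes = sum(verdicts[m][i] for m in models)
        combined.append(1 if 2 * votes >= len(models) else 0)
    return combined


def disagreement_rate(a: List[int], b: List[int]) -> float:
    """Fraction of emails on which the two label sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: List[int], b: List[int]) -> float:
    """Agreement between two labelers, corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    # If both labelers always emit the same single class, agreement is perfect.
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical verdicts on six emails; not results from the paper.
    ml = {
        "random_forest": [1, 0, 1, 1, 0, 0],
        "svm": [1, 0, 1, 0, 0, 1],
        "xgboost": [1, 0, 0, 1, 0, 0],
    }
    llm = [1, 1, 1, 1, 0, 0]  # e.g. labels parsed from Gemini's answers

    combined = majority_vote(ml)
    print(combined)                                   # [1, 0, 1, 1, 0, 0]
    print(f"{disagreement_rate(combined, llm):.3f}")  # 0.167
    print(f"{cohen_kappa(combined, llm):.3f}")        # 0.667
```

Cohen's kappa is shown alongside the raw disagreement rate because, on a balanced dataset, two labelers can agree often by chance alone; kappa discounts that baseline.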

Use this identifier to cite or link to this document: https://hdl.handle.net/11576/2774071