Interpretable Machine Learning for Automated Cellular Population Analysis in Flow Cytometry

Benedetti, Riccardo; Suffian, Muhammad; Bogliolo, Alessandro; Canonico, Barbara; Papa, Stefano; Ortolani, Claudio; Montagna, Sara
2025

Abstract

Given the significant improvements in flow cytometry technologies, massive datasets are now being acquired, measuring an ever larger number of cellular markers. Managing and analysing these data by hand, relying solely on the expertise of human operators, is no longer feasible. Recent literature suggests exploiting machine learning algorithms to automate data analysis, extracting and counting sub-populations of cells and supporting diagnosis. In this paper, we applied Support Vector Machine, XGBoost, Decision Tree, Logistic Regression, and Multi-Layer Perceptron models to identify three specific cell types: T lymphocytes, B lymphocytes, and cytotoxic T lymphocytes. Performance is promising across all models and experiments, with balanced accuracy above 0.85. In terms of Recall and F1 score, the Decision Tree is the only model with values below 0.8 in the classification of B lymphocytes. Furthermore, to improve the interpretability of the trained models, we computed SHAP-based explanations for XGBoost, Decision Tree, and Multi-Layer Perceptron, obtaining a set of features that domain experts recognised as significant for the three classification tasks, thus emphasising the viability of this approach for automating the gating process in flow cytometry.
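
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how balanced accuracy, Recall, and F1 score are computed for a single binary task (target cell type versus everything else). It uses the standard textbook definitions; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper, and treating each cell type as a separate binary task is an assumption made here for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Recall, F1 and balanced accuracy for a binary task (1 = target cell type)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Balanced accuracy is the mean of the per-class recalls,
    # so a rare cell population weighs as much as a common one.
    balanced_accuracy = (recall + specificity) / 2
    return {"recall": recall, "f1": f1, "balanced_accuracy": balanced_accuracy}


# Hypothetical toy labels (not data from the paper): 4 target cells, 6 others.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Balanced accuracy is a natural choice when the target population is a small fraction of all acquired events, since plain accuracy would be dominated by the majority class.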
978-3-031-90714-2
Files in this item:
978-3-031-90714-2_17.pdf (authorised users only)

Type: Publisher's version
License: Publisher's copyright
Size: 2.41 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11576/2755411