Interpretable Machine Learning for Automated Cellular Population Analysis in Flow Cytometry
Benedetti, Riccardo; Suffian, Muhammad; Bogliolo, Alessandro; Canonico, Barbara; Papa, Stefano; Ortolani, Claudio; Montagna, Sara
2025
Abstract
Given the significant improvement in flow cytometry technologies, massive datasets are now being acquired, measuring a larger number of cellular markers. Managing and analysing these data by hand, relying solely on the expertise of human operators, is no longer feasible. Recent literature suggests exploiting machine learning algorithms to automate data analysis, extracting and counting sub-populations of cells and supporting diagnosis. In this paper, we applied Support Vector Machine, XGBoost, Decision Tree, Logistic Regression, and Multi-Layer Perceptron models to identify three specific cell types: T lymphocytes, B lymphocytes, and cytotoxic T lymphocytes. Performance is promising across all models and experiments, with a balanced accuracy above 0.85. However, when looking at Recall and F1 score, the Decision Tree is the only model with values below 0.8, in the classification of B lymphocytes. Moreover, to improve the interpretability of the trained models, we computed SHAP-based explanations for XGBoost, Decision Tree, and Multi-Layer Perceptron, obtaining a set of features that domain experts recognised as significant for the three classification tasks, thus emphasising the viability of this approach for automating the gating process in flow cytometry.

File | Type | License | Size | Format
---|---|---|---|---
978-3-031-90714-2_17.pdf (authorized users only) | Publisher's version | Publisher's copyright | 2.41 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.