Molecular Fingerprints-Based Machine Learning for Metabolic Profiling

Sirocchi, Christel; Biancucci, Federica; Suffian, Muhammad; Benedetti, Riccardo; Donati, Matteo; Ferretti, Stefano; Bogliolo, Alessandro; Magnani, Mauro; Menotta, Michele; Montagna, Sara
2025

Abstract

Metabolomics is a promising field for pharmaceuticals and preventive healthcare, with practical applications in disease detection and drug testing. However, analyzing and interpreting complex metabolic datasets remains challenging, because current methods rely heavily on biological pathways that are limited and incompletely annotated. To overcome these limitations, we propose training machine learning classifiers on fingerprint-based encodings of metabolites to predict their response under specific experimental conditions. We evaluate the approach on a cellular model of the genetic disease Ataxia Telangiectasia (AT). Some of the trained models predict affected metabolites with good performance, which indicates that the structural properties of metabolites carry predictive power over their response to specific conditions. We also suggest that evaluating the model's feature importance can help researchers identify clusters of significant molecules and formulate hypotheses about affected pathways. In the AT cellular model, our analysis identifies distinct groups of metabolites, some of which were already known to participate in the affected pathways, thereby corroborating existing knowledge. It also highlights metabolites not previously associated with AT, which opens opportunities for further exploration.
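The workflow summarized in the abstract (encode each metabolite as a fingerprint, train a classifier on the encodings, then inspect feature importance) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the fingerprint type or the classifiers, so the sketch substitutes a toy hashed-substring encoding of SMILES strings and a plain logistic regression, and the molecules and labels are made up rather than taken from the AT dataset. A real analysis would more plausibly use a cheminformatics toolkit for the fingerprints and an established machine learning library for the models.

```python
import math
import zlib

N_BITS = 256  # fingerprint length; an arbitrary choice for this sketch


def fragments(smiles, max_len=3):
    """Return every substring of length 1..max_len of a SMILES string."""
    return {
        smiles[start:start + size]
        for size in range(1, max_len + 1)
        for start in range(len(smiles) - size + 1)
    }


def bit_index(fragment, n_bits=N_BITS):
    """Map a fragment to a bit position with a deterministic hash (CRC32)."""
    return zlib.crc32(fragment.encode("ascii")) % n_bits


def toy_fingerprint(smiles, n_bits=N_BITS):
    """Encode a SMILES string as a fixed-length binary vector."""
    bits = [0] * n_bits
    for fragment in fragments(smiles):
        bits[bit_index(fragment, n_bits)] = 1
    return bits


def score(weights, bias, features):
    """Linear score of one fingerprint under the current model."""
    return bias + sum(w * f for w, f in zip(weights, features))


def train_logistic(X, y, epochs=300, lr=0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            # Clamp the score so math.exp cannot overflow.
            z = max(-30.0, min(30.0, score(weights, bias, features)))
            error = 1.0 / (1.0 + math.exp(-z)) - label
            bias -= lr * error
            weights = [w - lr * error * f for w, f in zip(weights, features)]
    return weights, bias


def predict(weights, bias, features):
    """Return 1 ("affected") when the linear score is positive, else 0."""
    return 1 if score(weights, bias, features) > 0 else 0


# Made-up toy data: 1 = "affected", 0 = "unaffected". NOT from the AT study.
molecules = ["CC(=O)O", "NCC(=O)O", "CC(N)C(=O)O", "OC(=O)CCC(=O)O",
             "CCO", "CCCCCC", "CCCCO", "CCN"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X = [toy_fingerprint(s) for s in molecules]
weights, bias = train_logistic(X, labels)

# Feature importance here is simply |weight|; list the fragments behind each top bit.
top_bits = sorted(range(N_BITS), key=lambda i: abs(weights[i]), reverse=True)[:5]
all_fragments = set().union(*(fragments(s) for s in molecules))
for bit in top_bits:
    sources = sorted(f for f in all_fragments if bit_index(f) == bit)
    print(bit, round(weights[bit], 3), sources)
```

Mapping the highest-weight bits back to the fragments that set them is the step that corresponds to the feature-importance idea in the abstract: the structural features the model relies on point to groups of molecules worth examining together. Because hashing can map several fragments to one bit, each bit may list more than one candidate fragment.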
ISBN: 978-3-031-74640-6

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11576/2750631