Machine Learning-Enabled Prediction of Metabolite Response in Genetic Disorders
Christel Sirocchi; Federica Biancucci; Matteo Donati; Riccardo Benedetti; Alessandro Bogliolo; Stefano Ferretti; Mauro Magnani; Michele Menotta; Muhammad Suffian; Sara Montagna
2023
Abstract
Metabolomics has emerged as a promising discipline in pharmaceuticals and preventive healthcare, holding great potential for disease detection and drug testing. However, analysing large metabolomics datasets remains challenging, with available methods generally relying on limited and incompletely annotated biological pathways. This study introduces a novel approach that leverages machine learning classifiers trained on molecular fingerprints of metabolites to predict their responses under specific experimental conditions. The model is evaluated on mass spectrometry metabolomic data for a cellular model of the genetic disease Ataxia Telangiectasia. Metabolite structures are encoded using the Morgan fingerprint, a well-established technique widely used in drug discovery. The suitability of this fingerprinting method for generating unique structural encodings of the detected metabolites is analysed, and strategies to mitigate the resolution limitations inherent to this fingerprint are introduced. Machine learning classifiers are trained on these fingerprints and exhibit satisfactory performance, providing evidence that the structural encoding holds predictive power over the metabolic response. Feature importance analysis, conducted on the best-performing models, identifies the chemical configurations that have the greatest influence on the classification process, shedding light on affected biological processes. Remarkably, this analysis not only identifies metabolites known to participate in affected pathways but also discovers metabolites not previously associated with the disease, opening up novel opportunities for further exploration. As an initial exploration of the proposed approach, this work lays the foundation for future research that leverages alternative structural encodings, diverse machine learning models, and explainability tools.
File: paper1.pdf (open access; publisher's version; Creative Commons licence; Adobe PDF, 362.74 kB)
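The abstract does not describe how the uniqueness of the structural encodings was assessed. As a rough illustration of the kind of resolution limit a bit-vector fingerprint can have, the sketch below implements a deliberately simplified Morgan-style scheme in plain Python. It is not the authors' pipeline and not the implementation found in cheminformatics toolkits: the function names (`morgan_like_bits`, `linear_alkane`, `colliding_groups`), the graph representation (heavy atoms only, bond orders and stereochemistry ignored), and the toy molecules are all assumptions made for this example.

```python
import zlib
from collections import defaultdict


def morgan_like_bits(atoms, bonds, radius=2, n_bits=2048):
    """Return the 'on' bit positions of a simplified Morgan-style fingerprint.

    atoms: element symbols of the heavy atoms, e.g. ["C", "C", "O"]
    bonds: (i, j) index pairs; bond orders and stereochemistry are ignored here
    """
    neighbours = defaultdict(list)
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def stable_hash(text):
        # crc32 is deterministic across runs, unlike built-in hash() on strings.
        return zlib.crc32(text.encode("utf-8"))

    # Radius 0: each atom is identified by its element and number of neighbours.
    ids = [stable_hash(f"{sym}|{len(neighbours[i])}") for i, sym in enumerate(atoms)]
    features = set(ids)

    # Each further round combines an atom's identifier with the sorted identifiers
    # of its neighbours, so after k rounds it summarises the environment within k bonds.
    for _ in range(radius):
        ids = [
            stable_hash(f"{ids[i]}|{sorted(ids[j] for j in neighbours[i])}")
            for i in range(len(atoms))
        ]
        features.update(ids)

    # Fold all identifiers into a fixed-length bit vector (kept as a set of positions).
    return frozenset(f % n_bits for f in features)


def linear_alkane(n_carbons):
    """Toy heavy-atom graph of an unbranched carbon chain (hypothetical helper)."""
    atoms = ["C"] * n_carbons
    bonds = [(i, i + 1) for i in range(n_carbons - 1)]
    return atoms, bonds


def colliding_groups(molecules, **kwargs):
    """Group molecule names that share exactly the same fingerprint."""
    groups = defaultdict(list)
    for name, (atoms, bonds) in molecules.items():
        groups[morgan_like_bits(atoms, bonds, **kwargs)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


molecules = {
    "heptane": linear_alkane(7),
    "octane": linear_alkane(8),
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
}
print(colliding_groups(molecules, radius=2))
```

In this toy scheme, heptane and octane receive the same fingerprint at radius 2, because every atom environment within two bonds that occurs in one chain also occurs in the other and the bit vector records presence rather than counts. Whether the same kind of collision affects the metabolites in the study, and how the authors mitigate it, is not stated in the abstract.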