Background & Motivation:
Metabolomics plays a central role in biomedical research and precision medicine, providing insight into disease mechanisms, diagnostics, and therapeutic targets. However, identifying biomarkers from tandem mass spectrometry (MS/MS) data remains difficult because spectral data are high-dimensional, variable, and sparse. Recent advances in machine learning (ML) and deep learning have improved metabolite annotation, enabling more accurate identification of disease-related metabolic alterations. This project applies an enhanced version of ChemEmbed (https://www.biorxiv.org/content/10.1101/2025.02.07.637102v1), a deep learning framework that models chemical structures and MS/MS spectra, to a real-world case-control study. The aim is to identify potential biomarkers that distinguish health from disease, using an optimized AI-driven annotation workflow.

Objective:
The goal of this bachelor's thesis (TFG) project is to implement and validate the improved ChemEmbed model in a real case-control metabolomics study, using patient-derived MS/MS data to:
- Improve metabolite annotation accuracy through an optimized deep learning model.
- Identify significant metabolic differences between health and disease states.
- Evaluate the biological relevance of annotated metabolites as potential biomarkers.

Methodology:
1. Biomedical Data Preprocessing
- Process patient-derived MS/MS datasets from a case-control study.
- Apply signal preprocessing techniques (baseline correction, spectral binning, and intensity normalization).
- Extract key spectral features relevant for biomarker identification.
2. Statistical & Bioinformatics Analysis
- Compare annotated metabolite profiles between case and control groups.
- Apply multivariate statistical techniques (PCA, PLS-DA) to identify significant metabolic differences.
- Integrate pathway enrichment analysis to interpret the biological relevance of identified metabolites.
3. Validation & Interpretation
- Assess the clinical significance of potential biomarkers.
- Cross-check findings against the biomedical literature and known disease markers.
- Discuss the feasibility of using ChemEmbed for clinical metabolomics applications.

Expected Outcomes:
- A validated deep learning pipeline for metabolite annotation in a real-world biomedical study.
- Identification of disease-related metabolic alterations with potential diagnostic or therapeutic implications.
- An improved biomarker discovery workflow that integrates AI-driven annotation with bioinformatics analysis.

Student Profile & Skills:
- Background in biomedical engineering, bioinformatics, machine learning, or metabolomics.
- Experience with Python, deep learning frameworks (TensorFlow/PyTorch), and statistical analysis is desirable.
- Interest in biomedical applications of AI, biomarker discovery, and mass spectrometry data analysis.

Supervision & Work Plan:
Supervisor: Dr. Oscar Yanes
Milestones:
- Literature review & data preprocessing
- Model implementation & biomarker annotation
- Statistical & biological interpretation
- Documentation & final report

This project offers an opportunity to work at the intersection of biomedical engineering, AI, and metabolomics, contributing to real-world disease research through advanced computational techniques.
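
As a rough illustration of the methodology, the spectral binning, intensity normalization, and PCA steps might look like the sketch below. It is not part of ChemEmbed or of the project's actual pipeline. The function names, the 50-1000 m/z range, the 1 Da bin width, the base-peak normalization, and the toy spectra are all assumptions chosen for illustration, and PCA is computed here with a plain NumPy SVD rather than any specific statistics package.

```python
import numpy as np


def bin_spectrum(mz, intensity, mz_min=50.0, mz_max=1000.0, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (assumed range and width)."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = int(round((mz_max - mz_min) / bin_width))
    # Drop peaks that fall outside the binned m/z range.
    keep = (mz >= mz_min) & (mz < mz_max)
    idx = np.floor((mz[keep] - mz_min) / bin_width).astype(int)
    idx = np.clip(idx, 0, n_bins - 1)  # guard against floating-point edge cases
    binned = np.zeros(n_bins)
    # np.add.at accumulates correctly when several peaks share one bin.
    np.add.at(binned, idx, intensity[keep])
    return binned


def normalize_to_base_peak(binned):
    """Scale a binned spectrum so that its most intense bin equals 1."""
    peak = binned.max()
    return binned / peak if peak > 0 else binned


def pca_scores(X, n_components=2):
    """Project samples (rows) onto the first principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature (bin) across samples
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Toy spectra (hypothetical values): (m/z list, intensity list) per sample.
demo = [
    ([100.2, 100.7, 250.4], [10.0, 30.0, 20.0]),
    ([100.3, 310.1], [5.0, 50.0]),
    ([250.6, 310.4, 420.9], [12.0, 8.0, 40.0]),
]
X_demo = [normalize_to_base_peak(bin_spectrum(m, i)) for m, i in demo]
scores_demo = pca_scores(X_demo)  # one row of PC scores per sample
```

In a real study the score matrix would be inspected for separation between case and control samples. PLS-DA, which also uses the group labels, would need a dedicated implementation and proper cross-validation.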
Double Bachelor's Degree in Biomedical Engineering and in Telecommunication Systems and Services Engineering (GEB)
Omics sciences and personalized medicine
In progress
2025-03-10
Oscar Yanes Torrado
IVÁN PÉREZ LÓPEZ
Basic knowledge of the Python and R programming languages
Medium
No
No
No
No