Bachelor's Thesis / Master's Thesis

Machine Learning for biomarker discovery: Applying ChemEmbed to a case-control study

Description

Background & Motivation: Metabolomics plays a crucial role in biomedical research and precision medicine, providing insights into disease mechanisms, diagnostics, and therapeutic targets. However, identifying biomarkers from complex mass spectrometry (MS/MS) data remains a significant challenge due to the high dimensionality, variability, and sparsity of spectral data. Recent advances in machine learning (ML) and deep learning have improved metabolite annotation, enabling more accurate identification of disease-related metabolic alterations. This project focuses on applying an enhanced version of ChemEmbed (https://www.biorxiv.org/content/10.1101/2025.02.07.637102v1), a deep learning framework that models chemical structures and MS/MS spectra, to a real-world case-control study. The aim is to identify potential biomarkers distinguishing health from disease, using an optimized AI-driven annotation workflow.

Objective: The goal of this TFG project is to implement and validate the improved ChemEmbed model in a real case-control metabolomics study, using patient-derived MS/MS data to:
- Improve metabolite annotation accuracy through an optimized deep learning model.
- Identify significant metabolic differences between health and disease states.
- Evaluate the biological relevance of annotated metabolites as potential biomarkers.

Methodology:
1. Biomedical Data Preprocessing
- Process patient-derived MS/MS datasets from a case-control study.
- Apply signal preprocessing techniques (baseline correction, spectral binning, and intensity normalization).
- Extract key spectral features relevant for biomarker identification.
2. Statistical & Bioinformatics Analysis
- Compare annotated metabolite profiles between case and control groups.
- Apply multivariate statistical techniques (PCA, PLS-DA) to identify significant metabolic differences.
- Integrate pathway enrichment analysis to interpret the biological relevance of identified metabolites.
3. Validation & Interpretation
- Assess the clinical significance of potential biomarkers.
- Cross-check findings with biomedical literature and known disease markers.
- Discuss the feasibility of using ChemEmbed for clinical metabolomics applications.

Expected Outcomes:
- Validated deep learning pipeline for metabolite annotation in a real-world biomedical study.
- Identification of disease-related metabolic alterations with potential diagnostic or therapeutic implications.
- Improved biomarker discovery workflow, integrating AI-driven annotation with bioinformatics analysis.

Student Profile & Skills
- Background in biomedical engineering, bioinformatics, machine learning, or metabolomics.
- Experience with Python, deep learning frameworks (TensorFlow/PyTorch), and statistical analysis is desirable.
- Interest in biomedical applications of AI, biomarker discovery, and mass spectrometry data analysis.

Supervision & Work Plan
Supervisor: Dr. Oscar Yanes
Milestones:
- Literature review & data preprocessing
- Model implementation & biomarker annotation
- Statistical & biological interpretation
- Documentation & final report

This project offers an exciting opportunity to work at the intersection of biomedical engineering, AI, and metabolomics, contributing to real-world disease research through advanced computational techniques.
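
As an illustration of the preprocessing step named in the methodology (spectral binning and intensity normalization), the following is a minimal Python sketch. It is not ChemEmbed's actual preprocessing, which this proposal does not specify. The function names, the 1.0 m/z bin width, the m/z range, the choice of base-peak normalization, and the toy peak list are all assumptions made for the example; baseline correction is omitted.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], mz_min: float = 0.0,
                 mz_max: float = 1000.0, bin_width: float = 1.0) -> List[float]:
    """Sum peak intensities into fixed-width m/z bins (hypothetical helper)."""
    n_bins = int(round((mz_max - mz_min) / bin_width))
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        # Peaks outside [mz_min, mz_max) are ignored.
        if mz_min <= mz < mz_max:
            idx = min(int((mz - mz_min) / bin_width), n_bins - 1)
            binned[idx] += intensity
    return binned


def normalize_to_base_peak(binned: List[float]) -> List[float]:
    """Scale intensities so the largest bin equals 1.0 (hypothetical helper)."""
    base = max(binned, default=0.0)
    if base <= 0.0:
        # Empty or all-zero spectrum: return an unchanged copy.
        return list(binned)
    return [value / base for value in binned]


# Toy spectrum with made-up values, for illustration only.
toy_peaks = [(85.03, 120.0), (85.41, 30.0), (127.05, 600.0), (145.06, 300.0)]
vector = normalize_to_base_peak(bin_spectrum(toy_peaks, 0.0, 200.0, 1.0))
```

Each spectrum becomes a fixed-length vector, so spectra from case and control samples can be stacked into a matrix for the multivariate analysis (PCA, PLS-DA) listed in the methodology. The bin width trades mass resolution against sparsity and would need to be chosen for the instrument in use.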

Degree programmes

Double Bachelor's Degree in Biomedical Engineering and in Telecommunications Systems and Services Engineering (GEB)

Topic

Omics sciences and personalized medicine

Status

Completed

Proposal date

2025-03-10

Supervisors

Óscar Yanes Torrado

Students

IVÁN PÉREZ LÓPEZ

Recommendations

Basic knowledge of the Python and R programming languages

Difficulty

Medium

Company

Confidential

English

Service learning

File	Description
Memoria_PÉREZLÓPEZ_IVÁN.pdf

Machine Learning for biomarker discovery: Applying ChemEmbed to a case-control study

Files