Bachelor's Thesis / Master's Thesis

Performance Analysis and Acceleration of Learned Hash-Indexes on Supercomputing Architectures

Description

Recent advances in sequencing technologies have made population-scale genome analysis a reality, driving progress in modern biomedical research and healthcare. At this scale, genomic data analyses enable the discovery of valuable biological insights such as novel genetic variations, gene expression patterns, genes, and regulatory elements. However, the exponential growth of genomic data poses significant challenges for the performance scalability of genome analysis tools that must access large sequence databases. For instance, mapping long DNA sequences to a large reference genome is one of the most time-consuming steps in many genome sequencing analyses. In particular, seeding algorithms, which locate short DNA fragments in a reference genome, often become a major performance bottleneck in genome mapping tools. For this reason, many performance-critical tools rely on optimized hash tables to perform fast database lookups of DNA sequences. Although hash tables are well known for their efficiency, hash-table-based tools suffer from irregular memory access patterns and limited spatial locality, which makes them a poor fit for modern hardware. Recently proposed learned index strategies have shown promise in accelerating traditional data structures, such as hash tables, by using machine-learned models such as the Recursive Model Index (RMI) to predict the location of keys and thereby reduce the number of memory accesses. Nevertheless, the performance of learned hash tables remains constrained by low instruction-level parallelism, poor cache locality, and underutilized memory bandwidth. This thesis proposes to analyse and characterize the performance of learned hash tables, identify their bottlenecks and limitations, and explore software and hardware acceleration strategies to unlock their full potential on modern high-performance architectures.
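
To illustrate the idea of predicting key locations, the following is a minimal sketch and not the method used in the thesis. It assumes integer keys stored in a sorted array and replaces a full RMI with a single least-squares linear model. The class name `LinearLearnedIndex` and its methods are hypothetical and chosen only for this example. The model predicts a position, and the worst-case prediction error recorded at build time bounds the window that must be searched.

```python
from bisect import bisect_left


class LinearLearnedIndex:
    """Hypothetical single-stage learned index over sorted integer keys."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit: position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst-case prediction error bounds the search window.
        self.max_err = max(
            abs(p - self._predict(k)) for p, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Round the model output and clamp it to a valid array position.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or -1 if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search restricted to the error window around the prediction.
        i = bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else -1
```

The closer the key distribution is to linear, the smaller `max_err` becomes and the fewer memory locations a lookup touches. An RMI refines this by routing each key through a hierarchy of such models.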

Degree Programmes

Double Bachelor's Degree in Computer Engineering and Biotechnology (GEI)

Topic

Other

Status

Completed

Proposal Date

2025-04-11

Supervisors

Carlos Molina Clemente

Students

MONTSERRAT PALAZON BALMASEDA

Recommendations

Difficulty

High

Company

Confidential

English

Service Learning

File	Description
Memoria_PALAZONBALMASEDA_MONTSERRAT.pdf	Bachelor's thesis report
