Contrastive Learning for Quark-Gluon Jet Classification

Self-supervised transformer to generate embeddings for particle jets and then classify them using non-linear models.

Supervisors: Dr. Tilman Plehn, Dr. Barry Dillon

We adapt the JetCLR framework, a self-supervised contrastive learning approach, to the task of quark-gluon jet tagging. JetCLR was originally tested on top tagging. Here it is applied to a quark-gluon dataset that includes particle-ID (PID) information alongside the standard kinematic variables \((p_T, \eta, \phi)\). The primary goal is to evaluate how well JetCLR representations support linear quark/gluon discrimination. To this end, we compare them against an established alternative, Energy Flow Polynomials (EFPs), using several linear classifier tests (LCTs). The study covers different JetCLR configurations, with and without PID information, and explores different ways of encoding PIDs, such as single float values (PFN-ID, PFN-Ex, JetCLR-ID) and one-hot encoding.
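The difference between a single-float PID encoding and a one-hot encoding can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the PDG-code vocabulary `PDG_CODES`, the evenly spaced float values, and the helper names are hypothetical. They are not the exact PFN-ID, PFN-Ex or JetCLR-ID definitions used in the project.

```python
import numpy as np

# Hypothetical PID vocabulary (PDG codes); the project's actual set may differ.
PDG_CODES = [22, 211, -211, 321, -321, 130, 2112, -2112, 2212, -2212, 11, -11, 13, -13]

def pid_to_float(pids, codes=PDG_CODES):
    """Single-float encoding: map each PDG code to an evenly spaced value in [0, 1] (illustrative)."""
    lookup = {c: i / (len(codes) - 1) for i, c in enumerate(codes)}
    return np.array([lookup[p] for p in pids], dtype=np.float32)

def pid_to_onehot(pids, codes=PDG_CODES):
    """One-hot encoding: one indicator column per particle species."""
    index = {c: i for i, c in enumerate(codes)}
    out = np.zeros((len(pids), len(codes)), dtype=np.float32)
    out[np.arange(len(pids)), [index[p] for p in pids]] = 1.0
    return out

def constituent_features(pt, eta, phi, pids, onehot=True):
    """Per-constituent input: kinematics (pT, eta, phi) followed by the chosen PID encoding."""
    kin = np.stack([pt, eta, phi], axis=-1).astype(np.float32)
    pid_feat = pid_to_onehot(pids) if onehot else pid_to_float(pids)[:, None]
    return np.concatenate([kin, pid_feat], axis=-1)

# A toy jet with three constituents: a photon, a pi+ and a mu+ (PDG -13).
pt = np.array([100.0, 50.0, 20.0])
eta = np.array([0.05, -0.10, 0.20])
phi = np.array([0.00, 0.15, -0.05])
pids = [22, 211, -13]
x_onehot = constituent_features(pt, eta, phi, pids, onehot=True)   # shape (3, 3 + 14)
x_float = constituent_features(pt, eta, phi, pids, onehot=False)   # shape (3, 4)
```

The design trade-off is that a single float places particle species on an artificial number line, so the network sees, for example, a photon as "closer" to one hadron than another. One-hot encoding treats species as unordered categories, at the cost of a wider input per constituent.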
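A linear classifier test trains a linear model on frozen representations (JetCLR embeddings or EFPs) and scores it on held-out jets. The sketch below compares two such linear models: logistic regression trained on a binary cross-entropy (BCE) loss and Fisher Linear Discriminant Analysis (LDA). It assumes ROC AUC as the figure of merit and uses synthetic Gaussian features in place of real representations. The function names and the toy data are illustrative only.

```python
import numpy as np

def standardize(x_train, x_test):
    """Scale features with the training-set mean and std, applied to both splits."""
    mu, sd = x_train.mean(axis=0), x_train.std(axis=0) + 1e-8
    return (x_train - mu) / sd, (x_test - mu) / sd

def fit_bce_linear(x, y, lr=0.1, steps=2000):
    """Logistic regression: minimize mean binary cross-entropy with full-batch gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(y = 1)
        g = p - y                                 # gradient of BCE w.r.t. the logit
        w -= lr * (x.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def fit_lda(x, y):
    """Fisher LDA: w = S_w^{-1} (mu_1 - mu_0), with S_w the sum of the two class covariances."""
    x0, x1 = x[y == 0], x[y == 1]
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    s_w = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)
    w = np.linalg.solve(s_w + 1e-6 * np.eye(x.shape[1]), mu1 - mu0)
    return w, -0.5 * w @ (mu0 + mu1)

def roc_auc(scores, y):
    """ROC AUC from the Mann-Whitney rank statistic (assumes no tied scores)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

# Toy stand-in for frozen representations: 8 features, only feature 0 carries the label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
x = rng.normal(size=(2000, 8))
x[:, 0] += y
x_train, x_test = standardize(x[:1000], x[1000:])
y_train, y_test = y[:1000], y[1000:]

aucs = {}
for name, fit in [("BCE", fit_bce_linear), ("LDA", fit_lda)]:
    w, b = fit(x_train, y_train)
    aucs[name] = roc_auc(x_test @ w + b, y_test)
print(aucs)
```

Both models produce a linear score, but they are fitted differently. LDA is optimal when the two classes are Gaussian with a shared covariance, while the BCE fit makes no such assumption. This is one plausible reason why the same representation can rank differently against EFPs depending on which LCT is used.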

Initial results indicate that JetCLR representations, particularly those incorporating one-hot encoded PIDs, perform promisingly, matching or slightly exceeding standard EFPs depending on the linear classifier used: they outperform EFPs with a classifier trained on a BCE loss but underperform them with Linear Discriminant Analysis. The quark-gluon dataset was generated with Pythia 8.226 for pp collisions at \(\sqrt{s} = 14\) TeV and consists of Z+jet events with \(p_T\) between 500 and 550 GeV. Further investigations include optimizing hyperparameters such as the contrastive-loss temperature, and assessing how different data augmentations and potential detector effects affect classification performance.
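The contrastive-loss temperature refers to the temperature \(\tau\) in an NT-Xent-style loss of the kind used in SimCLR, which JetCLR builds on. The NumPy sketch below only evaluates the loss for a batch of positive pairs. A real training loop would compute it inside an autodiff framework, and the names and toy inputs here are assumptions for illustration. Each embedding is pulled toward its augmented partner and pushed away from the other \(2N-2\) embeddings in the batch. A smaller \(\tau\) sharpens the softmax, so the loss concentrates on the hardest negatives.

```python
import numpy as np

def nt_xent(z_i, z_j, tau=0.1):
    """NT-Xent loss for N positive pairs (z_i[k], z_j[k]); all other samples act as negatives."""
    n = z_i.shape[0]
    z = np.concatenate([z_i, z_j], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # unit vectors -> dot product = cosine
    sim = z @ z.T / tau                                   # (2N, 2N) temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                        # drop self-similarity from the softmax
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # column of each row's positive
    row_max = sim.max(axis=1, keepdims=True)              # stabilize the log-sum-exp
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))

# Toy embeddings: "views" that are small perturbations of each other vs unrelated vectors.
rng = np.random.default_rng(0)
a = rng.normal(size=(16, 32))
b = rng.normal(size=(16, 32))
for tau in (0.05, 0.1, 0.5):
    print(tau, nt_xent(a, a + 0.1 * b, tau), nt_xent(a, b, tau))
```

Scanning `tau` as in the loop is the simplest form of the temperature optimization mentioned above, although in practice it would be judged by downstream LCT performance rather than by the loss value itself.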
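The data augmentations that generate positive pairs are of the kind introduced in the original JetCLR work: rotations and translations in the \((\eta, \phi)\) plane, and collinear splittings of constituents. The sketch below is a simplified illustration, not the project's implementation. It assumes constituent coordinates already centred on the jet axis, and the translation width and the helper names are hypothetical choices.

```python
import numpy as np

def rotate(eta, phi, rng):
    """Rotate all constituents by one random angle in the (eta, phi) plane about the jet axis (origin)."""
    a = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(a), np.sin(a)
    return c * eta - s * phi, s * eta + c * phi

def translate(eta, phi, rng, width=0.5):
    """Shift all constituents by one random (d_eta, d_phi) offset; `width` is a hypothetical default."""
    return eta + rng.normal(0, width), phi + rng.normal(0, width)

def collinear_split(pt, eta, phi, n_split, rng):
    """Split n_split random constituents into two collinear pieces that share the original pT."""
    idx = rng.choice(len(pt), size=n_split, replace=False)
    frac = rng.uniform(0, 1, size=n_split)
    pt = pt.copy()
    new_pt = pt[idx] * (1 - frac)   # second piece carries the remaining momentum fraction
    pt[idx] *= frac                 # first piece keeps fraction `frac`
    return (np.concatenate([pt, new_pt]),
            np.concatenate([eta, eta[idx]]),
            np.concatenate([phi, phi[idx]]))

rng = np.random.default_rng(1)
pt = np.array([50.0, 30.0, 20.0])
eta = np.array([0.10, -0.20, 0.05])
phi = np.array([0.00, 0.10, -0.30])
eta_r, phi_r = rotate(eta, phi, rng)
eta_t, phi_t = translate(eta, phi, rng)
pt_s, eta_s, phi_s = collinear_split(pt, eta, phi, 2, rng)
```

Each transformation preserves a property the representation should be insensitive to: rotations keep distances from the jet axis, translations keep relative positions, and collinear splits keep the total \(p_T\) and the energy flow.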
