Update proposal_ML4DQM1.md
ereinha authored Mar 18, 2024
1 parent 2003d53 commit 9ab193d
Showing 1 changed file with 8 additions and 10 deletions.
18 changes: 8 additions & 10 deletions _gsocproposals/2024/proposal_ML4DQM1.md
@@ -1,5 +1,5 @@
---
title: Continuous learning for data quality monitoring at particle colliders
title: Continuous learning for high-energy physics data quality monitoring
layout: gsoc_proposal
project: ML4DQM
year: 2024
@@ -11,27 +11,25 @@ organization:
## Description


One key challenge with the machine learning (ML) models currently used for Data Quality Monitoring (DQM) is their limited transferability to other systems: each system or sub-detector within a large HEP detector needs its own model developed, tested, and deployed, which can take significant time and effort. Furthermore, as detectors age, the data they produce can exhibit expected variations, which may be misclassified as ‘bad data’ when ML models performing DQM have been trained on pristine detector data.
This proposal seeks to address these challenges by pioneering continuous learning ML models that leverage ensemble learning techniques and can collectively adapt both to changing detector conditions and to changing detector systems.

A key challenge in data quality monitoring in high-energy physics is the need for online monitoring and control of the experiment using data that are sensitive to underlying conditions and to the constantly evolving state of the detector components. Machine learning models can help identify anomalies in the data and monitor data quality. At the same time, continuous learning techniques may be necessary to keep these models from being overly sensitive to changing data inputs, so that they do not need frequent re-training. This proposal seeks to address this challenge by exploring continuous learning models capable of adapting to changing detector conditions and systems over time.

## Duration

Total project length: 175 hours.

## Task ideas
* Develop ensemble learning ML models using CMS data for the electromagnetic calorimeter (ECAL) sub-system.
* Build and train the overall models and demonstrate their performance on a single sub-system.
* Develop continuous learning models for a single detector subsystem.
* Evaluate and benchmark model performance and robustness to changing detector conditions.

## Expected results
* Build an ensemble learning ML model and demonstrate performance comparable to or better than human operator monitoring.
* Validate these models using data acquired from the Tracker sub-system instead. The purpose of this task is to demonstrate that the overall model architecture can be kept while training it on data from a different sub-system, and to evaluate its performance.
* Build a continuous machine learning model pipeline.
* Evaluate and benchmark the models with realistic datasets.

## Requirements
C++, Python, PyTorch, TensorFlow and some previous experience in Deep Learning.
C++, Python, PyTorch, TensorFlow, previous experience in Deep Learning.

## Project difficulty level
Challenging
Medium

## Mentors
* [Emanuele Usai](mailto:[email protected]) (University of Alabama)