ml4dqm test

ML4SCI · Mar 10, 2024
commit f0cdbaa · 1 parent 4a7b11d

Showing 3 changed files with 59 additions and 0 deletions.
diff --git a/_gsocprojects/2024/project_ML4DQM.md b/_gsocprojects/2024/project_ML4DQM.md
@@ -0,0 +1,13 @@
+---
+project: ML4DQM
+layout: default
+logo: ml4dqm.jpg
+description: |
+   
+
+   Data Quality Monitoring (DQM) is a task that often requires identifying anomalies to ensure that the data being recorded meets certain quality criteria. Although some aspects of DQM have been successfully automated, it is still largely an endeavor that requires a human operator to actively monitor the input data and judge whether it is ‘good’ or ‘bad’.
+   The Machine Learning for Data Quality Monitoring (ML4DQM) project aims to exploit recent advances in ML to automate tasks currently performed by error-prone human operators.
+   The project focuses on DQM of particle collision data collected with the CMS experiment at the CERN Large Hadron Collider.
+---
+
+{% include gsoc_project.ext %}
diff --git a/_gsocproposals/2024/proposal_ML4DQM1.md b/_gsocproposals/2024/proposal_ML4DQM1.md
@@ -0,0 +1,46 @@
+---
+title:  Continuous learning for data quality monitoring at particle colliders
+layout: gsoc_proposal
+project: ML4DQM
+year: 2024
+organization:
+  - Alabama
+
+---
+
+## Description
+
+
+One key challenge with the models currently used in Machine Learning for Data Quality Monitoring is that they are often limited in their transferability to other systems, meaning that each system or sub-detector within a large HEP detector needs its own model to be developed, tested, and deployed, which can take significant time and effort. Furthermore, as detectors age, the data they produce can exhibit expected variations, which may be misclassified as ‘bad data’ when the ML models performing DQM have been trained on pristine detector data.
+This proposal seeks to address these challenges by pioneering continuous learning ML models that leverage ensemble learning techniques and are collectively able to adapt both to changing detector conditions and to changing detector systems.
+
+
+## Duration
+
+Total project length: 175 hours.
+
+## Task ideas
+ * Develop ensemble learning ML models using CMS data for the electromagnetic calorimeter (ECAL) sub-system.
+ * Build and train the overall models and demonstrate their performance on a single sub-system.
+
+## Expected results
+ * Build an ensemble learning ML model and demonstrate performance comparable or superior to that of human operator monitoring.
+ * Validate these models using data acquired instead with the Tracker sub-system. The purpose of this task is to demonstrate that the overall model architecture can be reused, retrained with data from a different sub-system, and evaluated on that sub-system.
+
+## Requirements
+C++, Python, PyTorch, TensorFlow, and some previous experience in Deep Learning.
+
+## Project difficulty level
+Challenging
+
+## Mentors
+  * [Emanuele Usai](mailto:[email protected]) (University of Alabama)
+  * [Sergei Gleyzer](mailto:[email protected]) (University of Alabama)
+
+
+## Test
+Solve the evaluation task(s) for any of the other projects in the ML4SCI umbrella organization. Please send us your CV and a link to all your completed work (GitHub repository, Jupyter notebook, and a PDF of the Jupyter notebook with output) to [[email protected]](mailto:[email protected]) with "Evaluation Test: ML4DQM" in the title. In the email, specify which evaluation test(s) you solved.
+
+Please **DO NOT** contact the mentors directly by email. Instead, email [[email protected]](mailto:[email protected]) with the project title, and **include your CV** and **test results**. The mentors will then get in touch with you.
+
+
diff --git a/images/ml4dqm.jpg b/images/ml4dqm.jpg