Experience in solid-state physics and materials modelling is a bonus;

----

## Data-driven materials modelling with uncertainty-informed Gaussian processes

Modern materials discovery relies heavily on data-driven methods to predict material properties efficiently and accurately,
using datasets generated from density-functional theory (DFT) calculations.
However, these datasets often face two key challenges:

- Non-uniform computational cost: The cost of DFT calculations varies significantly across materials, owing to differences in the numerical parameters (discretisation basis, k-point sampling, tolerances) required for a given target accuracy. The baseline active-learning approach computes with a fixed discretisation (plane-wave cutoff) chosen a priori for the whole dataset (e.g. [^vanderOord] and [^Merchant2023]), which may not balance cost and accuracy optimally across diverse materials.

- Data heterogeneity: Training data often come from diverse sources and exhibit varying levels of uncertainty, which affects the reliability of predictive models.
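The cost imbalance behind the first point can be illustrated with a toy convergence test. This is a hypothetical sketch, not project code: `total_energy` stands in for a real DFT call, with a made-up error model whose convergence rate depends on a fictitious "hardness" parameter.

```python
import numpy as np

def total_energy(hardness, ecut):
    # Stand-in for a DFT total-energy call: the discretisation error is
    # modelled as decaying exponentially in the plane-wave cutoff, with a
    # material-dependent rate ("harder" materials converge more slowly).
    return -10.0 + np.exp(-ecut / hardness)

def converged_cutoff(hardness, tol=1e-3, ecut0=10.0, step=5.0):
    """Raise the cutoff until the energy change per step drops below tol."""
    ecut = ecut0
    e_prev = total_energy(hardness, ecut)
    while True:
        ecut += step
        e_next = total_energy(hardness, ecut)
        if abs(e_next - e_prev) < tol:
            return ecut
        e_prev = e_next

ecut_soft = converged_cutoff(5.0)    # converges at a small basis
ecut_hard = converged_cutoff(20.0)   # needs a much larger basis
```

A single a-priori cutoff therefore either over-resolves the easy materials (wasting compute) or under-resolves the hard ones (corrupting labels), which is exactly the trade-off an adaptive scheme can exploit.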

The goal of this project is to develop a framework that overcomes these challenges by integrating adaptive learning with uncertainty-aware models. This involves formulating an active-learning approach that adaptively selects both the material structures and the numerical parameters in order to optimise computational resources, while employing Gaussian-process regression [^RasmussenWilliams06] to propagate and manage the uncertainties in heterogeneous datasets. By combining these techniques, the project aims to improve the accuracy, efficiency and reliability of data-driven materials modelling.
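The two ingredients can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions (1-D inputs, a squared-exponential kernel, a made-up cost model), not the project's method: a Gaussian-process posterior with a per-point noise variance, so cheap/noisy and expensive/accurate data can be mixed, followed by a simple cost-aware acquisition score.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, noise_var, X_test):
    """GP posterior mean/variance; noise_var has one entry per training
    point, which is how heterogeneous data quality enters the model."""
    K = rbf_kernel(X_train, X_train) + np.diag(noise_var)
    Ks = rbf_kernel(X_test, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    var = np.diag(rbf_kernel(X_test, X_test)) - np.sum(V**2, axis=0)
    return mean, np.maximum(var, 0.0)

# Toy data: two sources of unequal quality for the same target function.
rng = np.random.default_rng(0)
X_cheap = rng.uniform(0, 10, 15)              # low-accuracy, low-cost labels
X_acc = rng.uniform(0, 10, 5)                 # high-accuracy, high-cost labels
X_train = np.concatenate([X_cheap, X_acc])
noise_var = np.concatenate([np.full(15, 0.1), np.full(5, 1e-4)])
y_train = np.sin(X_train) + rng.normal(0.0, np.sqrt(noise_var))

X_cand = np.linspace(0, 10, 200)              # candidate structures to query
mean, var = gp_posterior(X_train, y_train, noise_var, X_cand)

# Cost-aware acquisition: posterior uncertainty per unit of (assumed) cost,
# so an expensive query must promise proportionally more information.
cost = 1.0 + 0.3 * X_cand                     # stand-in for per-query cost
next_x = X_cand[np.argmax(var / cost)]
```

The design choice to carry the noise variance per training point (rather than a single scalar) is what lets one data source down-weight another in a principled way; the acquisition rule is the simplest cost-normalised variant and would be refined in the project.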

**Requirements:**
Strong programming skills, ideally Julia or Python;
Basic knowledge of numerical methods for partial differential equations;
Experience with probabilistic machine learning methods is a bonus;
Experience in running DFT calculations is a bonus.

[^vanderOord]: C. van der Oord, M. Sachs, D. P. Kovács, C. Ortner and G. Csányi. *Hyperactive learning for data-driven interatomic potentials*. npj Comput Mater 9, 168 (2023). DOI [10.1038/s41524-023-01104-6](https://doi.org/10.1038/s41524-023-01104-6)

[^Merchant2023]: A. Merchant, S. Batzner, S. S. Schoenholz, M. Aykol, G. Cheon and E. D. Cubuk. *Scaling deep learning for materials discovery*. Nature 624, 80–85 (2023). DOI [10.1038/s41586-023-06735-9](https://doi.org/10.1038/s41586-023-06735-9)

[^RasmussenWilliams06]: C. E. Rasmussen and C. K. I. Williams. *Gaussian Processes for Machine Learning*. MIT Press (2006). DOI [10.7551/mitpress/3206.001.0001](https://doi.org/10.7551/mitpress/3206.001.0001)
