theme: seriph
highlighter: shiki
class: text-center
title: Diffusion Model for Control and Planning Tutorial
background: figs/diffuse_teaser.gif
layout: cover

📚 Tutorial: Diffusion Model for Control and Planning


transition: fade
class: text-left

📖 Tutorial Outline

  1. 🔄 Recap: What is a Diffusion Model?
  2. 🚀 Motivation: Why a Generative Model in Control and Planning?
  3. 🛠️ Practice: How to Use the Diffuser?
  4. 📚 Literature: Recent Research Progress in Diffusion for RL/Control
  5. 📝 Summary & Challenges in Diffusion Models

🔄 Recap: What is a Diffusion Model?

  • Key point: Generative model for distribution matching.
  • Applications: Image and text generation, creative tasks.

diffusion examples

🔄 Recap: What is a Diffusion Model?

  • Key point: Generative model for distribution matching.
  • Applications: Image and text generation, creative tasks.
  • Core: Score function for sample generation and distribution description. Annealed Langevin dynamics chain a sequence of Langevin updates with gradually decreasing noise scales (a minimal sampling sketch follows below):

$$ \boldsymbol{x}_{i+1} \leftarrow \boldsymbol{x}_i+c \nabla \log p\left(\boldsymbol{x}_i\right)+\sqrt{2 c} \boldsymbol{\epsilon}, \quad i=0,1, \ldots, K $$
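To make the update concrete, here is a minimal NumPy sketch of annealed Langevin sampling. The function name, step-size rule, and geometric noise schedule are illustrative assumptions, not the settings of any particular paper.

```python
import numpy as np

def annealed_langevin_sample(score_fn, x0, noise_scales, steps_per_scale=20, eps=1e-3):
    """Annealed Langevin dynamics: run a short Langevin chain at each noise scale,
    moving from large to small sigma so samples settle onto the data distribution."""
    x = np.array(x0, dtype=float)
    for sigma in noise_scales:                      # noise scales, large -> small
        c = eps * (sigma / noise_scales[-1]) ** 2   # step size shrinks with the noise scale
        for _ in range(steps_per_scale):
            noise = np.random.randn(*x.shape)
            # x_{i+1} <- x_i + c * grad log p(x_i) + sqrt(2c) * epsilon
            x = x + c * score_fn(x, sigma) + np.sqrt(2 * c) * noise
    return x

# Toy usage: sample from a 1-D standard Gaussian, whose score is simply -x.
sample = annealed_langevin_sample(lambda x, sigma: -x,
                                  x0=np.random.randn(1),
                                  noise_scales=np.geomspace(10.0, 0.01, num=10))
```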

🔄 Recap: What is a Diffusion Model?

  • Key point: Generative model for distribution matching.

  • Applications: Image and text generation, creative tasks.

  • Core: Score function for sample generation and distribution description.

  • Advantages:

    • 🌟 Multimodal: Effective with multimodal distributions.
    • 📈 Scalable: Suits high-dimensional problems.
    • 🔒 Stable: Grounded in solid mathematics and stable training.
    • 🔄 Non-autoregressive: Predicts entire trajectories efficiently.

🚀 Motivation: Why a Generative Model in Control and Planning?

Generative Models in Control and Planning

  • Generative models: applied in imitation learning to match the expert data distribution.
  • Examples: GANs, VAEs in imitation learning.

Other generative model overview

🚀 Motivation: Why a Generative Model in Control and Planning?

Generative Models in Control and Planning

  • Generative models: applied in imitation learning to match the expert data distribution.
  • Examples: GANs, VAEs in imitation learning.
  • GAN in GAIL: Discriminator learning and policy training.
    • Idea: Train a discriminator to distinguish between expert and agent data.
    • Limitation: Struggles with multimodal distributions, unstable training.

GAIL algorithm


🚀 Motivation: Why a Generative Model in Control and Planning?

Generative Models in Control and Planning

  • Generative Models: Crucial in control and planning.
  • Examples: GANs, VAEs in imitation learning.
  • VAE in ACT (ALOHA): Latent space learning for planning.
    • Idea: Learn a latent space for planning and control (generate actions in chunks).
    • Limitation: VAEs can be hard to train.

ACT algorithm


layout: iframe


🚀 Motivation: Why a Generative Model in Control and Planning?

What to Learn with the Generative Model?

Scenario: Imitation Learning $\min |p_\theta(\tau) - p_{\text{data}}(\tau)|$

  • Challenge: Match high-dimensional, multimodal trajectory distributions.
  • Solution: Diffusion models for expressive distribution matching.
  • Common Method: GAIL with adversarial training.
  • Limitation: Struggles with multimodal distributions, unstable training.

Imitation Learning Limitation


🚀 Motivation: Why a Generative Model in Control and Planning?

What to Learn with the Generative Model?

Scenario: Offline Reinforcement Learning $\max J_\theta(\tau) \ge J_{\text{data}}(\tau) \quad \text{s.t.} \quad |p_\theta(\tau) - p_{\text{data}}(\tau)| < \epsilon$

  • Challenge: Outperform demonstrations, ensure close action distribution.
  • Solution: Diffusion models to match action distribution effectively.
  • Common Method: CQL, penalizes out-of-distribution samples.
  • Limitation: Tends to be over-conservative.

Offline RL Limitation


🚀 Motivation: Why a Generative Model in Control and Planning?

What to Learn with the Generative Model?

Scenario: Model-based Reinforcement Learning

  • Challenge: Match dynamic model and policy's action distribution.
  • Solution: Diffusion models for non-autoregressive, multimodal matching.
  • Common method: planning with learned dynamics.
  • Limitation: compounding error in long-horizon planning.

Model-based RL Limitation


🚀 Motivation: Why a Generative Model in Control and Planning?

What to Learn with the Generative Model?

Key: use a powerful model to match a high-dimensional, multimodal distribution.

  • Action/Value distribution matching: grounded in demonstrations -> offline RL.
  • Trajectory distribution matching: dynamic feasibility and optimal trajectory distribution -> model-based RL.
  • Transition distribution matching: dynamics matching in a non-autoregressive manner -> model-based RL.

🛠️ Practice: How to Use the Diffuser?

What to Diffuse?

  • Most common: diffuse the trajectory (Diffuser).
  • Diffused variable $x$: the state-action sequence $\tau = (s_0, a_0, s_1, a_1, \ldots, s_T, a_T)$ (see the table and sketch below).
| Task | Thing to Diffuse | How to Diffuse |
| --- | --- | --- |
| Image Generation | Noise Image | Image Diffuse |
| Planning Diffusion Model | Future State-Action Sequence | Diffuse Process for Next Trajectory |
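As a rough illustration of the planning row above, the rollout can be stacked into one array and noised as a whole (non-autoregressively). The array layout and the DDPM-style forward-noising step below are assumptions for illustration, not a specific implementation.

```python
import numpy as np

def make_trajectory_array(states, actions):
    """Stack tau = (s_0, a_0, ..., s_T, a_T) into a single (T+1, state_dim + action_dim)
    array that the diffusion model treats as one sample."""
    return np.concatenate([np.asarray(states), np.asarray(actions)], axis=-1)

def forward_noise(tau, alpha_bar):
    """DDPM-style forward corruption of the whole trajectory at once:
    tau_t = sqrt(alpha_bar) * tau_0 + sqrt(1 - alpha_bar) * eps."""
    eps = np.random.randn(*tau.shape)
    return np.sqrt(alpha_bar) * tau + np.sqrt(1.0 - alpha_bar) * eps, eps
```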

layout: iframe


🛠️ Practice: How to Use the Diffuser?

How to Impose Constraints/Objectives?

  • Objective: enable the trained model to generalize to new constraints and tasks.

  • Common cases: goal conditioning, safety constraints, new tasks, etc.

  • Possible Methods:

    • Guidance function (d): shift the distribution with an extra gradient.
    • Classifier-free method: learn a model that represents both the conditional and unconditional distributions.
    • Inpainting (a): fill in the missing parts of the trajectory by fixing certain states (e.g., the start and goal states).

Guidance Function


🛠️ Practice: How to Use the Diffuser?

How to Impose Constraints/Objectives?

  • Guidance function: shift the distribution with an extra gradient.
    • Predefined guidance function:
      • Method: shift the distribution with a manually defined guidance function (see the sketch after the equation below).
      • Limitation: Might lead to OOD samples, which break the learned diffusion process.

$$ \tilde{p}_\theta(\boldsymbol{\tau}) \propto p_\theta(\boldsymbol{\tau}) h(\boldsymbol{\tau}), \qquad \boldsymbol{\tau}^{i-1} \sim \mathcal{N}\left(\mu+\alpha \Sigma \nabla \mathcal{J}(\mu), \Sigma^i\right) $$
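A minimal sketch of one guided reverse step corresponding to the equation above; `denoiser_mean_cov` and `grad_J` are hypothetical callables standing in for the model's predicted Gaussian posterior and the gradient of a hand-designed objective $\mathcal{J}$, with the trajectory assumed flattened to a vector.

```python
import numpy as np

def guided_reverse_step(tau_i, denoiser_mean_cov, grad_J, alpha=0.1):
    """One reverse-diffusion step with a predefined guidance function:
    shift the predicted mean by alpha * Sigma * grad J(mu), then sample."""
    mu, Sigma = denoiser_mean_cov(tau_i)           # Gaussian posterior for tau^{i-1}
    shifted_mu = mu + alpha * Sigma @ grad_J(mu)   # nudge samples toward higher J
    noise = np.random.randn(*mu.shape)
    # sample tau^{i-1} ~ N(shifted_mu, Sigma); Cholesky handles a full covariance
    return shifted_mu + np.linalg.cholesky(Sigma) @ noise
```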

Guidance Function


🛠️ Practice: How to Use the Diffuser?

How to Impose Constraints/Objectives?

  • Guidance function: shift the distribution with an extra gradient. ▶️ leads to OOD samples
    • Predefined guidance function:
      • Method: shift the distribution with a manually defined guidance function.
      • Limitation: Might lead to OOD samples, which break the learned diffusion process.
    • Learned classifier:
      • Method: learn a classifier to distinguish samples under different constraints, similar to a GAN discriminator (a sketch follows after the equation below).
      • Limitation: Hard to tune parameters.

$$
\begin{aligned}
\nabla \log p\left(\boldsymbol{x}_t \mid y\right) & =\nabla \log \left(\frac{p\left(\boldsymbol{x}_t\right) p\left(y \mid \boldsymbol{x}_t\right)}{p(y)}\right) \\
& =\nabla \log p\left(\boldsymbol{x}_t\right)+\nabla \log p\left(y \mid \boldsymbol{x}_t\right)-\nabla \log p(y) \\
& =\underbrace{\nabla \log p\left(\boldsymbol{x}_t\right)}_{\text{unconditional score}}+\underbrace{\nabla \log p\left(y \mid \boldsymbol{x}_t\right)}_{\text{adversarial gradient}}
\end{aligned}
$$
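The decomposition above is typically implemented by adding the gradient of a classifier trained on noisy samples to the unconditional score. This PyTorch-style sketch assumes hypothetical `score_model(x, t)` and `classifier(x, t)` modules and a discrete label `y`.

```python
import torch

def classifier_guided_score(x_t, t, y, score_model, classifier, scale=1.0):
    """grad log p(x_t | y) ~= grad log p(x_t) + scale * grad log p(y | x_t)."""
    uncond_score = score_model(x_t, t)                 # unconditional score
    x = x_t.detach().requires_grad_(True)
    logits = classifier(x, t)                          # classifier on the noisy input
    log_prob = logits.log_softmax(dim=-1)[..., y].sum()
    cls_grad = torch.autograd.grad(log_prob, x)[0]     # "adversarial gradient"
    return uncond_score + scale * cls_grad
```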


🛠️ Practice: How to Use the Diffuser?

How to Impose Constraints/Objectives?

  • Guidance function: shift the distribution with an extra gradient. ▶️ leads to OOD samples
  • Classifier-Free Method: learn a model that represents both the conditional and unconditional distributions.
    • Method: randomly drop the conditioning term during training so a single model represents both the conditional and unconditional distributions (a sketch follows after the equation below).

$$
\begin{aligned}
\nabla \log p\left(\boldsymbol{x}_t \mid y\right) & =\nabla \log p\left(\boldsymbol{x}_t\right)+\gamma\left(\nabla \log p\left(\boldsymbol{x}_t \mid y\right)-\nabla \log p\left(\boldsymbol{x}_t\right)\right) \\
& =\nabla \log p\left(\boldsymbol{x}_t\right)+\gamma \nabla \log p\left(\boldsymbol{x}_t \mid y\right)-\gamma \nabla \log p\left(\boldsymbol{x}_t\right) \\
& =\underbrace{\gamma \nabla \log p\left(\boldsymbol{x}_t \mid y\right)}_{\text{conditional score}}+\underbrace{(1-\gamma) \nabla \log p\left(\boldsymbol{x}_t\right)}_{\text{unconditional score}}
\end{aligned}
$$
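A sketch of how the combined score above is usually evaluated at sampling time; it assumes a hypothetical `score_model(x, t, y)` trained with condition dropout so that passing `y=None` yields the unconditional score.

```python
import torch

def classifier_free_score(x_t, t, y, score_model, gamma=2.0):
    """Classifier-free guidance: gamma * s(x_t | y) + (1 - gamma) * s(x_t).
    gamma > 1 pushes samples more strongly toward the condition y."""
    cond = score_model(x_t, t, y)        # conditional score estimate
    uncond = score_model(x_t, t, None)   # unconditional score estimate
    return gamma * cond + (1.0 - gamma) * uncond
```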


🛠️ Practice: How to Use the Diffuser?

How to Impose Constraints/Objectives?

  • Guidance function: shift the distribution with an extra gradient. ▶️ leads to OOD samples
  • Classifier-Free Method: learn a model that represents both the conditional and unconditional distributions.
| Guidance Function Method | Classifier-Free Method |
| --- | --- |
| Guidance Function | Classifier-Free Guidance |

🛠️ Practice: How to Use the Diffuser?

How to Impose Constraints/Objectives?

  • Guidance function: shift the distribution with an extra gradient. ▶️ leads to OOD samples
  • Classifier-Free Method: learn a model that represents both the conditional and unconditional distributions.
  • Inpainting: fill in the missing parts of the trajectory by fixing certain states.
    • Method: fix the start and goal states and let the model fill in the rest of the trajectory (a minimal sketch follows below).
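A minimal sketch of the inpainting trick, assuming the trajectory is an array with known entries (e.g., the current state and the goal state); the mask-based clamp is illustrative, not a specific paper's implementation.

```python
import numpy as np

def inpaint_clamp(tau, known_values, known_mask):
    """After every denoising step, overwrite the known parts of the trajectory
    (e.g., s_0 and s_T) with their observed values, so the diffusion model
    only fills in the missing middle."""
    return np.where(known_mask, known_values, tau)

# Example mask for a (T+1, dim) trajectory with the first/last states fixed:
# mask = np.zeros_like(tau, dtype=bool); mask[0, :state_dim] = True; mask[-1, :state_dim] = True
```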

Inpainting


🛠️ Practice: How to Use the Diffuser?

  • Most common thing to diffuse: the trajectory.
  • Common ways to impose constraints/add objectives: guidance functions, classifier-free guidance, inpainting.

📚 Literature: Recent Research Progress in Diffusion for RL/Control

A detailed summary of each method can be found here.

The key design question in diffusion: how to obtain the score function.

$$ \color{red}\underbrace{\nabla_x \log P}_{\text{how to get score function}} \color{black}( \color{blue}\underbrace{x}_{\text{what to diffuse}} \color{black}| \color{green}\underbrace{y}_{\text{how to impose constraints/objectives}} \color{black}) $$

  • How to get score function: data-driven vs. analytical.
  • What to diffuse: sequential vs. non-sequential.
  • How to impose constraints/objectives: hard vs. soft.

📚 Literature: Recent Research Progress in Diffusion for RL/Control

$$ \color{red}\underbrace{\nabla_x \log P}_{\text{how to get score function}} \color{black}( \color{blue}\underbrace{x}_{\text{what to diffuse}} \color{black}| \color{green}\underbrace{y}_{\text{how to impose constraints/objectives}} \color{black}) $$

  • How to get score function: data-driven vs. analytical.
    • Data-driven: learn the score function from data.
    • Hybrid: learn the score function from the intermediate results of an optimization solver.
    • Analytical: use an analytically derived score function.

Different ways to get the score function


📚 Literature: Recent Research Progress in Diffusion for RL/Control

$$ \color{red}\underbrace{\nabla_x \log P}_{\text{how to get score function}} \color{black}( \color{blue}\underbrace{x}_{\text{what to diffuse}} \color{black}| \color{green}\underbrace{y}_{\text{how to impose constraints/objectives}} \color{black}) $$

  • How to get score function: data-driven vs. analytical.
  • What to diffuse: sequential vs. non-sequential.
    • Action/Value: learn a model to match the action/value distribution, serving as a regularizer and a policy.
    • Transition: learn a model to match the transition distribution, serving as a world model. ▶️ MPC
    • Trajectory: learn a model to match the trajectory distribution, serving as a trajectory-optimization (TO) solver (planning over states vs. state-actions vs. actions).

Different things to diffuse


📚 Literature: Recent Research Progress in Diffusion for RL/Control

$$ \color{red}\underbrace{\nabla_x \log P}_{\text{how to get score function}} \color{black}( \color{blue}\underbrace{x}_{\text{what to diffuse}} \color{black}| \color{green}\underbrace{y}_{\text{how to impose constraints/objectives}} \color{black}) $$

  • How to get score function: data-driven vs. analytical.
  • What to diffuse: sequential vs. non-sequential.
  • How to impose constraints/objectives: hard vs. soft.
    • Guidance function: Predefined or learned.
    • Classifier-free: Combine the unconditional and conditional scores (most common).
    • Inpainting: Fix known states and fill in the missing parts of the trajectory (complementary to the other two).

📚 Literature: Recent Research Progress in Diffusion for RL/Control

$$ \color{red}\underbrace{\nabla_x \log P}_{\text{how to get score function}} \color{black}( \color{blue}\underbrace{x}_{\text{what to diffuse}} \color{black}| \color{green}\underbrace{y}_{\text{how to impose constraints/objectives}} \color{black}) $$

  • How to get score function: data-driven vs. analytical.
  • What to diffuse: sequential vs. non-sequential.
  • How to impose constraints/objectives: hard vs. soft.

SafeDiffuser: solving a QP at each denoising step to enforce hard constraints


📝 Summary & Challenges in Diffusion Models

  • Diffusion in robotics: matches the demonstration distribution from data.
  • Use cases: imitation learning, offline RL, model-based RL.
  • Role: Learns policy, trajectory, or model as a regularizer/world model/planner.

📝 Summary & Challenges in Diffusion Models

  • Diffusion in robotics: matches dataset distribution in control and planning.
  • Use cases: imitation learning, offline RL, model-based RL.
  • Role: Learns policy, planner, or model as a distribution matching problem.
  • Advantages: high-dimensional matching, stability, scalability.

Compare distribution with other models

📝 Summary & Challenges in Diffusion Models

  • Diffusion in robotics: matches dataset distribution in control and planning.

  • Use cases: imitation learning, offline RL, model-based RL.

  • Role: Learns policy, planner, or model as a distribution matching problem.

  • Challenges:

    • 🕒 Computational cost: longer training and inference time.
    • 🔀 Shifting distribution: difficulties in adapting to dynamic datasets.
    • 📊 High variance: inconsistent performance in precision tasks.
    • ⛔ Constraint satisfaction: limited adaptability to new constraints.

class: text-center

🎉 Thank you!

Useful Resources

Paper list with labels

Diffusion for RL survey paper

Diffusion for RL repo

Awesome Diffusion RL repo