# Rough Outline

## Linear regression and (stochastic) gradient descent

- The classic methods: high-accuracy solutions, slowly
  - Gaussian elimination, LU, ...
- Modern iterative methods: modest-accuracy solutions, quickly (see the sketch after this list)
  - Gradient descent: minimize a local linear expansion of the objective
  - Stochastic gradient methods: sample, then minimize a linear expansion
- Issues: stepsizes, batches, interpolation, ...
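
As a concrete reference point, here is a minimal sketch of both methods on a least-squares (linear regression) objective, written with JAX. The problem sizes, stepsizes, batch size, and iteration counts are illustrative assumptions, not recommendations.

```python
import jax
import jax.numpy as jnp

# Synthetic least-squares problem (all sizes and values are made up).
key = jax.random.PRNGKey(0)
k_A, k_x, k_b = jax.random.split(key, 3)
A = jax.random.normal(k_A, (200, 10))
x_true = jax.random.normal(k_x, (10,))
b = A @ x_true + 0.01 * jax.random.normal(k_b, (200,))

def loss(x):
    # f(x) = (1/2n) * ||Ax - b||^2
    r = A @ x - b
    return 0.5 * jnp.mean(r ** 2)

grad_loss = jax.grad(loss)

# Gradient descent: step along the negative full gradient.
x = jnp.zeros(10)
for _ in range(500):
    x = x - 0.1 * grad_loss(x)

# Stochastic gradient descent: the same update, with the gradient
# estimated from a random mini-batch of rows at each iteration.
def batch_loss(x, idx):
    r = A[idx] @ x - b[idx]
    return 0.5 * jnp.mean(r ** 2)

grad_batch = jax.grad(batch_loss)
x_sgd = jnp.zeros(10)
key = jax.random.PRNGKey(1)
for _ in range(2000):
    key, sub = jax.random.split(key)
    idx = jax.random.choice(sub, 200, shape=(16,), replace=False)
    x_sgd = x_sgd - 0.05 * grad_batch(x_sgd, idx)
```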

## Optimization formulations in data science, machine learning, and sequential decision making

### Statistical estimation and inverse problems in data science

- The setup: inverting a nonlinear mapping
- Compressive sensing
- Phase retrieval

### Prediction problems

- Models, loss functions, regularization
- Minimizing prediction error
- Maximizing the likelihood of data (see the sketch after this list)
- Leveraging prior information
- Population vs. empirical problems
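
To make several of these items concrete at once, here is one possible formulation: L2-regularized logistic regression. Minimizing the negative log-likelihood is maximum-likelihood estimation, and the L2 penalty plays the role of a Gaussian prior on the weights. The penalty weight `lam` is an illustrative assumption.

```python
import jax
import jax.numpy as jnp

def neg_log_likelihood(w, X, y):
    # Logistic model: P(y = 1 | x) = sigmoid(x . w), with y in {0, 1}.
    # Minimizing this average negative log-likelihood is MLE.
    logits = X @ w
    return jnp.mean(jnp.logaddexp(0.0, logits) - y * logits)

def objective(w, X, y, lam=0.1):
    # The L2 penalty encodes a Gaussian prior on w (MAP estimation).
    return neg_log_likelihood(w, X, y) + 0.5 * lam * jnp.sum(w ** 2)

grad_obj = jax.grad(objective)  # ready for (stochastic) gradient descent
```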

### Sequential decision making problems

- LQR (the linear quadratic regulator)
- Bandits (see the sketch after this list)
- Reinforcement learning
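
As a taste of the bandit setting, a minimal epsilon-greedy sketch; the arm means, noise level, and exploration rate `eps` are all made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.3])  # hypothetical arm rewards
counts = np.zeros(3)
estimates = np.zeros(3)
eps = 0.1

for t in range(1000):
    # Explore with probability eps; otherwise exploit the best estimate.
    arm = rng.integers(3) if rng.random() < eps else int(np.argmax(estimates))
    reward = rng.normal(true_means[arm], 0.1)
    counts[arm] += 1
    # Incremental running average of the chosen arm's value estimate.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```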

### Considerations

- Convexity, smoothness, stochasticity

## Classical convex formulations

### How to think about calculus

- Gradients, Hessians, and Taylor's theorem
- The negative gradient: the direction of steepest descent
- First- and second-order optimality conditions (summarized after this list)
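
For reference, the expansion and conditions these bullets point to, in standard form:

```latex
% Second-order Taylor expansion of a twice-differentiable f around x:
f(x + v) = f(x) + \nabla f(x)^\top v
         + \tfrac{1}{2}\, v^\top \nabla^2 f(x)\, v + o(\|v\|^2)

% Optimality conditions at a local minimizer x^*:
% necessary:  \nabla f(x^*) = 0  and  \nabla^2 f(x^*) \succeq 0
% sufficient: \nabla f(x^*) = 0  and  \nabla^2 f(x^*) \succ 0
```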

### Linear algebra needed

- Linear transformations and inner products
- Norms
- Jacobians and the chain rule (see the identities after this list)
- Formal gradients of nondifferentiable functions
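
The chain rule in Jacobian form; the second identity (gradients propagate through transposed Jacobians) is also the mechanism behind reverse-mode autodiff in the next section:

```latex
% Chain rule for h = g \circ f:
J_h(x) = J_g(f(x))\, J_f(x)

% Gradient of a scalar composition \ell(f(x)):
\nabla_x\, \ell(f(x)) = J_f(x)^\top\, \nabla \ell(f(x))
```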

## You will never differentiate by hand again

A formal introduction to automatic differentiation (see the sketch below).
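
A minimal sketch using JAX's `jax.grad`, one autodiff tool among several; the test function is arbitrary.

```python
import jax
import jax.numpy as jnp

def f(x):
    # An arbitrary smooth test function.
    return jnp.sum(jnp.sin(x) ** 2) + x @ x

grad_f = jax.grad(f)  # reverse-mode automatic differentiation

x = jnp.array([0.3, -1.2, 2.0])
print(grad_f(x))               # machine-precision gradient, no hand derivation
print(jnp.sin(2 * x) + 2 * x)  # analytic check: sin(2x) + 2x
```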

## (Stochastic) gradient descent

- Gradient descent
- Stochastic gradient descent
- Optimizing the population risk (see the sketch after this list)
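
One way to see the last item: SGD run for a single pass over fresh samples from the data distribution is an unbiased stochastic method for the population risk itself, with no fixed training set to overfit. A minimal sketch on a synthetic distribution (all problem data is made up):

```python
import jax
import jax.numpy as jnp

# Population risk: E[ 0.5 * (a . x - y)^2 ] with a ~ N(0, I)
# and y = a . x_true + noise.
x_true = jnp.array([1.0, -2.0, 0.5])

def sample_loss(x, a, y):
    return 0.5 * (a @ x - y) ** 2

g = jax.grad(sample_loss)

x = jnp.zeros(3)
key = jax.random.PRNGKey(0)
for t in range(5000):
    key, k_a, k_n = jax.random.split(key, 3)
    a = jax.random.normal(k_a, (3,))             # fresh sample each step
    y = a @ x_true + 0.01 * jax.random.normal(k_n, ())
    x = x - 0.01 * g(x, a, y)
```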

### What to expect from theory

- Convexity: gradient methods find global minima, come with convergence rates, and can be accelerated (standard rates are summarized after this list)
- Nonconvexity: they find critical points, come with rates, and can be accelerated in certain cases
- In limited settings (e.g., the NTK regime), we can provably minimize the training loss
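
For reference, the standard textbook rates behind the first two bullets, stated for an L-smooth objective run with stepsize 1/L (exact constants vary by source):

```latex
% Convex f, gradient descent:
f(x_k) - f^* \le \frac{L \|x_0 - x^*\|^2}{2k}

% Convex f, accelerated (Nesterov) gradient descent:
f(x_k) - f^* \le \frac{2 L \|x_0 - x^*\|^2}{(k+1)^2}

% Nonconvex f: convergence to critical points,
\min_{j \le k} \|\nabla f(x_j)\|^2 \le \frac{2 L \big(f(x_0) - f^*\big)}{k}
```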

## Modifications

### Issues that affect dynamics

- Stepsize schedule (see the sketch after this list)
  - Warmup, decay, cosine, ...
- Ill conditioning
- Interpolation
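
A minimal sketch of one common schedule from the list above, linear warmup followed by cosine decay; all default values are illustrative assumptions.

```python
import math

def lr_schedule(t, total_steps, warmup_steps=100, peak_lr=0.1, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr."""
    if t < warmup_steps:
        return peak_lr * (t + 1) / warmup_steps
    progress = (t - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```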

### Online gradient descent and regret

### Beyond gradients: addressing ill conditioning

- Practical considerations: solving the linear system (see the sketch below)
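
One natural reading of this item: Newton-type methods replace the gradient step with the solution of the linear system ∇²f(x) d = -∇f(x), which rescales the gradient by local curvature and so counters ill conditioning. A minimal JAX sketch, plus a matrix-free conjugate-gradient variant for larger problems; both are illustrative, not a prescribed implementation.

```python
import jax
import jax.numpy as jnp

def newton_step(f, x):
    # Solve H d = -g instead of forming H^{-1} explicitly.
    g = jax.grad(f)(x)
    H = jax.hessian(f)(x)
    return x + jnp.linalg.solve(H, -g)

def newton_step_cg(f, x):
    # Matrix-free: conjugate gradients needs only Hessian-vector
    # products, computed here by forward-over-reverse autodiff.
    g = jax.grad(f)(x)
    hvp = lambda v: jax.jvp(jax.grad(f), (x,), (v,))[1]
    d, _ = jax.scipy.sparse.linalg.cg(hvp, -g)
    return x + d
```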

## Tentative topics beyond the basics

- Zeroth-order methods
- Mirror descent
- Constraints and regularization
- Proximal and projected gradient methods (see the sketch after this list)
  - Examples: compressive sensing, low-rank matrix completion
- Alternating-minimization-style methods
  - Examples: layer-wise training of deep networks; sparse coding / dictionary learning
- Optimization over low-rank matrices and tensors
  - Burer-Monteiro and Gauss-Newton
- Distributed data: federated learning
- Sensitive data: differentially private gradient methods
  - Basic principle: noise injection
- Curriculum learning
- Low-precision optimizers
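
As one example of a proximal gradient method tied to the compressive sensing example above, here is ISTA for the lasso; the iteration count is illustrative, and for convergence `step` should be at most 1 / ||A||² in the operator norm.

```python
import jax.numpy as jnp

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1.
    return jnp.sign(z) * jnp.maximum(jnp.abs(z) - tau, 0.0)

def ista(A, b, lam, step, iters=500):
    # min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1:
    # gradient step on the smooth part, then prox of the penalty.
    x = jnp.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - step * A.T @ (A @ x - b), step * lam)
    return x
```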

## Tuning deep learning performance

### Some empirical models

- Sampling text via transformers and MinGPT (see the sketch after this list)
- Sampling images with diffusion models
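
The core sampling step shared by autoregressive text models such as MinGPT: rescale the next-token logits by a temperature, then sample. A minimal sketch; the random logits below are placeholders for a real model's output.

```python
import jax
import jax.numpy as jnp

def sample_next_token(key, logits, temperature=0.8):
    # Lower temperature sharpens the distribution; higher flattens it.
    return jax.random.categorical(key, logits / temperature)

key = jax.random.PRNGKey(0)
key, sub = jax.random.split(key)
logits = jax.random.normal(sub, (50,))  # stand-in for a model's logits
key, sub = jax.random.split(key)
token = sample_next_token(sub, logits)
```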