- 13.1. Trust Region Policy Optimization
- 13.2. Math Essentials
- 13.2.1. Taylor series
- 13.2.2. Trust Region Method
- 13.2.3. Conjugate Gradient Method
- 13.2.4. Lagrange Multiplier
- 13.2.5. Importance Sampling
- 13.3. Designing the TRPO Objective Function
- 13.3.1. Parameterizing the Policy
- 13.3.2. Sample Based Estimation
- 13.4. Solving the TRPO Objective Function
- 13.4.1. Computing the Search Direction
- 13.4.2. Perform Line Search in the Search Direction
- 13.5. Algorithm - TRPO
- 13.6. Proximal Policy Optimization
- 13.7. PPO with Clipped Objective
- 13.8. Algorithm - PPO-Clipped
- 13.9. Implementing PPO-Clipped Method
- 13.10. PPO with Penalized Objective
- 13.10.1. Algorithm - PPO-Penalty
- 13.11. Actor Critic using Kronecker Factored Trust Region
- 13.12. Math Essentials
- 13.12.1. Block Matrix
- 13.12.2. Block Diagonal Matrix
- 13.12.3. Kronecker Product
- 13.12.4. Vec Operator
- 13.12.5. Properties of Kronecker Product
- 13.13. Kronecker-Factored Approximate Curvature (K-FAC)
- 13.14. K-FAC in Actor Critic
- 13.14.1. Incorporating Trust Region
13. TRPO, PPO and ACKTR Methods