
BASE/IQL_S3 v20221029

Cryolite edited this page Oct 29, 2022 · 1 revision

Model

Value Network V(s)

Encoder

  • Type: Transformer encoder layers (same architecture as BERT-BASE)
    • Dimension: 768
    • # of heads: 12
    • Dimension of feedforward networks: 3072
    • # of layers: 12
    • Activation function: GELU
    • Dropout rate in training: 0.1
    • Initialization: Transferred from the trained encoder of BASE/IQL_S3 v20221003
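The settings above determine the size of each encoder layer. Below is a minimal sketch of that arithmetic. It assumes a standard BERT-style layer (biased Q, K, V and output projections, a two-layer feedforward block, and two LayerNorms) and leaves out embeddings, because this page does not give the input vocabulary or sequence length. `EncoderConfig`, `layer_params`, and `encoder_params` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    # Values copied from the encoder bullets above.
    d_model: int = 768
    num_heads: int = 12
    d_ff: int = 3072
    num_layers: int = 12
    dropout: float = 0.1


def layer_params(cfg: EncoderConfig) -> int:
    """Weights + biases of one BERT-style encoder layer (assumed layout)."""
    d, f = cfg.d_model, cfg.d_ff
    attention = 4 * (d * d + d)           # Q, K, V and output projections
    feedforward = d * f + f + f * d + d   # d -> d_ff -> d, with biases
    layer_norms = 2 * (2 * d)             # two LayerNorms, gain + bias each
    return attention + feedforward + layer_norms


def encoder_params(cfg: EncoderConfig) -> int:
    # Embeddings are excluded: their sizes are not described on this page.
    return cfg.num_layers * layer_params(cfg)


cfg = EncoderConfig()
assert cfg.d_model % cfg.num_heads == 0   # 768 / 12 = 64 dimensions per head
print(layer_params(cfg), encoder_params(cfg))
```

Under these assumptions, each of the 12 heads works in 64 dimensions. One layer holds 7,087,872 parameters, and the 12-layer stack holds about 85 million. The Q network's encoder is listed with the same settings, so it would be the same size.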

Decoder

  • Type: Single-layer position-wise feedforward network
    • Dimension: 3072
    • Activation function: GELU
    • Dropout rate in training: 0.1
    • Initialization: Transferred from the trained decoder of BASE/IQL_S3 v20221003
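The decoder bullets do not say how the 3072-wide layer reduces to a scalar V(s). Below is a minimal sketch that assumes the head is Linear(768 → 3072), then GELU, then Linear(3072 → 1). The same weights are applied at every position, and dropout is omitted, as it would be at inference time. Which position's output is read as V(s) is not stated here. `gelu`, `linear`, `value_head`, and `position_wise` are illustrative names, and the toy sizes stand in for 768 and 3072.

```python
import math
import random


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias):
    # weight is a list of rows with shape [out][in]; returns a list of length out.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def value_head(h, w1, b1, w2, b2):
    """Hypothetical value decoder: d_model -> d_ff (GELU) -> 1 scalar."""
    hidden = [gelu(z) for z in linear(h, w1, b1)]
    return linear(hidden, w2, b2)[0]


def position_wise(vectors, w1, b1, w2, b2):
    # "Position-wise": the same weights applied independently at each position.
    return [value_head(v, w1, b1, w2, b2) for v in vectors]


random.seed(0)
d_model, d_ff = 4, 16  # toy sizes; the real head would be 768 -> 3072
w1 = [[random.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(d_ff)]
b1 = [0.0] * d_ff
w2 = [[random.gauss(0.0, 0.1) for _ in range(d_ff)]]
b2 = [0.0]
seq = [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(3)]
print(position_wise(seq, w1, b1, w2, b2))  # one scalar per position
```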

Q Network Q(s, a)

Encoder

  • Type: Transformer encoder layers (same architecture as BERT-BASE)
    • Dimension: 768
    • # of heads: 12
    • Dimension of feedforward networks: 3072
    • # of layers: 12
    • Activation function: GELU
    • Dropout rate in training: 0.1
    • Initialization: Transferred from the trained encoder of BASE/IQL_S3 v20221003

Decoder

  • Type: Dueling network with two single-layer position-wise feedforward networks
    • Dimension: 3072
    • Activation function: GELU
    • Dropout rate in training: 0.1
    • Initialization: Transferred from the trained decoder of BASE/IQL_S3 v20221003
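A dueling decoder splits its output into a state-value stream and an advantage stream and then recombines them into Q values. Below is a minimal sketch of the recombination step. It assumes the mean-subtracted form from the dueling-network paper (Wang et al., 2016), and it assumes the advantages passed in cover exactly the legal actions. The page does not say which variant or masking this model uses, and `dueling_q` is a hypothetical name.

```python
from typing import Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> list[float]:
    """Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a').

    Subtracting the mean advantage makes the V/A split identifiable:
    the average of the returned Q values equals V(s).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


print(dueling_q(1.0, [0.5, -0.5, 0.0]))  # [1.5, 0.5, 1.0]
```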

Objective

Data

Crawled Game Records

Crawled Game Records v202007_202109

Training Examples

100,000,000 samples, randomly drawn from the crawled game records and then shuffled.
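The page does not describe how the samples were drawn. One standard way to draw a uniform sample of fixed size from a record stream too large to hold in memory is reservoir sampling (Algorithm R), followed by a shuffle. Below is a minimal sketch of that technique, offered only as an illustration and not as this project's pipeline. `reservoir_sample` is a hypothetical name. At 100,000,000 samples, the reservoir itself would still need to fit in memory or be backed by disk.

```python
import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> list[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    reservoir: list[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


rng = random.Random(0)
sample = reservoir_sample(range(1_000_000), 10, rng)
# The selected set is uniform, but its order is not (e.g. a stream of exactly
# k items comes back unchanged), so shuffle before training.
rng.shuffle(sample)
print(sample)
```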

Optimization

Implicit Q-learning (IQL)

  • Discount factor (γ): 1.0
  • Expectile (τ): 0.90
  • Soft-update (Polyak averaging) rate for the target networks (α): 0.1
  • Optimizer: LAMB
  • Learning rate: 0.001
  • ε: 1.0e-6
  • Batch size: 131072
  • # of training epochs: N/A
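The bullets above give the IQL hyperparameters but not the losses. Below is a minimal sketch that follows the formulation in the IQL paper (Kostrikov et al., 2021):

- V is fit by expectile regression toward the target Q network.
- Q is fit toward r + γV(s′), with V(s′) dropped at terminal states.
- The target networks track the online ones by Polyak averaging.

The sketch works on plain Python lists for readability. The function names and the `done` flag are assumptions and are not taken from this project's code.

```python
import math
from typing import Sequence


def expectile_loss(diff: float, tau: float) -> float:
    """|tau - 1(u < 0)| * u^2, where u = Q_target(s, a) - V(s)."""
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def value_loss(q_target: Sequence[float], v: Sequence[float], tau: float) -> float:
    # Mean expectile regression of V(s) toward the target network's Q(s, a).
    return sum(expectile_loss(q - vs, tau) for q, vs in zip(q_target, v)) / len(v)


def q_loss(q: Sequence[float], rewards: Sequence[float], v_next: Sequence[float],
           done: Sequence[bool], gamma: float) -> float:
    # Mean squared TD error against r + gamma * V(s'); V(s') is 0 at terminals.
    total = 0.0
    for qi, r, vn, d in zip(q, rewards, v_next, done):
        target = r + gamma * (0.0 if d else vn)
        total += (target - qi) ** 2
    return total / len(q)


def polyak_update(target: Sequence[float], online: Sequence[float],
                  alpha: float) -> list[float]:
    # theta_target <- (1 - alpha) * theta_target + alpha * theta_online
    return [(1.0 - alpha) * t + alpha * o for t, o in zip(target, online)]


TAU, GAMMA, ALPHA = 0.90, 1.0, 0.1  # values from the list above
print(value_loss([1.0, 0.0], [0.0, 1.0], TAU))
print(q_loss([0.5, 1.0], [0.0, 1.0], [0.8, 3.0], [False, True], GAMMA))
print(polyak_update([0.0, 1.0], [1.0, 1.0], ALPHA))
```

With τ = 0.90, an error where Q exceeds V costs nine times as much as an error of the same size in the other direction. This pulls V(s) toward an upper expectile of Q over the actions seen in the data rather than toward its mean. With γ = 1.0, the TD target is undiscounted, which stays well defined because each game is finite.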

Graphs of Loss Functions and Their Gradient Norms

Graph of Gradient Norm of Q Losses

(Figure: Q gradient norm)

Graph of Q Losses

(Figure: Q loss)

Graph of Gradient Norm of V Loss

(Figure: value gradient norm)

Graph of V Loss

(Figure: value loss)

Advantage Weighted Regression (AWR)

(TODO)

Quantitative Comparison with BASE/BC_H13 v20220210 as the Baseline

(TODO)