[Algorithm] CrossQ #2033

Merged Jul 10, 2024

Changes from 19 commits (of 49 total)

Commits
0a23ae8  add crossQ examples (BY571, Mar 20, 2024)
9bdee71  add loss (BY571, Mar 20, 2024)
570a20e  Update naming experiment (BY571, Mar 21, 2024)
5086249  update (BY571, Mar 21, 2024)
c3a927f  update add tests (BY571, Mar 21, 2024)
d1c9c34  detach (BY571, Mar 21, 2024)
e879b7c  update tests (BY571, Mar 21, 2024)
75255e7  update run_test.sh (BY571, Mar 21, 2024)
a7b79c3  move crossq to sota-implementations (BY571, Mar 21, 2024)
be84f3f  update loss (BY571, Mar 26, 2024)
2170ad8  update cat prediction (BY571, Mar 26, 2024)
75d4cee  Merge branch 'main' into crossQ (vmoens, Jun 12, 2024)
7711a4e  Merge branch 'main' into crossQ (BY571, Jun 26, 2024)
f0ac167  add batchrenorm to crossq (BY571, Jun 26, 2024)
37abb14  Merge branch 'crossQ' of github.com:BY571/rl into crossQ (BY571, Jun 26, 2024)
bc7675a  small fixes (BY571, Jun 26, 2024)
9543f2e  update docs and sota checks (BY571, Jun 26, 2024)
53e35f7  hyperparam fix (BY571, Jun 26, 2024)
172e1c0  test (BY571, Jun 27, 2024)
fdb7e8b  update batch norm tests (BY571, Jun 27, 2024)
5501d43  tests (BY571, Jul 3, 2024)
c47ac84  cleanup (BY571, Jul 5, 2024)
e718c3f  Merge branch 'main' into crossQ (BY571, Jul 5, 2024)
f94165e  update (BY571, Jul 7, 2024)
02c94ff  update lr param (BY571, Jul 8, 2024)
93b6a7b  Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ (BY571, Jul 8, 2024)
4b914e6  Apply suggestions from code review (vmoens, Jul 8, 2024)
af8c64a  Merge remote-tracking branch 'origin/main' into crossQ (vmoens, Jul 8, 2024)
845c8a9  Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ (vmoens, Jul 8, 2024)
7b4a69d  set qnet eval in actor loss (BY571, Jul 8, 2024)
77de044  Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ (BY571, Jul 8, 2024)
35c7a98  take off comment (BY571, Jul 8, 2024)
68a1a9f  amend (vmoens, Jul 8, 2024)
c04eb3b  Merge branch 'crossQ' of https://github.com/BY571/rl into crossQ (vmoens, Jul 8, 2024)
12672ee  Merge remote-tracking branch 'origin/main' into crossQ (vmoens, Jul 8, 2024)
7fbb27d  amend (vmoens, Jul 8, 2024)
ff80481  amend (vmoens, Jul 8, 2024)
caf702e  amend (vmoens, Jul 8, 2024)
70e2882  amend (vmoens, Jul 8, 2024)
ccd1b7f  amend (vmoens, Jul 8, 2024)
d3c8b0e  Merge remote-tracking branch 'origin/main' into crossQ (vmoens, Jul 9, 2024)
d3e0bb1  Apply suggestions from code review (vmoens, Jul 9, 2024)
349cb28  amend (vmoens, Jul 9, 2024)
75a43e7  amend (vmoens, Jul 9, 2024)
abada6c  fix device error (BY571, Jul 9, 2024)
c878b81  Update objective delay actor (BY571, Jul 9, 2024)
f222b11  Update tests not expecting target update (BY571, Jul 9, 2024)
067b560  update example utils (BY571, Jul 9, 2024)
c010e39  amend (vmoens, Jul 9, 2024)
12 changes: 12 additions & 0 deletions .github/unittest/linux_examples/scripts/run_test.sh
@@ -149,6 +149,18 @@ python .github/unittest/helpers/coverage_run_parallel.py sota-implementations/di
replay_buffer.size=120 \
env.name=CartPole-v1 \
logger.backend=
python .github/unittest/helpers/coverage_run_parallel.py sota-implementations/crossq/crossq.py \
collector.total_frames=48 \
collector.init_random_frames=10 \
collector.frames_per_batch=16 \
collector.env_per_collector=2 \
collector.device= \
optim.batch_size=10 \
optim.utd_ratio=1 \
replay_buffer.size=120 \
env.name=Pendulum-v1 \
network.device= \
logger.backend=
python .github/unittest/helpers/coverage_run_parallel.py sota-implementations/dreamer/dreamer.py \
collector.total_frames=200 \
collector.init_random_frames=10 \
9 changes: 9 additions & 0 deletions docs/source/reference/objectives.rst
@@ -121,6 +121,15 @@ REDQ

REDQLoss

CrossQ
------

.. autosummary::
:toctree: generated/
:template: rl_template_noinherit.rst

CrossQ

IQL
----

26 changes: 26 additions & 0 deletions sota-check/run_crossq.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=crossq
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/crossq_%j.txt
#SBATCH --error=slurm_errors/crossq_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="crossq"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/sota-implementations/crossq/crossq.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
96 changes: 96 additions & 0 deletions sota-implementations/crossq/batchrenorm.py
@@ -0,0 +1,96 @@
import torch
vmoens marked this conversation as resolved.
import torch.nn as nn


class BatchRenorm(nn.Module):
Reviewer comment: Let's put this in the modules, no?

Reviewer comment: And add it to the doc. Happy to write a couple of tests. Is it a copy-paste? If so, can we check the license?

"""
BatchRenorm Module (https://arxiv.org/abs/1702.03275).

BatchRenorm is an enhanced version of the standard BatchNorm. Unlike BatchNorm,
BatchRenorm utilizes running statistics to normalize batches after an initial warmup phase.
This approach reduces the impact of "outlier" batches that may occur during extended training periods,
making BatchRenorm more robust for long training runs.

During the warmup phase, BatchRenorm functions identically to a BatchNorm layer.

Args:
num_features (int): Number of features in the input tensor.
eps (float, optional): Small value added to the variance to avoid division by zero. Default is 0.01.
momentum (float, optional): Momentum factor for computing the running mean and variance. Default is 0.99.
r_max (float, optional): Maximum value for the scaling factor r. Default is 3.0.
d_max (float, optional): Maximum value for the bias factor d. Default is 5.0.
warmup_steps (int, optional): Number of warm-up steps for the running mean and variance. Default is 100000.
BY571 marked this conversation as resolved.
"""

def __init__(
self,
num_features,
eps=0.01,
momentum=0.99,
r_max=3.0,
d_max=5.0,
warmup_steps=100000,
):

super(BatchRenorm, self).__init__()
vmoens marked this conversation as resolved.
self.num_features = num_features
self.eps = eps
self.momentum = momentum
self.r_max = r_max
self.d_max = d_max
self.warmup_steps = warmup_steps
self.step_count = 0

self.gamma = nn.Parameter(torch.ones(num_features))
self.beta = nn.Parameter(torch.zeros(num_features))

self.register_buffer("running_mean", torch.zeros(num_features))
self.register_buffer("running_var", torch.ones(num_features))

def forward(self, x):
self.step_count += 1
Reviewer comment: Make this a buffer so that loading it from a checkpoint restores its value.


# Compute the dimensions for mean and variance calculation
dims = [i for i in range(x.dim()) if i != 1]
expand_dims = [1 if i != 1 else -1 for i in range(x.dim())]

# Compute batch statistics
batch_mean = x.mean(dims, keepdim=True)
batch_var = x.var(dims, unbiased=False, keepdim=True)

if self.training:
if self.step_count <= self.warmup_steps:
# Use classical BatchNorm during warmup
x_hat = (x - batch_mean) / torch.sqrt(batch_var + self.eps)
Reviewer comment: What about torch.nn.functional.batch_norm?

else:
# Use Batch Renormalization
with torch.no_grad():
r = torch.clamp(
batch_var / self.running_var.view(*expand_dims),
1.0 / self.r_max,
self.r_max,
)
d = torch.clamp(
(batch_mean - self.running_mean.view(*expand_dims))
/ torch.sqrt(self.running_var.view(*expand_dims) + self.eps),
-self.d_max,
self.d_max,
)

x_hat = (x - batch_mean) / torch.sqrt(batch_var + self.eps)
Reviewer comment: I would use torch.nn.functional.batch_norm, and move this out of the block (since it's the same line as 64).

x_hat = x_hat * r + d

# Update running statistics
self.running_mean.mul_(1 - self.momentum).add_(
batch_mean.squeeze().detach() * self.momentum
)
self.running_var.mul_(1 - self.momentum).add_(
batch_var.squeeze().detach() * self.momentum
)
else:
# Use running statistics during inference
x_hat = (x - self.running_mean.view(*expand_dims)) / torch.sqrt(
Reviewer comment: torch.nn.functional.batch_norm has a training param.

self.running_var.view(*expand_dims) + self.eps
)

return self.gamma.view(*expand_dims) * x_hat + self.beta.view(*expand_dims)
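
To make the review suggestions above concrete, here is a minimal sketch (not the code merged in this PR) of a variant that stores step_count as a buffer and delegates the warmup and eval paths to torch.nn.functional.batch_norm. The renormalization branch mirrors the diff above, including its use of a variance ratio for r, where the paper (arXiv:1702.03275) defines r via standard deviations.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BatchRenormSketch(nn.Module):
    """Hypothetical variant folding in the review suggestions; not the merged code."""

    def __init__(self, num_features, eps=0.01, momentum=0.99, r_max=3.0, d_max=5.0, warmup_steps=100000):
        super().__init__()
        self.eps, self.momentum = eps, momentum
        self.r_max, self.d_max, self.warmup_steps = r_max, d_max, warmup_steps
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))
        # A buffer survives state_dict()/load_state_dict() round-trips;
        # a plain int attribute does not.
        self.register_buffer("step_count", torch.zeros((), dtype=torch.long))

    def forward(self, x):
        if self.training:
            self.step_count += 1
        warmup = self.training and bool(self.step_count <= self.warmup_steps)
        if warmup or not self.training:
            # F.batch_norm covers both cases: training=True normalizes with batch
            # statistics and updates the running buffers in place with the same
            # running = (1 - momentum) * running + momentum * batch rule as the
            # diff; training=False normalizes with the running statistics.
            return F.batch_norm(
                x, self.running_mean, self.running_var,
                weight=self.gamma, bias=self.beta,
                training=warmup, momentum=self.momentum, eps=self.eps,
            )
        # Renormalization branch, mirroring the diff above.
        dims = [i for i in range(x.dim()) if i != 1]
        shape = [1 if i != 1 else -1 for i in range(x.dim())]
        batch_mean = x.mean(dims, keepdim=True)
        batch_var = x.var(dims, unbiased=False, keepdim=True)
        with torch.no_grad():
            r = torch.clamp(batch_var / self.running_var.view(shape), 1.0 / self.r_max, self.r_max)
            d = torch.clamp(
                (batch_mean - self.running_mean.view(shape))
                / torch.sqrt(self.running_var.view(shape) + self.eps),
                -self.d_max, self.d_max,
            )
        x_hat = (x - batch_mean) / torch.sqrt(batch_var + self.eps) * r + d
        self.running_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean.squeeze().detach())
        self.running_var.mul_(1 - self.momentum).add_(self.momentum * batch_var.squeeze().detach())
        return self.gamma.view(shape) * x_hat + self.beta.view(shape)

Usage matches nn.BatchNorm1d for (N, C) or (N, C, L) inputs; calling eval() routes the forward pass through the running statistics.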
58 changes: 58 additions & 0 deletions sota-implementations/crossq/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
# environment and task
env:
name: HalfCheetah-v4
task: ""
library: gym
max_episode_steps: 1000
seed: 42

# collector
collector:
total_frames: 1_000_000
init_random_frames: 25000
frames_per_batch: 1000
init_env_steps: 1000
device: cpu
env_per_collector: 1
reset_at_each_iter: False

# replay buffer
replay_buffer:
size: 1000000
prb: 0 # use prioritized experience replay
scratch_dir: null

# optim
optim:
utd_ratio: 1.0
policy_update_delay: 3
gamma: 0.99
loss_function: l2
lr: 1.0e-3
weight_decay: 0.0
batch_size: 256
alpha_init: 1.0
adam_eps: 1.0e-8
beta1: 0.5
beta2: 0.999

# network
network:
batch_norm_momentum: 0.99
warmup_steps: 100000
critic_hidden_sizes: [2048, 2048]
actor_hidden_sizes: [256, 256]
critic_activation: relu
actor_activation: relu
default_policy_scale: 1.0
scale_lb: 0.1
device: "cuda:0"

# logging
logger:
backend: wandb
project_name: torchrl_example_crossQ
group_name: null
exp_name: ${env.name}_CrossQ
mode: online
eval_iter: 25000
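
For context, a minimal sketch of how a Hydra-managed script typically consumes a config like this; the exact decorator arguments used by crossq.py are an assumption here, not taken from this PR.

# Sketch only: config_path/config_name/version_base are assumed values.
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="", config_name="config", version_base="1.1")
def main(cfg: DictConfig) -> None:
    device = cfg.network.device        # "cuda:0"
    batch_size = cfg.optim.batch_size  # 256
    exp_name = cfg.logger.exp_name     # interpolates to "HalfCheetah-v4_CrossQ"
    ...

if __name__ == "__main__":
    main()

Any field can be overridden on the command line, which is how run_test.sh above shrinks the run for the smoke test (e.g. optim.batch_size=10, logger.backend=).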