torch.compile agents #261
Comments
I've run an experiment on a Lightning Studio with an A10G GPU with the following command:
python sheeprl.py exp=dreamer_v3_100k_ms_pacman fabric.devices=1 fabric.precision=32 fabric.accelerator=gpu
It ran in less than 6 hours and I obtained the following results: [figure not preserved]
The tests on different seeds are: [values and average reward not preserved]
This is the graph reported by Hafner et al.: [figure not preserved]
I have run a walker walk training with the commit d640a41 (ReLU as activation function):
python sheeprl.py exp=dreamer_v3_dmc_walker_walk algo.world_model.decoupled_rssm=True
The obtained results are reported below: [figure not preserved]
My env is:
$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 2.3.0.dev20240314+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Clang version: Could not collect
CMake version: version 3.27.7
Libc version: glibc-2.35
Python version: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.19.0-46-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080
Nvidia driver version: 535.54.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-8700T CPU @ 2.40GHz
CPU family: 6
Model: 158
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 10
CPU max MHz: 2400,0000
CPU min MHz: 800,0000
BogoMIPS: 4800.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 1,5 MiB (6 instances)
L3 cache: 12 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Mitigation; TSX disabled
Versions of relevant libraries:
[pip3] mypy==1.2.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.3
[pip3] pytorch-lightning==2.1.3
[pip3] pytorch-triton==3.0.0+989adb9a29
[pip3] torch==2.3.0.dev20240314+cu121
[pip3] torch-tb-profiler==0.4.3
[pip3] torchmetrics==1.3.0
[pip3] torchvision==0.18.0.dev20240314+cu121
[pip3] triton==2.2.0
[conda] numpy 1.26.3 pypi_0 pypi
[conda] pytorch-lightning 2.1.3 pypi_0 pypi
[conda] pytorch-triton 3.0.0+989adb9a29 pypi_0 pypi
[conda] torch 2.3.0.dev20240314+cu121 pypi_0 pypi
[conda] torch-tb-profiler 0.4.3 pypi_0 pypi
[conda] torchmetrics 1.3.0 pypi_0 pypi
[conda] torchvision 0.18.0.dev20240314+cu121 pypi_0 pypi
[conda] triton 2.2.0 pypi_0 pypi
I've run an experiment on 4 A10G GPUs on a Lightning Studio with the following command:
python sheeprl.py exp=dreamer_v3_100k_ms_pacman fabric.devices=4 fabric.precision=32 fabric.accelerator=gpu
where I've manually changed the following: [list not preserved]
It ran in less than 4 hours and these are the results: [figure not preserved]
The tests on different seeds are: [values not preserved], with a 1311.67 +- 259.77 average reward.
I have run a walker walk training with the commit fab9f48 (SiLU as activation function):
python sheeprl.py exp=dreamer_v3_dmc_walker_walk
[figure not preserved]
The green line is the compiled experiment described above (with ReLU, decoupled RSSM, and the d640a41 commit). The grey line is the new experiment.
My env is identical to the python -m torch.utils.collect_env output posted with the d640a41 run above (PyTorch 2.3.0.dev20240314+cu121, NVIDIA GeForce RTX 3080).
Hi, thanks for this. I was testing this out and got the following error:
Error executing job with overrides: ['exp=dreamer_v3', 'env=gym', 'env.id=CartPole-v1', 'env.num_envs=4', 'fabric.accelerator=gpu', 'fabric.precision=32-true', 'algo=dreamer_v3_S', 'algo.learning_starts=1024', 'algo.cnn_keys.encoder=[]', 'algo.mlp_keys.encoder=[vector]', 'algo.cnn_keys.decoder=[]', 'algo.mlp_keys.decoder=[vector]', 'algo.per_rank_sequence_length=64', 'algo.replay_ratio=0.5', 'algo.world_model.decoupled_rssm=False', 'algo.world_model.learnable_initial_recurrent_state=False']
Traceback (most recent call last):
File "/home/sam/dev/sheeprl/sheeprl/cli.py", line 352, in run
run_algorithm(cfg)
File "/home/sam/dev/sheeprl/sheeprl/cli.py", line 190, in run_algorithm
fabric.launch(reproducible(command), cfg, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 859, in launch
return self._wrap_and_launch(function, self, *args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 945, in _wrap_and_launch
return to_run(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 950, in _wrap_with_setup
return to_run(*args, **kwargs)
File "/home/sam/dev/sheeprl/sheeprl/cli.py", line 186, in wrapper
return func(fabric, cfg, *args, **kwargs)
File "/home/sam/dev/sheeprl/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 758, in main
train(
File "/home/sam/dev/sheeprl/sheeprl/algos/dreamer_v3/dreamer_v3.py", line 341, in train
policies: Sequence[Distribution] = actor(imagined_trajectories.detach())[1]
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/lightning/fabric/wrappers.py", line 139, in forward
output = self._forward_module(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 387, in _fn
return fn(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 977, in catch_errors
return callback(frame, cache_entry, hooks, frame_state, skip=1)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 411, in _convert_frame_assert
return _compile(
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_utils_internal.py", line 70, in wrapper_function
return function(*args, **kwargs)
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 700, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 266, in time_wrapper
r = func(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 568, in compile_inner
out_code = transform_code_object(code, transform)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1116, in transform_code_object
transformations(instructions, code_options)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 173, in _fn
return fn(*args, **kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 515, in transform
tracer.run()
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2237, in run
super().run()
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 875, in run
while self.step():
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 790, in step
self.dispatch_table[inst.opcode](self, inst)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 492, in wrapper
return inner_fn(self, inst)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 730, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 339, in call_function
return super().call_function(tx, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 293, in call_function
return super().call_function(tx, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 736, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2418, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2534, in inline_call_
tracer.run()
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 875, in run
while self.step():
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 790, in step
self.dispatch_table[inst.opcode](self, inst)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 492, in wrapper
return inner_fn(self, inst)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 730, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 339, in call_function
return super().call_function(tx, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 293, in call_function
return super().call_function(tx, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 736, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2418, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2534, in inline_call_
tracer.run()
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 875, in run
while self.step():
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 790, in step
self.dispatch_table[inst.opcode](self, inst)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 492, in wrapper
return inner_fn(self, inst)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 730, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 339, in call_function
return super().call_function(tx, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 293, in call_function
return super().call_function(tx, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 736, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2418, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2534, in inline_call_
tracer.run()
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 875, in run
while self.step():
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 790, in step
self.dispatch_table[inst.opcode](self, inst)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 492, in wrapper
return inner_fn(self, inst)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 730, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 339, in call_function
return super().call_function(tx, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 293, in call_function
return super().call_function(tx, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
return tx.inline_user_function_return(self, [*self.self_args(), *args], kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 736, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2418, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2534, in inline_call_
tracer.run()
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 875, in run
while self.step():
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 790, in step
self.dispatch_table[inst.opcode](self, inst)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 492, in wrapper
return inner_fn(self, inst)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 730, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/user_defined.py", line 440, in call_function
return super().call_function(tx, args, kwargs)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/variables/base.py", line 294, in call_function
unimplemented(f"call_function {self} {args} {kwargs}")
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/_dynamo/exc.py", line 212, in unimplemented
raise Unsupported(msg)
torch._dynamo.exc.Unsupported: call_function UserDefinedClassVariable(<class 'torch.Size'>) [SizeVariable()] {}
from user code:
File "/home/sam/dev/sheeprl/sheeprl/algos/dreamer_v3/agent.py", line 838, in forward
actions[i] = actions_dist[i].rsample()
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/distributions/one_hot_categorical.py", line 127, in rsample
samples = self.sample(sample_shape)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/distributions/one_hot_categorical.py", line 95, in sample
indices = self._categorical.sample(sample_shape)
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/distributions/categorical.py", line 133, in sample
return samples_2d.reshape(self._extended_shape(sample_shape))
File "/home/sam/dev/sheeprl/.venv/lib/python3.10/site-packages/torch/distributions/distribution.py", line 268, in _extended_shape
return torch.Size(sample_shape + self._batch_shape + self._event_shape)
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
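The debugging switches named at the end of this trace (TORCH_LOGS, TORCHDYNAMO_VERBOSE, HYDRA_FULL_ERROR) can also be set from Python instead of the shell. The following is only a minimal sketch: the helper name enable_compile_debugging is hypothetical, and it assumes the variables are set before torch is imported so that they are picked up.

```python
import os

# Debug switches quoted in the error message above.
DEBUG_ENV = {
    "TORCH_LOGS": "+dynamo",
    "TORCHDYNAMO_VERBOSE": "1",
    "HYDRA_FULL_ERROR": "1",
}


def enable_compile_debugging(env=os.environ):
    """Set the debug variables, keeping any value the user already chose.

    `env` defaults to the real process environment; a plain dict can be
    passed instead, which is convenient for testing.
    """
    for name, value in DEBUG_ENV.items():
        env.setdefault(name, value)
    return {name: env[name] for name in DEBUG_ENV}
```

Calling enable_compile_debugging() at the very top of the entry script, before any import of torch, should be equivalent to exporting the three variables in the shell.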
Hi @geranim0, you need to update PyTorch to the nightly build (…).
Hi @belerico, yes, I did replace the stock torch with the nightly, yielding this config:
(.venv) sam@oldub:~/dev/sheeprl$ python -m torch.utils.collect_env
/usr/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.4.0.dev20240418+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35
Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-27-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2070
Nvidia driver version: 545.29.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
CPU family: 6
Model: 60
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 3
CPU max MHz: 3900.0000
CPU min MHz: 800.0000
BogoMIPS: 7000.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization: VT-x
L1d cache: 128 KiB (4 instances)
L1i cache: 128 KiB (4 instances)
L2 cache: 1 MiB (4 instances)
L3 cache: 6 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Unknown: No mitigations
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Mitigation; Microcode
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pytorch-lightning==2.2.2
[pip3] pytorch-triton==3.0.0+989adb9a29
[pip3] torch==2.4.0.dev20240418+cu121
[pip3] torchaudio==2.2.0.dev20240418+cu121
[pip3] torchmetrics==1.3.2
[pip3] torchvision==0.19.0.dev20240418+cu121
[pip3] triton==2.2.0
[conda] Could not collect
I cannot sheeprl-eval my trained model, since the keys in the world model's state_dict have different names:
Stacktrace
Error executing job with overrides: ['checkpoint_path=/home/drt/Desktop/sheeprl/sheeprl/logs/runs/dreamer_v3/PyFlyt/2024-06-23_19-34-31_dreamer_v3_PyFlyt_42/version_0/checkpoint/ckpt_730000_0.ckpt', 'fabric.accelerator=gpu', 'env.capture_video=True', 'seed=52']
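One possible cause, which the truncated stack trace does not confirm: a module wrapped by torch.compile reports its state_dict keys with an extra "_orig_mod." segment, so a checkpoint saved from a compiled model does not match an uncompiled one. Assuming that is the mismatch here, the keys could be renamed before loading. The helper name and the example keys below are hypothetical.

```python
from collections import OrderedDict

# Segment that torch.compile's wrapper module adds to parameter names.
COMPILE_PREFIX = "_orig_mod."


def strip_compile_prefix(state_dict):
    """Return a copy of `state_dict` with every '_orig_mod.' segment removed.

    The segment can appear at the start of a key or in the middle of it
    (when only a submodule was compiled), so it is removed wherever it occurs.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key.replace(COMPILE_PREFIX, "")
        if new_key in cleaned:
            raise KeyError(f"Renaming {key!r} would overwrite {new_key!r}")
        cleaned[new_key] = value
    return cleaned


# Hypothetical keys, only to show the renaming.
example = {
    "rssm._orig_mod.recurrent_model.weight": 1,
    "_orig_mod.encoder.bias": 2,
    "decoder.weight": 3,
}
print(list(strip_compile_prefix(example)))
```

The cleaned dictionary can then be passed to load_state_dict of the uncompiled module. Evaluating with the same compile settings that were used for training would avoid the renaming altogether.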
Hi everyone,
in this branch one can use torch.compile to compile the Dreamer-V3 agent. In particular:
- in sheeprl/configs/algo/dreamer_v3.yaml one can decide what to compile and which arguments to pass to the compile function
- one can compile the dynamic_learning, behaviour_learning and compute_lambda_values functions, as well as every model in build_agent
- torch>=2.3 is required, because the torch.nn.functional.one_hot function has been fixed in the nightly build
These are the results I've obtained: [not preserved]
The command I ran is: [not preserved]
My env is: [not preserved]
Everyone is welcome to test it out and run some experiments with Dreamer-V3 or any other algorithm (taking inspiration from the Dreamer-V3 agent).
We can keep this issue as a reference.
Thank you all!
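To make the config-driven compilation described in this post more concrete, here is a minimal sketch of how a per-function config entry might gate torch.compile. It is not the code from the branch: the helper names, the "disable" key and its default are assumptions for illustration, and the remaining entries are simply forwarded to torch.compile as keyword arguments.

```python
import re


def torch_version_at_least(version, minimum=(2, 3)):
    """Check the leading 'major.minor' of a version string.

    Nightly strings such as '2.3.0.dev20240314+cu121' count as 2.3 here,
    matching the nightly build used in this thread.
    """
    match = re.match(r"(\d+)\.(\d+)", str(version))
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def maybe_compile(fn, compile_cfg):
    """Return `fn` unchanged when compilation is disabled, else compile it.

    `compile_cfg` is a hypothetical mapping, e.g. {"disable": False,
    "fullgraph": False}; everything except "disable" is passed on to
    torch.compile.
    """
    cfg = dict(compile_cfg or {})
    if cfg.pop("disable", True):  # assumption: compilation is opt-in
        return fn
    import torch  # imported lazily so the disabled path does not need torch

    if not torch_version_at_least(torch.__version__):
        raise RuntimeError("torch>=2.3 is required to compile the agent")
    return torch.compile(fn, **cfg)
```

With a helper of this kind, a training function such as compute_lambda_values could be wrapped once at start-up, for example maybe_compile(compute_lambda_values, {"disable": False}), and called as usual afterwards.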