Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugs when running the training code #12

Open
zczlsde opened this issue Sep 11, 2023 · 4 comments
Open

bugs when running the training code #12

zczlsde opened this issue Sep 11, 2023 · 4 comments

Comments

@zczlsde
Copy link

zczlsde commented Sep 11, 2023

2023-09-11 10:21:49,337 ERROR tune_controller.py:911 -- Trial task failed for trial PPO_meltingpot_fcb07_00000
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
result = ray.get(future)
File "/usr/local/lib/python3.10/dist-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 2495, in get
raise value
File "python/ray/_raylet.pyx", line 1787, in ray._raylet.task_execution_handler
File "python/ray/_raylet.pyx", line 1684, in ray._raylet.execute_task_with_cancellation_handler
File "python/ray/_raylet.pyx", line 1366, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 1367, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 1583, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 864, in ray._raylet.store_task_errors
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPO.init() (pid=13902, ip=172.28.0.12, actor_id=7e027fca141b6dc2cdd8f15501000000, repr=PPO)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/algorithms/algorithm.py", line 517, in init
super().init(
File "/usr/local/lib/python3.10/dist-packages/ray/tune/trainable/trainable.py", line 169, in init
self.setup(copy.deepcopy(self.config))
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/algorithms/algorithm.py", line 639, in setup
self.workers = WorkerSet(
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/worker_set.py", line 179, in init
raise e.args[0].args[2]
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 525, in init
self._update_policy_map(policy_dict=self.policy_dict)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 1727, in _update_policy_map
self._build_policy_map(
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/evaluation/rollout_worker.py", line 1838, in _build_policy_map
new_policy = create_policy_for_framework(
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
return policy_class(observation_space, action_space, merged_config)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 64, in init
self._initialize_loss_from_dummy_batch()
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/policy/policy.py", line 1418, in _initialize_loss_from_dummy_batch
actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/policy/torch_policy_v2.py", line 571, in compute_actions_from_input_dict
return self._compute_action_helper(
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/utils/threading.py", line 24, in wrapper
return func(self, *a, **k)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/policy/torch_policy_v2.py", line 1291, in _compute_action_helper
dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/models/modelv2.py", line 259, in call
res = self.forward(restored, state or [], seq_lens)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/models/torch/recurrent_net.py", line 259, in forward
return super().forward(input_dict, state, seq_lens)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/models/torch/recurrent_net.py", line 98, in forward
output, new_state = self.forward_rnn(inputs, state, seq_lens)
File "/usr/local/lib/python3.10/dist-packages/ray/rllib/models/torch/recurrent_net.py", line 274, in forward_rnn
self._features, [h, c] = self.lstm(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py", line 810, in forward
self.check_forward_args(input, hx, batch_sizes)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py", line 730, in check_forward_args
self.check_input(input, batch_sizes)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py", line 218, in check_input
raise RuntimeError(
RuntimeError: input.size(-1) must be equal to input_size. Expected 148, got 24

@zczlsde
Copy link
Author

zczlsde commented Sep 11, 2023

Does this caused by the version of the dependencies

@thackerhelik
Copy link

Not sure but this might help in the meanwhile, #5 (comment)

@zczlsde
Copy link
Author

zczlsde commented Sep 12, 2023

When I install the packages it gives me the following error (I run this in colab):

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.24.3 which is incompatible.
tensorflow-datasets 4.9.2 requires protobuf>=3.20, but you have protobuf 3.19.6 which is incompatible.
tensorflow-metadata 1.14.0 requires protobuf<4.21,>=3.20.3, but you have protobuf 3.19.6 which is incompatible.

@rstrivedi
Copy link
Owner

rstrivedi commented Sep 13, 2023

These problems are not relevant to the installation of the provided dependencies. They seem to be stemming from other things installed in your environment. Could you try to use a virtual env instead of using system python or try to create a fresh virtual environment and check if it works?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants