Thank you for this beautiful library. I have been enjoying it for years!
Over the last few days I wanted to use the multirun command on an AWS cluster, and I noticed that you provide a Ray launcher: https://hydra.cc/docs/plugins/ray_launcher/
I started with the "simple app" example from the documentation (https://github.com/facebookresearch/hydra/tree/main/plugins/hydra_ray_launcher/examples/simple). Running it out of the box resulted in the error jsonschema.exceptions.ValidationError: Additional properties are not allowed ('autoscaling_mode', 'initial_workers', 'target_utilization_fraction' were unexpected). See the full output under Stack trace/error message.
At that point I created my own custom Ray AWS config and tried commenting out the additional properties. To follow along and reproduce my steps, I created a small repo: https://github.com/philkohl/hydra-ray-aws-example
With this workaround I was able to start a Ray head node, but I was not able to submit the tasks due to an import error: ImportError: attempted relative import with no known parent package. For details see the second stack trace below.
Is there something wrong in my config or is there an issue in the plugin?
EDIT:
I think I found a problem in my config for creating the Python environment and pushed a change to my repo, but I still face the import problem.
Therefore, I tested a workaround: replacing all relative imports with absolute imports (package notation) in my site-packages copy of hydra_ray_launcher. E.g.:
from hydra_plugins.hydra_ray_launcher._launcher_util import (
JOB_RETURN_PICKLE,
JOB_SPEC_PICKLE,
launch_job_on_ray,
start_ray,
)
instead of
from ._launcher_util import (
JOB_RETURN_PICKLE,
JOB_SPEC_PICKLE,
launch_job_on_ray,
start_ray,
)
With this change it seems to work.
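The import failure can be reproduced outside of Hydra and Ray. The following is a minimal sketch, assuming the root cause is that `_remote_invoke.py` is executed by file path (as the ssh command in the second stack trace suggests) rather than as a module of its package. The package name `demo_pkg` and its files are hypothetical and exist only for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def reproduce_relative_import_error() -> Tuple[str, str]:
    """Run one module by file path and via -m; return (stderr, stdout)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package, not part of Hydra
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "_util.py").write_text("VALUE = 42\n")
        (pkg / "invoke.py").write_text("from ._util import VALUE\nprint(VALUE)\n")

        # Executed by file path: Python has no parent package for the script,
        # so the relative import cannot be resolved.
        as_script = subprocess.run(
            [sys.executable, str(pkg / "invoke.py")],
            capture_output=True,
            text=True,
        )
        # Executed as a module of the package: the relative import resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.invoke"],
            capture_output=True,
            text=True,
            cwd=tmp,
        )
        return as_script.stderr, as_module.stdout


if __name__ == "__main__":
    script_err, module_out = reproduce_relative_import_error()
    print(script_err.strip().splitlines()[-1])
    print(module_out.strip())
```

This would also explain why switching to absolute imports helps: an absolute import does not depend on how the file was started, only on the package being installed in the interpreter's environment.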
Stack trace/error message
[2024-07-21 10:43:47,244][HYDRA] Ray Launcher is launching 3 jobs,
[2024-07-21 10:43:47,244][HYDRA] #0 : task=1
[2024-07-21 10:43:47,319][HYDRA] #1 : task=2
[2024-07-21 10:43:47,391][HYDRA] #2 : task=3
[2024-07-21 10:43:47,469][HYDRA] Pickle for jobs: /tmp/tmp6574t01r/job_spec.pkl
Cluster: default
2024-07-21 10:43:47,480 INFO util.py:382 -- setting max workers for head node type to 0
Checking AWS environment settings
Traceback (most recent call last):
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra/_internal/utils.py", line 466, in <lambda>
lambda: hydra.multirun(
^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra/_internal/hydra.py", line 162, in multirun
ret = sweeper.sweep(arguments=task_overrides)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 177, in sweep
results = self.launcher.launch(batch, initial_job_idx=initial_job_idx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra_plugins/hydra_ray_launcher/ray_aws_launcher.py", line 62, in launch
return _core_aws.launch(
^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra_plugins/hydra_ray_launcher/_core_aws.py", line 106, in launch
return launch_jobs(
^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra_plugins/hydra_ray_launcher/_core_aws.py", line 117, in launch_jobs
sdk.create_or_update_cluster(
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib64/python3.12/site-packages/ray/autoscaler/sdk/sdk.py", line 38, in create_or_update_cluster
return commands.create_or_update_cluster(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib64/python3.12/site-packages/ray/autoscaler/_private/commands.py", line 314, in create_or_update_cluster
config = _bootstrap_config(config, no_config_cache=no_config_cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib64/python3.12/site-packages/ray/autoscaler/_private/commands.py", line 408, in _bootstrap_config
validate_config(config)
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib64/python3.12/site-packages/ray/autoscaler/_private/util.py", line 162, in validate_config
raise e from None
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib64/python3.12/site-packages/ray/autoscaler/_private/util.py", line 160, in validate_config
jsonschema.validate(config, schema)
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/jsonschema/validators.py", line 1332, in validate
raise error
jsonschema.exceptions.ValidationError: Additional properties are not allowed ('autoscaling_mode', 'initial_workers', 'target_utilization_fraction' were unexpected)
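The workaround of commenting out the rejected properties can also be expressed programmatically. This is only a sketch: the three key names come from the error message above, while the example config dict and the helper `drop_keys` are hypothetical, and it assumes the rejected keys sit at the top level of the cluster config.

```python
from copy import deepcopy
from typing import Any, Dict, Iterable

# Keys named in the ValidationError above.
REJECTED_KEYS = ("autoscaling_mode", "initial_workers", "target_utilization_fraction")


def drop_keys(config: Dict[str, Any], keys: Iterable[str] = REJECTED_KEYS) -> Dict[str, Any]:
    """Return a copy of a top-level config dict without the given keys."""
    cleaned = deepcopy(config)  # leave the caller's dict untouched
    for key in keys:
        cleaned.pop(key, None)  # ignore keys that are already absent
    return cleaned


# Hypothetical cluster config, for illustration only.
example = {
    "cluster_name": "default",
    "max_workers": 1,
    "autoscaling_mode": "default",
    "initial_workers": 0,
    "target_utilization_fraction": 0.8,
}

if __name__ == "__main__":
    print(drop_keys(example))
```

Removing the keys only addresses the schema validation step; it does not address the import error shown in the second trace.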
Traceback (most recent call last):
File "/tmp/tmp.tTUvsfPQn6/_remote_invoke.py", line 18, in <module>
from ._launcher_util import (
ImportError: attempted relative import with no known parent package
Shared connection to 3.79.57.96 closed.
Traceback (most recent call last):
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra/_internal/utils.py", line 466, in <lambda>
lambda: hydra.multirun(
^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra/_internal/hydra.py", line 162, in multirun
ret = sweeper.sweep(arguments=task_overrides)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 177, in sweep
results = self.launcher.launch(batch, initial_job_idx=initial_job_idx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra_plugins/hydra_ray_launcher/ray_aws_launcher.py", line 62, in launch
return _core_aws.launch(
^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra_plugins/hydra_ray_launcher/_core_aws.py", line 106, in launch
return launch_jobs(
^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib/python3.12/site-packages/hydra_plugins/hydra_ray_launcher/_core_aws.py", line 154, in launch_jobs
sdk.run_on_cluster(
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib64/python3.12/site-packages/ray/autoscaler/sdk/sdk.py", line 109, in run_on_cluster
return commands.exec_cluster(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib64/python3.12/site-packages/ray/autoscaler/_private/commands.py", line 1167, in exec_cluster
result = _exec(
^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib64/python3.12/site-packages/ray/autoscaler/_private/commands.py", line 1233, in _exec
return updater.cmd_runner.run(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib64/python3.12/site-packages/ray/autoscaler/_private/command_runner.py", line 383, in run
return self._run_helper(final_cmd, with_output, exit_on_fail, silent=silent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/philipp/Repositories/hydra-aws-example/.venv/lib64/python3.12/site-packages/ray/autoscaler/_private/command_runner.py", line 291, in _run_helper
raise click.ClickException(
click.exceptions.ClickException: Command failed:
ssh -tt -i /home/philipp/.ssh/hydra-philipp.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_7aa2b466ee/7505d64a54/%C -o ControlPersist=10s -o ConnectTimeout=120s [email protected] bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (python /tmp/tmp.tTUvsfPQn6/_remote_invoke.py /tmp/tmp.tTUvsfPQn6)'
Expected Behavior
Spin up Ray cluster
Submit tasks
See output similar to the example in the documentation:
$ python my_app.py --multirun task=1,2,3
[HYDRA] Ray Launcher is launching 3 jobs,
[HYDRA] #0 : task=1
[HYDRA] #1 : task=2
[HYDRA] #2 : task=3
[HYDRA] Pickle for jobs: /var/folders/n_/9qzct77j68j6n9lh0lw3vjqcn96zxl/T/tmpqqg4v4i7/job_spec.pkl
Cluster: default
...
INFO services.py:1172 -- View the Ray dashboard at http://localhost:8265
(pid=3374) [__main__][INFO] - Executing task 1
(pid=3374) [__main__][INFO] - Executing task 2
(pid=3374) [__main__][INFO] - Executing task 3
...
[HYDRA] Stopping cluster now. (stop_cluster=true)
[HYDRA] Deleted the cluster (provider.cache_stopped_nodes=false)
Destroying cluster. Confirm [y/N]: y [automatic, due to --yes]
...
No nodes remaining.
System information
Hydra Version : 1.3.2
Python version : 3.9 / 3.11 / 3.12
Virtual environment type and version : Poetry
Operating system : Linux (Fedora 40)
I am facing the same issue with the Ray AWS example provided in the repo. @omry, could you please offer any suggestions or help?
Thank you!
raise error
jsonschema.exceptions.ValidationError: Additional properties are not allowed ('autoscaling_mode', 'initial_workers', 'target_utilization_fraction' were unexpected)
hydra-core 1.3.2
hydra-ray-launcher 1.2.1
ray 2.38.0
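Since the failure appears to depend on which package versions are installed together, it may help to collect them the same way on every machine. Below is a minimal sketch using the standard library; the package names are the ones listed in this thread, and the helper `installed_versions` is a hypothetical name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def installed_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, if any."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    # Distribution names as they appear in the comment above.
    for name, ver in installed_versions(["hydra-core", "hydra-ray-launcher", "ray"]).items():
        print(f"{name} {ver}")
```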