Software versions
Unfortunately, legate-issue does not appear to be installed as part of the conda/miniconda environment our users have installed. I reproduced this issue myself with the latest conda environment dated as of this report and also don't see it installed. legate --version reports version 23.09.00.
Jupyter notebook / Jupyter Lab version
No response
Expected behavior
A legate-python script should execute on a single node allocation without error.
Observed behavior
The script launch fails with the following error:
$ legate simple.py
Traceback (most recent call last):
File "/projects/legion/miniconda3/bin/legate", line 7, in <module>
from legate.driver import main
File "/projects/legion/miniconda3/lib/python3.11/site-packages/legate/driver/__init__.py", line 17, in <module>
from .config import Config
File "/projects/legion/miniconda3/lib/python3.11/site-packages/legate/driver/config.py", line 35, in <module>
from .args import parser
File "/projects/legion/miniconda3/lib/python3.11/site-packages/legate/driver/args.py", line 118, in <module>
nodes_kw, ranks_per_node_kw = detect_multi_node_defaults()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/projects/legion/miniconda3/lib/python3.11/site-packages/legate/driver/args.py", line 93, in detect_multi_node_defaults
nodes_kw["default"] = nodes
^^^^^
UnboundLocalError: cannot access local variable 'nodes' where it is not associated with a value
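For reference, this class of UnboundLocalError occurs when a local variable is assigned only inside a conditional branch that did not execute. The minimal sketch below illustrates that failure mode; the function name, the environment check, and the variable handling are illustrative assumptions for this report, not the actual contents of legate's args.py:
import os

def detect_multi_node_defaults_sketch():
    nodes_kw = {}
    # Hypothetical detection logic: 'nodes' is bound only when a
    # recognized scheduler variable is present in the environment.
    if "SLURM_JOB_NUM_NODES" in os.environ:
        nodes = int(os.environ["SLURM_JOB_NUM_NODES"])
    # If the branch above did not run, the next line raises
    # UnboundLocalError, matching the traceback shown above.
    nodes_kw["default"] = nodes
    return nodes_kw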
Example code or instructions
# insert any legate friendly code here... :-)
# see additional info below (this appears to be an environment-specific issue).
print('hello')
Stack traceback or browser console output
This is likely a local-usage nuance combined with SLURM job submission details. It is often the case that our users request a single-node allocation without explicitly providing a node count. For example,
$ salloc -p redstone --qos=normal --time=10:00:00
Within such an allocation the failure occurs for all legate-launched scripts. The work-around is for users to explicitly provide a node count (-n 1) in their salloc request:
$ salloc -p redstone -n 1 --qos=normal --time=10:00:00
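As a quick check inside the allocation, it may help to confirm which node/task-count variables SLURM actually exported, since the failing code path evidently depends on what it finds in the environment (which variables legate consults is an assumption here):
$ env | grep -E 'SLURM_(NNODES|JOB_NUM_NODES|NTASKS)'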
This might have been fixed since 23.09. If you are able to get a top-of-tree build working, could you please test with that?
Not related, but note that if salloc doesn't automatically log you in to (one of) the compute nodes in your allocation (that's the behavior on our local SLURM cluster), then you'd want to pass a --launcher option to legate to send the processes to the compute nodes (or ssh into a compute node manually). Without a launcher, legate will just run on the current node.
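For example, something along these lines (the launcher value depends on the site; srun is shown here as an assumption for a SLURM cluster):
$ legate --launcher srun simple.py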