Why clearml agent worker ignores PYTHONPATH and CLEARML_AGENT_EXTRA_PYTHON_PATH? #220

Open
gzzv opened this issue Dec 12, 2024 · 8 comments

Comments

@gzzv

gzzv commented Dec 12, 2024

I'm trying to run a remote ClearML agent in a k8s cluster. I have several Python packages located in different paths. These paths are specified in PYTHONPATH, but the agent worker can't import the packages.
I also tried adding the paths to CLEARML_AGENT_EXTRA_PYTHON_PATH, but that didn't help either. Can anyone help or give me some advice?

@gzzv
Author

gzzv commented Dec 12, 2024

UPD
I ran a test script that imports these packages manually from the worker pod container, and it works (I used the same Python interpreter as the worker).

@jkhenning
Member

Hi @gzzv,

How are you passing the PYTHONPATH and the other env var? Can you include logs for this failure?

@gzzv
Author

gzzv commented Dec 13, 2024

Hi @jkhenning!

  • PYTHONPATH is defined in the base Docker image; CLEARML_AGENT_EXTRA_PYTHON_PATH is set in the values.yaml file (basePodTemplate.env field).
  • I've tested my script directly in a Docker container and it works, but it doesn't work in the ClearML agent pod.
  • I've also tried passing a command like python3 -c 'import example_lib as ex; print(ex)' as agent.extra_docker_shell_script in the clearml.clearmlConfig field of values.yaml, and that doesn't work either.
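
To compare the two situations described above (a plain container versus the agent pod), it may help to run the same small report in both and diff the results. The following is a minimal sketch, not part of ClearML: the helper name import_report is hypothetical, and it only reads the two environment variables mentioned in this thread plus sys.path.

```python
import importlib.util
import os
import sys


def import_report(module_name: str) -> dict:
    """Collect the facts needed to debug a ModuleNotFoundError for module_name."""
    # find_spec returns None when a top-level module cannot be located on sys.path.
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "CLEARML_AGENT_EXTRA_PYTHON_PATH": os.environ.get(
            "CLEARML_AGENT_EXTRA_PYTHON_PATH"
        ),
        "sys.path": list(sys.path),
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }
```

Calling import_report("example_lib") at the top of the task script and printing the result shows whether PYTHONPATH actually reached the process the agent started, and whether the interpreter matches the one used in the manual test.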

@gzzv
Author

gzzv commented Dec 13, 2024

My values.yaml:

imageCredentials:
  enabled: true
  existingSecret: regcredcloud
clearml:
  existingAgentk8sglueSecret: agentk8sglue
agentk8sglue:
  apiServerUrlReference: "https://example.com/"
  fileServerUrlReference: "https://example.com/"
  webServerUrlReference: "https://example.com/"
  defaultContainerImage: test_image:0.0.1
  queue: Processing
basePodTemplate:
  env:
    - name: CLEARML_AGENT_SKIP_CONTAINER_APT
      value: 'true'
    - name: CLEARML_AGENT_NO_UPDATE
      value: '1'
    - name: CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL
      value: '1'
  tolerations:
    - key: "nvidia.com/gpu"
      operator: Exists
      effect: "NoSchedule"
  resources:
    limits:
      nvidia.com/gpu: "1"
    requests:
      nvidia.com/gpu: "1"

@gzzv
Author

gzzv commented Dec 13, 2024

Logs:

Using environment access key CLEARML_API_ACCESS_KEY=5X487RWJ4KAW3OB8AUS1OLFW062QDV
Using environment secret key CLEARML_API_SECRET_KEY=********
Current configuration (clearml_agent v1.9.2, location: /tmp/.clearml_agent.9_sf_1el.cfg):
----------------------
*****Long clearml-config description******

Executing task id [12572f62ea75456daa3128419a735a4a]:
repository = 
branch = 
version_num = 
tag = 
docker_cmd = swr.ru-moscow-1.hc.sbercloud.ru/srobotics-cheburator/cheburator:x86.0.0.31
entry_point = task_remote_run.py
working_dir = .
Running task id [12572f62ea75456daa3128419a735a4a]:
[.]$ /usr/bin/python3.10 -u /root/.clearml/venvs-builds/code/task_remote_run.py
Summary - installed python packages:
*****Long pip summary******

Environment setup completed successfully
Starting Task Execution:
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/code/task_remote_run.py", line 19, in <module>
    from cheburator_description.description_parameters import DescriptionParametersNoROS
ModuleNotFoundError: No module named 'cheburator_description'

@gzzv
Author

gzzv commented Dec 13, 2024

UPD
These problematic packages do not appear in the logs.

@gzzv
Author

gzzv commented Dec 13, 2024

I've manually added PYTHONPATH with its value to the values.yaml file and it works.
But I don't know why PYTHONPATH is ignored when the worker pod starts.
UPD: I've noticed that the worker ignores all env vars defined in the base image at startup.
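
If setting PYTHONPATH in values.yaml is not an option, one possible fallback is to extend sys.path inside the task script before the failing imports. This is only a sketch of an alternative, not the fix reported above and not a ClearML feature; the helper name prepend_paths and the example paths are hypothetical.

```python
import os
import sys


def prepend_paths(path_list: str, sys_path=None) -> list:
    """Prepend each existing directory from an os.pathsep-separated string.

    Works on sys.path by default; returns the entries that were added,
    in their original order.
    """
    target = sys.path if sys_path is None else sys_path
    added = []
    # Walk the entries backwards so that insert(0, ...) preserves their order.
    for entry in reversed(path_list.split(os.pathsep)):
        if entry and os.path.isdir(entry) and entry not in target:
            target.insert(0, entry)
            added.insert(0, entry)
    return added
```

For example, prepend_paths("/opt/libs/a:/opt/libs/b") (hypothetical paths) placed above the project imports in the entry script would make those directories importable regardless of which environment variables the pod received.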

@jkhenning
Member

I would assume that this is because the PYTHONPATH env var is somehow added to bashrc or something that's loaded when you "manually" execute into the container, but for some reason not when the agent command is executed inside that container...
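
One way to check this hypothesis is to read the variable from a non-interactive shell and from interactive and login shells, then compare. The sketch below assumes bash is available in the image; the helper names and the marker string are hypothetical. The marker is only there so that anything printed by startup files does not get mixed into the value.

```python
import subprocess

MARKER = "__VALUE__"


def extract_value(stdout: str) -> str:
    """Return the text after the last MARKER, ignoring startup-file output."""
    _, found, value = stdout.rpartition(MARKER)
    return value if found else ""


def var_in_shell(name: str, flags=()) -> str:
    """Read environment variable `name` as seen by `bash <flags> -c ...`."""
    # $1 is the marker, $2 the variable name; ${!2} is bash indirect expansion.
    script = 'printf "%s%s" "$1" "${!2}"'
    result = subprocess.run(
        ["bash", *flags, "-c", script, "bash", MARKER, name],
        capture_output=True,
        text=True,
        check=False,
        stdin=subprocess.DEVNULL,
        start_new_session=True,  # keep an interactive bash away from the terminal
    )
    return extract_value(result.stdout)


def compare_shells(name: str = "PYTHONPATH") -> dict:
    """Value of `name` in non-interactive, interactive, and login shells."""
    return {
        "non-interactive": var_in_shell(name),
        "interactive (-i)": var_in_shell(name, ["-i"]),
        "login (-l)": var_in_shell(name, ["-l"]),
    }
```

If compare_shells() run inside the container shows a value only for the interactive or login shell, the variable most likely comes from a shell startup file rather than from the image's environment, which would match the behavior described here.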
