Question: How to Authenticate to the W&B Server from the Execution Environment Running a Job? #31

Open
sraza-onshape opened this issue Aug 14, 2023 · 0 comments

sraza-onshape commented Aug 14, 2023

Hello all 👋🏽, I'm super new to W&B in general. I've read the docs and just want to make sure I understand exactly how Launch works and how it can help with my team's use case, which is the following:

  • Use Case: "As an ML engineer, I don't want to always run our model's training on my local CPU. I want to be able to change the ML code, send it to Azure Kubernetes Service (AKS) to train on fast GPUs, and then see the training metrics whenever I log back into my W&B dashboard (for now, let's say I'm on the SaaS offering)."

Using W&B terminology, I think implementing this would take the following steps:

  1. Create a job:
    1. Write the code for our model.
    2. Containerize it with Docker and push the image to Azure (probably to Azure Container Registry, aka ACR).
    3. First question: at this point, in which environment should I set the WANDB_DOCKER variable?
  2. Create a queue:
    1. Second question: If I create a queue through the UI, I notice there's no "Azure" option under the "Resource" dropdown. Would that be a problem for us, or is it safe to just choose the "Docker" option?
  3. Start a launch agent:
    1. As I understand it, once a job is dequeued, the launch agent is responsible for moving it into the appropriate execution environment.
    2. Third question: If that's the case, I'm still unsure how the machine running my K8s job will authenticate so that it can send training metrics to W&B. What's the most secure way to set the WANDB_API_KEY environment variable in the Docker container that runs inside AKS? One idea is to include it under the environment property when creating the agent configuration, but that's just a guess...
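
For the third question, one generic Kubernetes pattern is to store the API key in a Kubernetes Secret and reference it from the pod spec with `secretKeyRef`. That way the key never ends up in the Docker image or in a config file in the repo. This pattern is not drawn from the W&B docs. The sketch below builds such a `batch/v1` Job manifest as a Python dict, assuming the following:

* The Secret was created beforehand, e.g. with `kubectl create secret generic wandb-api-key --from-literal=api-key=<your key>`.
* The AKS cluster is allowed to pull from the registry.
* The GPU node pool runs the NVIDIA device plugin.

All names are hypothetical: the registry `myregistry.azurecr.io`, the image `train:v1`, the Secret `wandb-api-key`, its key `api-key`, and the helper `build_training_job`.

The sketch also sets `WANDB_DOCKER` on the container. This assumes, relating to the first question, that the variable is read by the process running inside the job's container. It does not cover whether a Launch queue config or the launch agent already injects `WANDB_API_KEY` for you.

```python
import json
from typing import Any, Dict

# Hypothetical names for illustration only; replace with your own.
ACR_IMAGE = "myregistry.azurecr.io/train:v1"  # ACR login servers have the form <name>.azurecr.io
SECRET_NAME = "wandb-api-key"  # assumed to exist already in the job's namespace
SECRET_KEY = "api-key"


def build_training_job(name: str, image: str, secret_name: str,
                       secret_key: str, gpus: int = 1) -> Dict[str, Any]:
    """Return a batch/v1 Job manifest as a plain dict (hypothetical helper).

    The W&B API key is never written into the manifest or the image: the
    container reads it from a Kubernetes Secret at runtime via secretKeyRef.
    """
    container = {
        "name": "train",
        "image": image,
        "env": [
            {
                # Credentials come from the cluster, not from the image.
                "name": "WANDB_API_KEY",
                "valueFrom": {"secretKeyRef": {"name": secret_name, "key": secret_key}},
            },
            # Assumption: WANDB_DOCKER is read inside the container running
            # the job, so it is set on the container itself.
            {"name": "WANDB_DOCKER", "value": image},
        ],
        # Requests NVIDIA GPUs; needs the NVIDIA device plugin on the node pool.
        "resources": {"limits": {"nvidia.com/gpu": gpus}},
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed training run
            "template": {
                "spec": {
                    "containers": [container],
                    # A Job's pods only allow Never or OnFailure.
                    "restartPolicy": "Never",
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_training_job("train-job", ACR_IMAGE, SECRET_NAME, SECRET_KEY)
    # kubectl accepts JSON as well as YAML:
    #   python job.py > job.json && kubectl apply -f job.json
    print(json.dumps(manifest, indent=2))
```

Because only the Secret's name and key appear in the manifest, the manifest itself can be committed or shared. Rotating the key only requires updating the Secret.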

Thanks in advance for your insights on this.
