How can I submit tf-job in armada? #127

sync-by-unito bot opened this issue Apr 7, 2023 · 5 comments

Comments

sync-by-unito bot commented Apr 7, 2023

I tested the example in https://github.com/G-Research/armada/blob/master/example/jobs.yaml

Is there a way to submit a TF job? Plain job objects like the one in that example are not used very often, whereas TensorFlow and PyTorch jobs are widely used. For example: https://github.com/kubeflow/tf-operator/blob/master/examples/v1/dist-mnist/tf_job_mnist.yaml

sync-by-unito bot commented Apr 7, 2023

➤ Alex Wang commented:

/help

sync-by-unito bot commented Apr 7, 2023

➤ jankaspar commented:

Hi, Armada currently does not support TF Jobs. The closest you can get is a job with multiple podSpecs; it is possible to submit jobs like this:

queue: test
jobSetId: job-set-1
jobs:
  - priority: 0
    podSpecs:
      - containers:
          ...
          resources:
            ...
      - containers:
          ...
          resources:
            ...

Armada will aim to schedule multiple pods in one of the clusters at the same time.
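
For readers who want to generate such a file rather than write it by hand, here is a minimal sketch in Python. It only uses the top-level fields shown in the snippet above (`queue`, `jobSetId`, `jobs`, `priority`, `podSpecs`); the container fields follow the usual Kubernetes container layout. The helper names, the image name, and the resource values are hypothetical, and the output is JSON on the assumption that the YAML parser on the receiving side accepts JSON, as most do.

```python
import json


def make_pod_spec(name, image, cpu, memory):
    """Build a minimal Kubernetes-style pod spec with a single container."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "restartPolicy": "Never",
        "containers": [
            {
                "name": name,
                "image": image,  # hypothetical image, replace with your own
                "resources": {
                    "requests": dict(resources),
                    "limits": dict(resources),
                },
            }
        ],
    }


def make_job_file(queue, job_set_id, pod_specs, priority=0):
    """Wrap several pod specs into one job, mirroring the layout above."""
    return {
        "queue": queue,
        "jobSetId": job_set_id,
        "jobs": [{"priority": priority, "podSpecs": pod_specs}],
    }


# Two pods that should be scheduled together as one job.
job_file = make_job_file(
    "test",
    "job-set-1",
    [
        make_pod_spec("worker-0", "example.com/trainer:latest", "1", "1Gi"),
        make_pod_spec("worker-1", "example.com/trainer:latest", "1", "1Gi"),
    ],
)

print(json.dumps(job_file, indent=2))
```

Each entry in `podSpecs` becomes one pod of the same job, so adding a worker is a matter of appending another `make_pod_spec(...)` call.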

Adding support for custom job specifications such as TF Jobs and other Kubeflow types (https://www.kubeflow.org/docs/components/training/) is something we have considered but have not implemented yet.

Are you planning to use Armada for a specific use case?

sync-by-unito bot commented Apr 7, 2023

➤ Alex Wang commented:

> Hi, Armada currently does not support TF Jobs. The closest you can get is a job with multiple podSpecs; it is possible to submit jobs like this:

But if I define the TF job as a job with multiple podSpecs, how can tf-operator manage it?

> Are you planning to use Armada for a specific use case?

Yes, but our jobs are mainly AI and big-data workloads such as TFJob, PyTorch, and Spark.

> Armada will aim to schedule multiple pods in one of the clusters at the same time.
> Adding support for custom job specifications such as TF Jobs and other Kubeflow types (https://www.kubeflow.org/docs/components/training/) is something we have considered but have not implemented yet.

I'm looking forward to this feature. Do you have a general schedule?

sync-by-unito bot commented Apr 7, 2023

➤ Alex Wang commented:

@jankaspar Thanks.

sync-by-unito bot commented Apr 7, 2023

➤ jankaspar commented:

Hi, sorry for the late reply.
We don't have a schedule for supporting additional job types, but we would be happy to accept PRs.

You are right, it's not possible to use the TensorFlow operator, but you can use TensorFlow in multi-node jobs without the operator.
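
As a rough illustration of what "without the operator" involves: the operator normally injects a `TF_CONFIG` environment variable into each pod, so without it each pod has to set that variable itself. The sketch below builds one in Python. The `build_tf_config` helper and the worker addresses are hypothetical, and how the pods learn each other's addresses is not covered in this thread, so the address list is simply assumed to be known.

```python
import json
import os


def build_tf_config(worker_addresses, task_index):
    """Return a TF_CONFIG JSON string for one worker in a multi-worker job.

    worker_addresses: list of "host:port" strings, identical on every pod.
    task_index: position of this pod in that list.
    """
    if not 0 <= task_index < len(worker_addresses):
        raise ValueError("task_index must refer to an entry in worker_addresses")
    return json.dumps(
        {
            "cluster": {"worker": list(worker_addresses)},
            "task": {"type": "worker", "index": task_index},
        }
    )


# Hypothetical addresses; in practice these must resolve between the pods.
workers = ["worker-0.example.internal:2222", "worker-1.example.internal:2222"]

# Set the variable before TensorFlow creates its distribution strategy.
os.environ["TF_CONFIG"] = build_tf_config(workers, task_index=0)
print(os.environ["TF_CONFIG"])
```

With `TF_CONFIG` set this way on every pod (each with its own index), `tf.distribute.MultiWorkerMirroredStrategy` reads the cluster layout from the environment, which is the part the operator would otherwise handle.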
