Add e2e tests for tpu-provisioner #235

Open: wants to merge 8 commits into main.


@nstogner (Collaborator) commented Feb 26, 2024:

Creates a cluster, installs tpu-provisioner, and then runs test JobSets against the cluster.

@nstogner changed the title from "WIP: Add e2e tests for tpu-provisioner" to "Add e2e tests for tpu-provisioner" on Mar 8, 2024.
Comment on lines +208 to +232
	tpu_v4_2x2x2 = tpuConfig{
		accelerator:  "tpu-v4-podslice",
		topoX:        2,
		topoY:        2,
		topoZ:        2,
		chipsPerNode: 4,
		sliceCount:   1,
	}
	tpu_v4_2x2x4 = tpuConfig{
		accelerator:  "tpu-v4-podslice",
		topoX:        2,
		topoY:        2,
		topoZ:        4,
		chipsPerNode: 4,
		sliceCount:   1,
	}

	tpu_v5e_2x4 = tpuConfig{
		accelerator:  "tpu-v5-lite-podslice",
		topoX:        2,
		topoY:        4,
		chipsPerNode: 4,
		sliceCount:   2,
	}
*/
Review comment (Contributor):

Are these commented out intentionally, or were they only temporarily commented out for your testing?
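If these configurations are meant to stay, one option is to keep them active and derive the values the tests need from their fields. The sketch below reuses the field names from the snippet; the `topology` and `nodesPerSlice` methods are hypothetical helpers, and it assumes a zero `topoZ` denotes a 2D topology such as v5e's `2x4`:

```go
package main

import "fmt"

// tpuConfig mirrors the fields visible in the PR snippet.
type tpuConfig struct {
	accelerator  string
	topoX        int
	topoY        int
	topoZ        int // assumed 0 for 2D topologies such as TPU v5e
	chipsPerNode int
	sliceCount   int
}

// topology renders the topology string, e.g. "2x2x2" or "2x4".
func (c tpuConfig) topology() string {
	if c.topoZ == 0 {
		return fmt.Sprintf("%dx%d", c.topoX, c.topoY)
	}
	return fmt.Sprintf("%dx%dx%d", c.topoX, c.topoY, c.topoZ)
}

// nodesPerSlice divides the chip count of one slice by the chips per node.
func (c tpuConfig) nodesPerSlice() int {
	chips := c.topoX * c.topoY
	if c.topoZ != 0 {
		chips *= c.topoZ
	}
	return chips / c.chipsPerNode
}

var (
	tpu_v4_2x2x2 = tpuConfig{accelerator: "tpu-v4-podslice", topoX: 2, topoY: 2, topoZ: 2, chipsPerNode: 4, sliceCount: 1}
	tpu_v5e_2x4  = tpuConfig{accelerator: "tpu-v5-lite-podslice", topoX: 2, topoY: 4, chipsPerNode: 4, sliceCount: 2}
)

func main() {
	for _, c := range []tpuConfig{tpu_v4_2x2x2, tpu_v5e_2x4} {
		fmt.Printf("%s %s: %d slice(s) x %d node(s)\n",
			c.accelerator, c.topology(), c.sliceCount, c.nodesPerSlice())
	}
}
```

For example, a v4 `2x2x2` slice has 8 chips, so with 4 chips per node it needs 2 nodes.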


func newJobset(name string, c tpuConfig, uniqueNodeSelector bool) *jobset.JobSet {
	nodeSelectors := map[string]string{
		"cloud.google.com/gke-tpu-accelerator": c.accelerator,
Review comment (Contributor):

Can we define the various labels in this file as constants somewhere?
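A minimal sketch of what that could look like. The two label keys are standard GKE TPU node labels; the constant names and the `tpuNodeSelector` helper are hypothetical, and whether the PR's `newJobset` also selects on topology (or how it uses `uniqueNodeSelector`) is not confirmed by the snippet:

```go
package main

import "fmt"

// GKE node labels for TPU node pools, declared once instead of being
// repeated as string literals. The constant names are suggestions.
const (
	labelTPUAccelerator = "cloud.google.com/gke-tpu-accelerator"
	labelTPUTopology    = "cloud.google.com/gke-tpu-topology"
)

// tpuNodeSelector builds a node selector from an accelerator type and a
// topology string (hypothetical helper, not from the PR).
func tpuNodeSelector(accelerator, topology string) map[string]string {
	return map[string]string{
		labelTPUAccelerator: accelerator,
		labelTPUTopology:    topology,
	}
}

func main() {
	fmt.Println(tpuNodeSelector("tpu-v4-podslice", "2x2x2"))
}
```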

Comment on lines +34 to +38
var cf struct {
	o sync.Once
	m sync.RWMutex
	f []func()
}
@danielvegamyhre (Contributor) commented Mar 15, 2024:

The names of this struct and its fields are unclear to me; later in the code, where they are used, it's a bit confusing what each one refers to.

Can we make them more descriptive?
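One possible rename, as a sketch. It assumes, from the name and field types alone, that `cf` collects cleanup functions, that the `sync.Once` ensures they run at most once, and that the mutex guards the slice; `cleanupRegistry`, `register`, and `run` are hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

// cleanupRegistry is a more descriptive stand-in for `cf`.
type cleanupRegistry struct {
	runOnce sync.Once    // was `o`: ensures the cleanups run at most once
	mu      sync.RWMutex // was `m`: guards funcs
	funcs   []func()     // was `f`: registered cleanup functions
}

// register adds fn to the functions to run at teardown.
func (r *cleanupRegistry) register(fn func()) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.funcs = append(r.funcs, fn)
}

// run calls the registered functions in reverse registration order
// (like defer). Calls after the first are no-ops.
func (r *cleanupRegistry) run() {
	r.runOnce.Do(func() {
		// Copy under the read lock, then call outside it, so a cleanup
		// that calls register cannot deadlock on the mutex.
		r.mu.RLock()
		funcs := append([]func(){}, r.funcs...)
		r.mu.RUnlock()
		for i := len(funcs) - 1; i >= 0; i-- {
			funcs[i]()
		}
	})
}

func main() {
	var r cleanupRegistry
	r.register(func() { fmt.Println("delete cluster") })
	r.register(func() { fmt.Println("uninstall tpu-provisioner") })
	r.run()
	r.run() // no-op: the sync.Once has already fired
}
```

Running it prints `uninstall tpu-provisioner` and then `delete cluster`; the second `run` call does nothing.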

2 participants