Releases: kubernetes-sigs/jobset

v0.6.0

20 Aug 16:20
d66f1d5

Highlights

  • New JobSet Failure Policy API - allows users to configure different behavior for different types of errors, enabling them to use compute resources more efficiently and improve ML training goodput.
  • Add Coordinator field to the JobSet spec, enabling users to define a global coordinator pod for distributed ML/HPC workloads. The stable network endpoint for this pod is added as a label and annotation to every Job and Pod in the JobSet for easy use in application code. A common use case for this is TPU Multislice training with multiple different Job templates. See linked issue for details.
  • Add global Job index label/annotation to every Job and Pod, which is needed to support TPU Multislice training with multiple different Job templates. See linked issue for details.
  • Added new metrics
  • Improved test coverage
  • Bug fixes
  • New examples and documentation
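
As a rough illustration of the idea behind the Failure Policy highlight above (different behavior for different types of errors), the sketch below models rule matching in plain Python. It is not the JobSet controller's implementation or its API: the `Rule`, `FailurePolicy`, and `decide` names, the `"fail"`/`"restart"` action labels, and the "first matching rule wins, otherwise restart until a restart budget is exhausted" semantics are all assumptions made for this example. The failure reason strings are only sample inputs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    """Hypothetical rule: maps a set of failure reasons to an action."""
    name: str
    action: str  # "fail" or "restart" (illustrative labels, not API values)
    on_failure_reasons: List[str] = field(default_factory=list)  # empty = match any reason


@dataclass
class FailurePolicy:
    """Hypothetical policy: an ordered rule list plus a restart budget."""
    max_restarts: int
    rules: List[Rule] = field(default_factory=list)


def decide(policy: FailurePolicy, failure_reason: str, restarts_so_far: int) -> str:
    """Return "fail" or "restart" for one observed job failure.

    Assumed semantics: the first rule whose reasons match decides the action;
    with no match, fall back to restarting. Restarts are always bounded by
    the policy's max_restarts.
    """
    for rule in policy.rules:
        if not rule.on_failure_reasons or failure_reason in rule.on_failure_reasons:
            if rule.action == "fail":
                return "fail"
            break  # matched a "restart" rule; apply the budget check below
    return "restart" if restarts_so_far < policy.max_restarts else "fail"


# Sample policy: treat one failure reason as fatal, restart on everything else.
policy = FailurePolicy(
    max_restarts=2,
    rules=[Rule(name="fatal-errors", action="fail",
                on_failure_reasons=["BackoffLimitExceeded"])],
)
```

With this sample policy, a failure with the fatal reason ends the workload immediately, while any other failure triggers a restart until the budget of two restarts is used up. Failing fast on unrecoverable errors is the kind of behavior that avoids wasting compute on retries that cannot succeed.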

What's Changed

New Contributors

Full Changelog: v0.6.0-devel...v0.6.0

JobSet v0.5.2

04 Jun 17:42
8637f29

What's Changed

Full Changelog: v0.5.1...v0.5.2

v0.5.1

09 May 17:50
43f8137

Highlights

  • Fixed a bug that prevented the foreground cascading deletion policy from working properly on JobSets #562
  • Fixed the field path in the validation error message for the ManagedBy field #527
  • Test coverage improvements, refactoring, additional documentation

What's Changed

Full Changelog: v0.6.0-devel...v0.5.1

v0.5.0

15 Apr 20:12
cb941fc

What's Changed

Highlights

  • JobSet TTL support added in #443
  • Docsite is live at https://jobset.sigs.k8s.io/ with updated documentation and examples.
  • Include first failed job name in event emitted when JobSet fails, to speed up the debugging process for large complex workloads #477
  • Lower default resource request for JobSet controller manager so it fits on default cloud CPU VMs, but keep high limit to support maximum performance #480
  • Perform only 1 JobSet status update per reconcile attempt to reduce pressure on k8s apiserver #494
  • Introduced ManagedBy field to the JobSet spec to enable Multi-Kueue support
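
To make the TTL highlight above more concrete, here is a minimal sketch of the usual time-to-live check: a finished object becomes eligible for cleanup once its finish time plus the TTL has passed. This is a generic illustration and not the JobSet controller's code. The `is_expired` function and its parameter names are hypothetical, and it assumes the TTL is expressed in seconds and counted from the time the JobSet finished.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(finished_at: Optional[datetime],
               ttl_seconds: Optional[int],
               now: datetime) -> bool:
    """Return True once `now` is at or past finished_at + ttl_seconds.

    A workload that has not finished, or that has no TTL configured,
    never expires in this sketch.
    """
    if finished_at is None or ttl_seconds is None:
        return False
    return now >= finished_at + timedelta(seconds=ttl_seconds)


# Sample values: finished at 12:00 UTC with a 60-second TTL.
finished = datetime(2024, 4, 15, 12, 0, 0, tzinfo=timezone.utc)
before_deadline = finished + timedelta(seconds=30)
after_deadline = finished + timedelta(seconds=90)
```

Here `is_expired(finished, 60, before_deadline)` is false and `is_expired(finished, 60, after_deadline)` is true, so cleanup would only be considered after the full TTL has elapsed.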

Detailed release notes

New Contributors

Full Changelog: v0.5.0-devel...v0.5.0

v0.4.0

28 Feb 21:12
9f2cb14

What's Changed

New Contributors

Full Changelog: v0.4.0-devel...v0.4.0

JobSet v0.3.2

13 Feb 19:51
5eb9a2a

What's Changed

Full Changelog: v0.3.1...v0.3.2

JobSet v0.3.1

22 Dec 00:38
65abe37

What's Changed

Full Changelog: v0.3.0...v0.3.1

JobSet v0.3.0

12 Dec 23:24
05acd93

What's Changed

New Contributors

Full Changelog: v0.3.0-devel...v0.3.0

JobSet v0.2.3

12 Sep 00:12
6efc082

What's Changed

Full Changelog: v0.2.2...v0.2.3

JobSet v0.2.2

24 Aug 02:05
6380248

What's Changed

  • Fixed bug causing JobSet controller to never create child jobs when the indexes had previously failed to build: by @danielvegamyhre in #266

Full Changelog: v0.2.1...v0.2.2