Releases: GoogleCloudPlatform/ai-infra-cluster-provisioning

GKE cluster and Node Pool support.

18 Apr 23:33
51120bc

Full Changelog: v0.4.0...v0.5.0

v0.4.0: Ops agent for GPU metric, DLVM bug fix and other bug fixes

10 Mar 16:22
a8c56fd

What's Changed

  • Adding HPC toolkit blueprint to use aiinfra cluster provisioning tool. by @soumyapani in #74
  • add scopes to default service account in startup script by @stevenBorisko in #77
  • Enabling internet access only for the primary network in a multi-NIC VPC. by @soumyapani in #78
  • New Ops Agent installation for GPU metrics and a corresponding Cloud Monitoring dashboard by @stevenBorisko in #82
  • Temporarily disable gVNIC, since DLVM images do not support gVNIC. by @soumyapani in #90
  • Making orchestrator configurable.
  • Adding disable_notebook flag. by @soumyapani in #94
  • Release v0.4.0 by @soumyapani in #96

Full Changelog: v0.3.0...v0.4.0

v0.3.0: Migration to pure Terraform, support for minimal verbosity, and an option to disable Ops Agent installation

15 Feb 22:35
5605cf9

Full Changelog: v0.2.0...v0.3.0

v0.2.0: Converged networking module and minimal Terraform verbosity support

09 Feb 00:06
44dd63d

Full Changelog: v0.1.0...v0.2.0

v0.1.0: MIG with Multi-NIC, GCSFuse and Fileshare support

24 Jan 01:14
7e9f385

What's Changed

  • (1) Adding GCB file for PR validation; (2) Bug 260149974: Adding support for passing Action via command line. by @soumyapani in #2
  • Adding roadmap file for cluster provisioning. by @soumyapani in #13
  • Prompt for cluster connection and other enhancements. by @soumyapani in #24
  • (1) Removing CLEANUP_ON_EXIT behavior; (2) Using a single GCS bucket per project for storing state. by @soumyapani in #25
  • Fixing copy directory usage and updating README.md by @soumyapani in #29
  • Update README.md by @DmitryKakurin in #32
  • Adding example training script. by @soumyapani in #30
  • (1) Fixing continuous GCB config; (2) Adding documentation for storage object owner access. by @soumyapani in #36
  • Fixing GCB config for PR. by @soumyapani in #37
  • Fixing data model for TensorFlow script. Adding ResNet training example with Ray for PyTorch. by @soumyapani in #38
  • GCB and Debugging Improvements. by @soumyapani in #39
  • Updating test env.list file for PR. by @soumyapani in #41
  • Using HPC toolkit modules. by @soumyapani in #43
  • Adding support for GCS mount. by @soumyapani in #45
  • Fixing Dockerfile to use the right base image for gcloud. by @soumyapani in #47
  • Removing local copy of startup-script module and using HPC module. by @soumyapani in #48
  • Adding multi-nic support for MIG by @soumyapani in #50
  • Adding support for NFS fileshare. by @soumyapani in #51
  • Release v0.1.0 by @soumyapani in #52

Full Changelog: https://github.com/GoogleCloudPlatform/ai-infra-cluster-provisioning/commits/v0.1.0