merge upstream #4
Commits on Jun 18, 2024
fix training on gke tutorial (#706)
* add -y for apt-get install to prevent image build failure
* sign
* sign
* sign
* fix tensorflow_mnist_train_distributed tutorial
Commit 9b6e135
Commit 01630c6
Commits on Jun 21, 2024
Update Kueue's manifests to 0.7.0 (#707)
* Update DWS-examples README to use latest Kueue version, delete Kueue manifest file
* Update tutorials-and-examples/workflow-orchestration/dws-examples/README.md
Co-authored-by: Aldo Culquicondor
* Add link to Kueue's installation documentation
---------
Co-authored-by: Aldo Culquicondor
Commit 59ca0cc
Commits on Jun 24, 2024
[TPU Provisioner] Add admission label selectors and e2e test script (#702)
* Add admission label selectors and e2e test script
* Edit log msgs
Commit fd738a6
Adding CSV output pushed by the Locust master (#705)
* Adding CSV output pushed by the Locust master
* Piping in output bucket
* removing gcs bucket
Commit 5e36733
Commits on Jun 27, 2024
Move Ray TPU Webhook out of applications/ray folder (#603)
* Move webhook folder to ray-on-gke/tpu
* Change webhook path
Commit cdb52c8
Commits on Jul 1, 2024
Prometheus Adapter Module (#716)
* first commit
* remove terraform.tfvar
* refactoring
* remove hardcoding
* remove bash_equivalent
Commit fbda166
Custom Metrics Stackdriver Adapter Module (#718)
* first commit
* Update README.md
Commit 19dad67
Commits on Jul 2, 2024
Commit 1892f97
Commits on Jul 3, 2024
ml-platform release from development branch (#715)
* Updated terraform providers
* Standardized GitOps scripts and added Kueue
* Added initial test harness
* Added h100 DWS node pool
* Add notebook packaging guide to docs (#690)
  add notebook packaging guide
* Added enhancements to the dataprocessing use cases
* Updated Kueue to use the 0.7.0 manifests
* Increased the cluster resource limits
* Added products and features outline
* Added Secret Manager add-on to the cluster
* Changed configsync git repository name to allow for easier use of multiple environments
* Added a GitLab project module
* Standardized git variables to support GitHub or GitLab
* Added a100 40GB node pools
* Moved cpu node pool from n2 to n4 machines
* Add environment_name to the Ray dashboard endpoint
* Removed fleet level configmanagement and Google service accounts for each namespace to allow for multiple environments in a single project
* Added Config Controller Terraform module
* Added NVIDIA DCGM
* Added allow KubeRay Operator to the namespace network policy
---------
Co-authored-by: Kent Hua
Co-authored-by: Jun Sheng
Co-authored-by: Ishmeet Mehta
Co-authored-by: Kavitha Rajendran
Co-authored-by: kenthua
Commit 0d7231d
Commits on Jul 9, 2024
Jetstream Maxtext Module (#719)
* first commit
* terraform fmt
* Update README.md
* prometheus adapter module in main
* remove apply.sh
* typo
* terraform fmt
* large cleanup and validation
* moved fields and made module variables consistent with example variables
* parameterized accelerator selectors
* parameterize metrics scrape interval
* fmt
* fmt
* load parameters parameterization and multiple hpa resources
* fmt
* parameterized model name
* update readme and validators
* changes to jetstream module deployment readme
* terraform fmt
* accelerator_memory_used_percentage -> memory_used_percentage
* changes to READMEs
* tweaks
* metrics port optional
* sample tfvars no longer includes autoscaling config
* example autoscaling config
* Update README.md
* Update README.md
* Update README.md
* strengthen hpa config validation
* More updates to readmes
* tweak to readme
* typo
* missing kubectl apply
* typos
Commit f2883eb
Commits on Jul 10, 2024
Enable Ray Autoscaler for the Rag example application (#722)
* Enable Ray Autoscaler for the Rag example application
* Update the ray application template
Commit 2bfbcd7
Commits on Jul 11, 2024
update image url for gemma finetune yaml (#729)
* Update finetune.yaml
Commit ac2c2ff
Add HuggingFace support for automated inference checkpoint conversion (#712)
* Add HuggingFace support for automated inference checkpoint conversion
* Add HuggingFace support for inference checkpoint conversion
* fix llama checkpoint names
* update containers to v0.2.3 / v0.2.2
* update containers to v0.2.3 / v0.2.2
Commit 0fd14fd