Ray-1.4.0
Release 1.4.0 Notes
Ray Autoscaler
🎉 New Features:
- Support Helm Chart for deploying Ray on Kubernetes
- Key Autoscaler metrics are now exported via Prometheus!
💫 Enhancements:
- Better error messages when a node fails to come online
🔨 Fixes:
- Stability and interface fixes for Kubernetes deployments.
- Fixes to Azure NodeProvider
Ray Client
🎉 New Features:
- Complete API parity with non-client mode
- Experimental ClientBuilder API (see the Ray Client docs)
- Full Asyncio support
💫 Enhancements:
- Keep-alive messages for long-lived connections
- Improved pickling error messages
🔨 Fixes:
- Client disconnect can now be called multiple times
- Fixed client reference equality checks
- Many bug fixes and tests across the complete Ray API!
Ray Core
🎉 New Features:
- Namespaces (check out the docs)! Note: this may be a breaking change if you’re using detached actors; set ray.init(namespace="") for backward-compatible behavior.
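To make the breaking change concrete, here is a toy model in plain Python of how named-actor lookup is scoped by namespace. It is not Ray code: ToyCluster, ToyJob, create_detached and get_actor are hypothetical stand-ins for a cluster, a driver calling ray.init, a detached named actor, and ray.get_actor. It assumes the behavior described in the Ray namespace docs: a job that does not pass a namespace is placed in its own anonymous namespace, while jobs that pass the same namespace share named actors.

```python
import uuid
from typing import Dict, Optional, Tuple


class ToyCluster:
    """Toy model of namespace-scoped named actors; NOT Ray's implementation."""

    def __init__(self) -> None:
        # Named actors are keyed by (namespace, name), so the same name can
        # exist independently in different namespaces.
        self._named: Dict[Tuple[str, str], object] = {}

    def connect(self, namespace: Optional[str] = None) -> "ToyJob":
        # Assumed 1.4 behavior: a job that passes no namespace gets a fresh
        # anonymous namespace that no other job can name.
        if namespace is None:
            namespace = f"anonymous-{uuid.uuid4().hex}"
        return ToyJob(self, namespace)


class ToyJob:
    """One driver connected to the toy cluster (plays the role of ray.init)."""

    def __init__(self, cluster: ToyCluster, namespace: str) -> None:
        self.cluster = cluster
        self.namespace = namespace

    def create_detached(self, name: str, actor: object) -> None:
        # Hypothetical stand-in for creating a named, detached actor.
        self.cluster._named[(self.namespace, name)] = actor

    def get_actor(self, name: str) -> object:
        # Hypothetical stand-in for ray.get_actor(name): only this job's
        # namespace is searched.
        try:
            return self.cluster._named[(self.namespace, name)]
        except KeyError:
            raise ValueError(
                f"no actor named {name!r} in namespace {self.namespace!r}"
            ) from None


cluster = ToyCluster()

# Two jobs without an explicit namespace: the second cannot see the first's actor.
cluster.connect().create_detached("counter", "actor-A")
try:
    cluster.connect().get_actor("counter")
except ValueError as err:
    print("lookup failed:", err)

# Two jobs that pass the same namespace share named actors.
cluster.connect("my_app").create_detached("counter", "actor-B")
print(cluster.connect("my_app").get_actor("counter"))  # actor-B
```

In this model, passing namespace="" is just another shared namespace: every job that uses it sees the same named actors, which mirrors the pre-1.4 behavior the note refers to.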
🔨 Fixes:
- ray.util.metrics.Counter now supports incrementing by an arbitrary value
- Various bug fixes for the placement group APIs including the GPU assignment bug (#15049).
🏗 Architecture refactoring:
- Increase the efficiency and robustness of resource reporting
Ray Data Processing
🔨 Fixes:
- Various bug fixes for better stability (#16063, #14821, #15669, #15757, #15431, #15426, #15034, #15071, #15070, #15008, #15955)
- Fixed a critical bug where the driver used excessive memory when there were many objects in the cluster (#14322).
- Dask on Ray and Modin can now be run with Ray Client
🏗 Architecture refactoring:
- Ray 100TB shuffle results: #15770
- More robust memory management subsystem is in progress (#15157, #15027)
RLlib
🎉 New Features:
- PyTorch multi-GPU support (#14709, #15492, #15421).
- CQL TensorFlow support (#15841).
- Task-settable Env/Curriculum Learning API (#15740).
- Support for native tf.keras Models (no ModelV2 required) (#14684, #15273).
- Trainer.train() and Trainer.evaluate() can run in parallel (optional) (#15040, #15345).
💫 Enhancements and documentation:
- CQL: Bug fixes and confirmed MuJoCo benchmarks (#15814, #15603, #15761).
- Example for differentiable neural computer (DNC) network (#14844, #15939).
- Added support for int-Box action spaces. (#15012)
- DDPG/TD3/A[23]C/MARWIL/BC: Code cleanup and type annotations. (#14707).
- Example script for restoring 1 agent out of n
- Examples for fractional GPU usage. (#15334)
- Enhanced documentation page describing example scripts and blog posts (#15763).
- Various enhancements/test coverage improvements: #15499, #15454, #15335, #14865, #15525, #15290, #15611, #14801, #14903, #15735, #15631
🔨 Fixes:
- Memory leak in multi-agent environment (#15815). Shoutout to Bam4d!
- DDPG PyTorch GPU bug. (#16133)
- Simple optimizer should not be used by default for tf+MA (#15365)
- Various bug fixes: #15762, #14843, #15042, #15427, #15871, #15132, #14840, #14386, #15014, #14737, #15015, #15733, #15737, #15736, #15898, #16118, #15020, #15218, #15451, #15538, #15610, #15326, #15295, #15436, #15558, #15937
🏗 Architecture refactoring:
- Remove atari dependency (#15292).
- Trainer._evaluate() renamed to Trainer.evaluate() (backward compatible); Trainer.evaluate() can be called even without an evaluation worker set if create_env_on_driver=True (#15591).
Tune
🎉 New Features:
- ASHA scheduler now supports save/restore. (#15438)
- Add HEBO to search algorithm shim function (#15468)
- Add SkoptSearcher/Bayesopt Searcher restore functionality (#15075)
💫 Enhancements:
- Scalability best practices (k8s, scalability thresholds) are now documented (#14566)
- You can now set the result buffer_length via tune.run; this helps with trials that report results very frequently (#15810)
- Support numpy types in TBXlogger (#15760)
- Add max_concurrent option to BasicVariantGenerator (#15680)
- Add seed parameter to OptunaSearch (#15248)
- Improve BOHB/ConfigSpace dependency check (#15064)
🔨 Fixes:
- Reduce default number of maximum pending trials to max(16, cluster_cpus) (#15628)
- Return normalized checkpoint path (#15296)
- Escape paths before globbing in TrainableUtil.get_checkpoints_paths (#15368)
- Optuna Searcher: Set correct Optuna TrialState on trial complete (#15283)
- Fix type annotation in tune.choice (#15038)
- Avoid system exit error by using del when cleaning up actors (#15687)
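Escaping paths before globbing (#15368) matters when a directory name contains characters that glob treats as wildcards. The sketch below uses only the standard library to show the failure mode and the glob.escape fix. It is not the code of TrainableUtil.get_checkpoints_paths: find_checkpoints and the trial directory name (with "[" and "]" as might come from a list-valued hyperparameter) are hypothetical.

```python
import glob
import os
import tempfile
from typing import List


def find_checkpoints(trial_dir: str, escape: bool = True) -> List[str]:
    """Hypothetical helper: list checkpoint_* entries directly under trial_dir."""
    # glob.escape wraps the metacharacters *, ? and [ in brackets so they
    # match literally instead of acting as wildcards.
    base = glob.escape(trial_dir) if escape else trial_dir
    return sorted(glob.glob(os.path.join(base, "checkpoint_*")))


root = tempfile.mkdtemp()
# Hypothetical trial directory whose name contains "[" and "]".
trial_dir = os.path.join(root, "train_fn_layers=[64, 64]")
os.makedirs(os.path.join(trial_dir, "checkpoint_000001"))

# Unescaped, "[64, 64]" is read as a character class matching ONE character,
# so the real directory name never matches.
print(find_checkpoints(trial_dir, escape=False))  # []
# Escaped, the directory is matched literally and its checkpoint is found.
print(find_checkpoints(trial_dir))
```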
Serve
🎉 New Features:
- As of Ray 1.4, Serve has a new API centered around the concept of “Deployments.” Deployments offer a more streamlined API and can be declaratively updated, which should improve both development and production workflows. The existing APIs are unchanged in Ray 1.4 and will continue to work until Ray 1.5, at which point they will be removed (see the package reference if you’re not sure about a specific API). Please see the migration guide for details on how to update your existing Serve application to use this new API.
- New serve.deployment API: @serve.deployment, serve.get_deployments, serve.list_deployments (#14935, #15172, #15124, #15121, #14953, #15152, #15821)
- New serve.ingress(fastapi_app) API (#15445, #15441, #14858)
- New @serve.batch decorator in favor of legacy max_batch_size in backend config (#15065)
- serve.start() is now idempotent (#15148)
- Added support for handle.method_name.remote() (#14831)
🔨 Fixes:
- Rolling updates for redeployments (#14803)
- Latency improvement by using pickle (#15945)
- Controller and HTTP proxy use num_cpus=0 by default (#15000)
- Health checking in the controller instead of using max_restarts (#15047)
- Use longest prefix matching for path routing (#15041)
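Longest prefix matching means the most specific route prefix wins, independent of the order in which routes were registered. The stand-alone function below illustrates the rule; it is a sketch, not Serve's router. match_route and the routes table are hypothetical, and requiring a prefix to end at a "/" boundary (so "/api" does not capture "/apiary") is an assumption about the exact semantics.

```python
from typing import Dict, Optional


def match_route(path: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the deployment whose route prefix is the longest match for path.

    Hypothetical helper: a prefix matches when it equals the path or is
    followed by "/" in the path. Prefixes are written without a trailing
    slash, except for the root prefix "/".
    """
    best_prefix: Optional[str] = None
    for prefix in routes:
        base = prefix.rstrip("/")  # "/" becomes "", so it matches every absolute path
        if path == prefix or path.startswith(base + "/"):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return routes[best_prefix] if best_prefix is not None else None


routes = {"/": "default", "/api": "api_root", "/api/v1": "api_v1"}
print(match_route("/api/v1/predict", routes))  # api_v1
print(match_route("/api/v2", routes))          # api_root
print(match_route("/apiary", routes))          # default
```

With first-match routing, a catch-all "/" registered first would shadow every other route; comparing prefix lengths avoids that.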
Dashboard
🎉 New Features:
- Experimental OpenTelemetry support. (#16028, #14872, #15742).
🔨 Fixes:
- Add object store memory column (#15697)
- Add object store stats to dashboard API. (#15677)
- Remove disk data from the dashboard when running on K8s. (#14676)
- Fix reported dashboard IP when using 0.0.0.0 (#15506)
Thanks
Many thanks to all those who contributed to this release!
@clay4444, @Fabien-Couthouis, @mGalarnyk, @smorad, @ckw017, @ericl, @antoine-galataud, @pleiadesian, @DmitriGekhtman, @robertnishihara, @Bam4d, @fyrestone, @stephanie-wang, @kfstorm, @wuisawesome, @rkooo567, @franklsf95, @micahtyong, @WangTaoTheTonic, @krfricke, @hegdeashwin, @devin-petersohn, @qicosmos, @edoakes, @llan-ml, @ijrsvt, @richardliaw, @Sertingolix, @ffbin, @simjay, @AmeerHajAli, @simon-mo, @tom-doerr, @sven1977, @clarkzinzow, @mxz96102, @SebastianBo1995, @amogkam, @iycheng, @sumanthratna, @Catch-Bull, @pcmoritz, @architkulkarni, @stefanbschneider, @tgaddair, @xcharleslin, @cthoyt, @fcardoso75, @Jeffwan, @mvindiola1, @michaelzhiluo, @rlan, @mwtian, @SongGuyang, @YeahNew, @kathryn-zhou, @rfali, @jennakwon06, @Yeachan-Heo