Ray-1.3.0
Release v1.3.0 Notes
Highlights
- We are now testing and publishing Ray's scalability limits with each release, see: https://github.com/ray-project/ray/tree/releases/1.3.0/benchmarks
- Ray Client is now usable by default with any Ray cluster started by the Ray Cluster Launcher.
Ray Cluster Launcher
💫Enhancements:
- Observability improvements (#14816, #14608)
- Worker nodes no longer killed on autoscaler failure (#14424)
- Better validation for min_workers and max_workers (#13779)
- Auto detect memory resource for AWS and K8s (#14567)
- On autoscaler failure, propagate error message to drivers (#14219)
- Avoid launching GPU nodes when the workload only has CPU tasks (#13776)
- Autoscaler/GCS compatibility (#13970, #14046, #14050)
- Testing (#14488, #14713)
- Migration of configs to multi-node-type format (#13814, #14239)
- Better config validation (#14244, #13779)
- Node-type max workers defaults to infinity (#14201)
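The min_workers/max_workers validation and the infinite default above can be pictured with a small sketch. This is illustrative only: `validate_node_type` and its error messages are hypothetical and are not the autoscaler's code. It assumes a node type is a plain dict with optional `min_workers` and `max_workers` keys.

```python
import math


def validate_node_type(name, node_type):
    """Return (min_workers, max_workers) for one node type, or raise ValueError.

    Hypothetical helper for illustration; not the autoscaler's actual code.
    """
    min_workers = node_type.get("min_workers", 0)
    # Assumption: a missing max_workers means "no per-type limit".
    max_workers = node_type.get("max_workers", math.inf)
    if min_workers < 0:
        raise ValueError(f"{name}: min_workers must be >= 0")
    if max_workers < min_workers:
        raise ValueError(f"{name}: max_workers must be >= min_workers")
    return min_workers, max_workers
```

A node type that omits max_workers is then bounded only by whatever cluster-wide limit applies.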
🔨 Fixes:
- AWS configuration (#14868, #13558, #14083, #13808)
- GCP configuration (#14364, #14417)
- Azure configuration (#14787, #14750, #14721)
- Kubernetes (#14712, #13920, #13720, #14773, #13756, #14567, #13705, #14024, #14499, #14593, #14655)
- Other (#14112, #14579, #14002, #13836, #14261, #14286, #14424, #13727, #13966, #14293, #14718, #14380, #14234, #14484)
Ray Client
💫Enhancements:
- Version checks for Python and client protocol (#13722, #13846, #13886, #13926, #14295)
- Validate server port number (#14815)
- Enable Ray client server by default (#13350, #13429, #13442)
- Disconnect Ray upon client deactivation (#13919)
- Convert Ray objects to Ray client objects (#13639)
- Testing (#14617, #14813, #13016, #13961, #14163, #14248, #14630, #14756, #14786)
- Documentation (#14422, #14265)
🔨 Fixes:
- Hook runtime context (#13750)
- Fix mutual recursion (#14122)
- Set gRPC max message size (#14063)
- Monitor stream errors (#13386)
- Fix dependencies (#14654)
- Fix ray.get ctrl-c (#14425)
- Report error deserialization errors (#13749)
- Named actor refcounting fix (#14753)
- RayTaskError serialization (#14698)
- Multithreading fixes (#14701)
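A minimal sketch of connecting to a cluster through Ray Client follows. It assumes the client server listens on port 10001 on the head node (an assumption; check your cluster configuration) and uses `ray.util.connect`. The `client_address` and `connect` helpers are hypothetical names introduced here, and the port check only mirrors the idea of validating the server port; it is not Ray's own validation.

```python
def client_address(host, port=10001):
    """Build a "host:port" string, rejecting ports outside 1-65535."""
    if not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError(f"invalid Ray Client server port: {port!r}")
    return f"{host}:{port}"


def connect(host, port=10001):
    """Connect this process to a remote cluster through Ray Client."""
    import ray  # imported lazily so client_address works without Ray installed

    return ray.util.connect(client_address(host, port))
```

After `connect("<head_node_host>")` succeeds, calls such as `ray.get` and remote function invocations in the same process are expected to run against the remote cluster.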
Ray Core
🎉 New Features:
- We are now testing and publishing Ray's scalability limits with each release. Check out https://github.com/ray-project/ray/tree/releases/1.3.0/benchmarks.
- [alpha] Ray-native Python-based collective communication primitives for Ray clusters with distributed CPUs or GPUs.
🔨 Fixes:
- Ray now uses C++14.
- Fixed high CPU breaking raylets with heartbeat missing errors (#13963, #14301)
- Fixed high CPU issues from raylet during object transfer (#13724)
- Improvements to the placement group APIs, including better Java support (#13821, #13858, #13582, #15049)
Ray Data Processing
🎉 New Features:
- Object spilling is turned on by default. Check out the documentation.
- Dask-on-Ray and Spark-on-Ray are fully ready to use. Please try them out and give us feedback!
- Dask-on-Ray is now compatible with Dask 2021.4.0.
- Dask-on-Ray now works natively with dask.persist().
🔨 Fixes:
- Various improvements in object spilling and memory management layer to support large scale data processing (#13649, #14149, #13853, #13729, #14222, #13781, #13737, #14288, #14578, #15027)
- The lru_evict flag is now deprecated. The recommended solution is to use object spilling.
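Since spilling is on by default, explicit configuration is only needed to change where objects are spilled. The sketch below shows one way to point spilling at a specific local directory. It assumes the `_system_config` / `object_spilling_config` form described in the object spilling documentation of this era; verify the keys against the documentation for the release you run. The helper names and the `/tmp/ray_spill` path are hypothetical.

```python
import json


def spilling_system_config(directory_path):
    """Build a _system_config dict that spills objects to a local directory."""
    return {
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": directory_path}}
        )
    }


def init_with_spilling(directory_path="/tmp/ray_spill"):
    """Start Ray locally with spilling directed at directory_path."""
    import ray  # lazy import so the builder above can be used without Ray

    return ray.init(_system_config=spilling_system_config(directory_path))
```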
🏗 Architecture refactoring:
- Various architectural improvements in object spilling and memory management. For more details, check out the whitepaper.
- Locality-aware scheduling is turned on by default.
- Moved from centralized GCS-based object directory protocol to decentralized owner-to-owner protocol, yielding better cluster scalability.
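One way to picture the locality-aware scheduling mentioned above is a policy that prefers the node already holding the most bytes of a task's arguments. The function below is a conceptual sketch under that assumption, not Ray's scheduler; `pick_node` and its inputs are hypothetical.

```python
def pick_node(arg_sizes, locations, default_node):
    """Pick the node that already holds the most argument bytes.

    arg_sizes: object id -> size in bytes
    locations: object id -> set of node ids holding a copy
    """
    local_bytes = {}
    for obj, size in arg_sizes.items():
        for node in locations.get(obj, ()):
            local_bytes[node] = local_bytes.get(node, 0) + size
    if not local_bytes:
        # No argument is stored anywhere we know of; fall back to the default.
        return default_node
    # Iterate in sorted order so ties resolve to the smallest node id.
    return max(sorted(local_bytes), key=local_bytes.get)
```

Scheduling a task next to its largest arguments avoids copying them across the network before the task can start.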
RLlib
🎉 New Features:
- R2D2 implementation for torch and tf. (#13933)
- PlacementGroup support (all RLlib algos now return PlacementGroupFactory from Trainer.default_resource_request). (#14289)
- Multi-GPU support for tf-DQN/PG/A2C. (#13393)
💫Enhancements:
- Documentation: Update documentation for Curiosity's support of continuous actions (#13784); CQL documentation (#14531)
- Attention-wrapper works with images and supports prev-n-actions/rewards options. (#14569)
- rllib rollout runs in parallel by default via Trainer’s evaluation worker set. (#14208)
- Add env rendering (customizable) and video recording options (for non-local mode; >0 workers; +evaluation-workers) and episode media logging. (#14767, #14796)
- Allow SAC to use custom models as Q- or policy nets and deprecate "state-preprocessor" for image spaces. (#13522)
- Example Scripts: Add coin game env + matrix social dilemma env + tests and examples (shoutout to Maxime Riché!). (#14208); Attention net (#14864); Serve + RLlib. (#14416); Env seed (#14471); Trajectory view API (enhancements and tf2 support). (#13786); Tune trial + checkpoint selection. (#14209)
- DDPG: Add support for simplex action space. (#14011)
- Others: on_learn_on_batch callback allows custom metrics. (#13584); Add TorchPolicy.export_model(). (#13989)
🔨 Fixes:
- Trajectory View API bugs (#13646, #14765, #14037, #14036, #14031, #13555)
- Test cases (#14620, #14450, #14384, #13835, #14357, #14243)
- Others (#13013, #14569, #13733, #13556, #13988, #14737, #14838, #15272, #13681, #13764, #13519, #14038, #14033, #14034, #14308, #14243)
🏗 Architecture refactoring:
- Remove all non-trajectory view API code. (#14860)
- Obsolete UsageTrackingDict in favor of SampleBatch. (#13065)
Tune
🎉 New Features:
- We added a new searcher HEBOSearcher (#14504, #14246, #13863, #14427)
- Tune is now natively compatible with the Ray Client (#13778, #14115, #14280)
- Tune now uses Ray’s Placement Groups under the hood. This will enable much faster autoscaling and training (for distributed trials) (#13906, #15011, #14313)
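To illustrate why placement groups help distributed trials, the sketch below reserves every resource a trial needs in a single request, so the autoscaler sees the full demand up front. It uses `ray.util.placement_group.placement_group` directly rather than Tune's internal wiring, which is not shown here. `trial_bundles` and `reserve` are hypothetical helpers, and the one-CPU driver bundle is an assumption.

```python
def trial_bundles(num_workers, cpus_per_worker=1, gpus_per_worker=0):
    """Return resource bundles: one for the trial driver, one per worker."""
    head = {"CPU": 1}  # assumption: the trial driver needs a single CPU
    worker = {"CPU": cpus_per_worker}
    if gpus_per_worker:
        worker["GPU"] = gpus_per_worker
    return [head] + [dict(worker) for _ in range(num_workers)]


def reserve(bundles, strategy="PACK"):
    """Reserve the bundles as one placement group and wait until it is ready."""
    import ray  # lazy imports so trial_bundles works without Ray installed
    from ray.util.placement_group import placement_group

    pg = placement_group(bundles, strategy=strategy)
    ray.get(pg.ready())  # blocks until all bundles are reserved
    return pg
```

For example, `reserve(trial_bundles(4, cpus_per_worker=2))` asks for five bundles at once on a running cluster.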
💫Enhancements:
- Checkpointing improvements (#13376, #13767)
- Optuna Search Algorithm improvements (#14731, #14387)
- tune.with_parameters now works with Class API (#14532)
🔨 Fixes:
- BOHB & Hyperband fixes (#14487, #14171)
- Nested metrics improvements (#14189, #14375, #14379)
- Fix non-deterministic category sampling (#13710)
- Type hints (#13684)
- Documentation (#14468, #13880, #13740)
- Various issues and bug fixes (#14176, #13939, #14392, #13812, #14781, #14150, #14850, #14118, #14388, #14152, #13825, #13936)
SGD
- Add fault tolerance during worker startup (#14724)
Serve
🎉 New Features:
- Added metadata to default logger in backend replicas (#14251)
- Added more metrics for ServeHandle stats (#13640)
- Deprecated system-level batching in favor of @serve.batch (#14610, #14648)
- Beta support for Serve with Ray client (#14163)
- Use placement groups to bypass autoscaler throttling (#13844)
- Deprecate client-based API in favor of process-wide singleton (#14696)
- Add initial support for FastAPI ingress (#14754)
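The move from system-level batching to @serve.batch can be sketched as follows. This assumes the decorator is applied to an async method that receives a list of inputs and returns a list of results of the same length, while callers pass single items. The `Doubler` class, the `value` query parameter, and the batch size and timeout values are hypothetical; deploying the class is left out because the deployment API is changing in this release.

```python
def double_all(values):
    """Pure batch function: one output per input, in the same order."""
    return [v * 2 for v in values]


def make_backend():
    """Return a backend class whose requests are batched by @serve.batch."""
    from ray import serve  # lazy import so double_all works without Ray

    class Doubler:
        @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.01)
        async def handle_batch(self, values):
            # Receives a list of values collected from concurrent callers.
            return double_all(values)

        async def __call__(self, request):
            # Each caller passes one item and gets back its own result.
            return await self.handle_batch(int(request.query_params["value"]))

    return Doubler
```

Keeping the batch logic in a plain function such as `double_all` makes it easy to unit test separately from Serve.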
🔨 Fixes:
- Fix ServeHandle serialization (#13695)
🏗 Architecture refactoring:
- Refactor BackendState to support backend versioning and add more unit testing (#13870, #14658, #14740, #14748)
- Optimize long polling to be per-key (#14335)
Dashboard
🎉 New Features:
- Dashboard now supports being served behind a reverse proxy. (#14012)
- Disk and network metrics are added to prometheus. (#14144)
💫Enhancements:
- Better CPU & memory information on K8s. (#14593, #14499)
- Progress towards a new scalable dashboard. (#13790, #11667, #13763, #14333)
Thanks
Many thanks to all those who contributed to this release:
@geraint0923, @iycheng, @yurirocha15, @brian-yu, @harryge00, @ijrsvt, @wumuzi520, @suquark, @simon-mo, @clarkzinzow, @RaphaelCS, @FarzanT, @ob, @ashione, @ffbin, @robertnishihara, @SongGuyang, @zhe-thoughts, @rkooo567, @ezra-h, @acxz, @clay4444, @QuantumMecha, @jirkafajfr, @wuisawesome, @Qstar, @guykhazma, @devin-petersohn, @jeroenboeye, @ConeyLiu, @dependabot[bot], @fyrestone, @micahtyong, @javi-redondo, @Manuscrit, @mxz96102, @EscapeReality846089495, @WangTaoTheTonic, @stanislav-chekmenev, @architkulkarni, @Yard1, @tchordia, @zhisbug, @Bam4d, @niole, @yiranwang52, @thomasjpfan, @DmitriGekhtman, @gabrieleoliaro, @jparkerholder, @kfstorm, @andrew-rosenfeld-ts, @erikerlandson, @Crissman, @raulchen, @sumanthratna, @Catch-Bull, @chaokunyang, @krfricke, @raoul-khour-ts, @sven1977, @kathryn-zhou, @AmeerHajAli, @jovany-wang, @amogkam, @antoine-galataud, @tgaddair, @randxie, @ChaceAshcraft, @ericl, @cassidylaidlaw, @TanjaBayer, @lixin-wei, @lena-kashtelyan, @cathrinS, @qicosmos, @richardliaw, @rmsander, @jCrompton, @mjschock, @pdames, @barakmich, @michaelzhiluo, @stephanie-wang, @edoakes