Ray 1.1.0
Ray Core
🎉 New Features:
- Progress towards supporting a Ray client
- Descendant tasks are cancelled when the calling task is cancelled (see the sketch below)
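A minimal sketch of the new cancellation behavior (the sleep stands in for real work; the exact error raised can vary with the cancel options):

```python
import time
import ray

ray.init()

@ray.remote
def child():
    time.sleep(60)  # stands in for long-running work

@ray.remote
def parent():
    # The task spawned here is a descendant of `parent`.
    return ray.get(child.remote())

ref = parent.remote()
ray.cancel(ref)  # cancelling the parent now also cancels the pending child

try:
    ray.get(ref)
except ray.exceptions.RayError:
    print("parent and its descendant were cancelled")
```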
🔨 Fixes:
- Improved object broadcast robustness
- Improved placement group support
🏗 Architecture refactoring:
- Progress towards the new scheduler backend
RLlib
🎉 New Features:
- SUMO simulator integration (rllib/examples/simulators/sumo/). Huge thanks to Lara Codeca! (#11710)
- SlateQ Algorithm added for PyTorch. Huge thanks to Henry Chen! (#11450)
- MAML extension for all Models, except recurrent ones. (#11337)
- Curiosity Exploration Module for tf1.x/2.x/eager (see the config sketch after this list). (#11945)
- Minimal JAXModelV2 example. (#12502)
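A rough illustration of enabling the new Curiosity module through a Trainer's `exploration_config`; hyperparameters beyond `type` are omitted here, so consult the RLlib docs for the full option list:

```python
from ray.rllib.agents.ppo import PPOTrainer

# Hedged sketch: swap the Trainer's exploration behavior for the new
# intrinsic-curiosity module via exploration_config.
config = {
    "env": "FrozenLake-v0",
    "framework": "tf",
    "exploration_config": {
        "type": "Curiosity",
        # Curiosity-specific hyperparameters (feature net size, eta, ...)
        # would go here; defaults are used when omitted.
    },
}
trainer = PPOTrainer(config=config)
trainer.train()
```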
🔨 Fixes:
- Fixed RNN learning for tf2.x/eager. (#11720)
- LSTM prev-action/prev-reward can now be set separately, and prev-actions are one-hot encoded. (#12397)
- Fixed PyTorch LR schedules not working. (#12396)
- Various PyTorch GPU bug fixes. (#11609)
- Fixed SAC loss not using prioritized replay weights in the critic's loss term. (#12394)
- Fixed epsilon-greedy exploration for nested action spaces. (#11453)
🏗 Architecture refactoring:
- Trajectory View API is now on by default, speeding up PG-type algorithms by ~20% (e.g. PPO on Atari). (#11717, #11826, #11747, and #11827)
Tune
🎉 New Features:
- Loggers can now be passed as objects to tune.run. The new ExperimentLogger abstraction was introduced for all loggers, making it much easier to configure logging behavior. (#11984, #11746, #11748, #11749)
- Tune verbosity was refactored into four levels: 0: Silent, 1: Only experiment-level logs, 2: General trial-level logs, 3: Detailed trial-level logs (default); see the sketch after this list (#11767, #12132, #12571)
- Docker and Kubernetes autoscaling environments are detected automatically, and the correct checkpoint/log syncing tools are used (#12108)
- Trainables can now easily leverage TensorFlow DistributedStrategy! (#11876)
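For example, the new verbosity levels can be selected directly in tune.run (a minimal sketch):

```python
from ray import tune

def my_trainable(config):
    tune.report(loss=config["x"] ** 2)

# verbose=1 restricts output to experiment-level logs;
# the default level (3) prints detailed trial-level logs.
tune.run(my_trainable, config={"x": tune.uniform(-1.0, 1.0)}, verbose=1)
```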
💫 Enhancements
- Introduced a new serialization debugging utility (#12142)
- Added a new lightweight PyTorch Lightning example (#11497, #11585)
- The BOHB search algorithm can be seeded with a random state (#12160)
- The default anonymous metrics can be used automatically if a `mode` is set in tune.run (#12159)
- Added HDFS as a Cloud Sync Client (#11524)
- Added xgboost_ray integration (#12572)
- Tune search spaces can now be passed to search algorithms on initialization, not only via tune.run (see the sketch after this list) (#11503)
- Refactored and added examples (#11931)
- Callable accepted for register_env (#12618)
- Tune search algorithms can handle/ignore infinite and NaN numbers (#11835)
- Improved scalability for experiment checkpointing (#12064)
- Nevergrad now supports points_to_evaluate (#12207)
- Placement group support for distributed training (#11934)
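A hedged sketch of passing a search space to a search algorithm on initialization, using HyperOptSearch (requires the hyperopt package; the space itself is illustrative):

```python
from hyperopt import hp
from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch

def objective(config):
    tune.report(loss=(config["lr"] - 0.01) ** 2)

# The space is handed to the search algorithm itself rather than to tune.run.
space = {"lr": hp.loguniform("lr", -10, -1)}
algo = HyperOptSearch(space, metric="loss", mode="min")
tune.run(objective, search_alg=algo, num_samples=10)
```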
🔨 Fixes:
- Fixed `tune.with_parameters` behavior to avoid serializing large data in scope (see the sketch after this list) (#12522)
- TBX logger supports None (#12262)
- Better error when `metric` or `mode` is unset in search algorithms (#11646)
- Better warnings/exceptions for `fail_fast='raise'` (#11842)
- Removed some bottlenecks in the trial runner (#12476)
- Fixed file descriptor leaks caused by the syncer and TensorBoard (#12590, #12425)
- Fixed validation for search metrics (#11583)
- Fixed hyperopt randint limits (#11946)
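A minimal sketch of the `tune.with_parameters` pattern referenced above; the array is a stand-in for any large object that should travel through the object store rather than be serialized with the function:

```python
import numpy as np
from ray import tune

def train_fn(config, data=None):
    # `data` arrives via the object store instead of being captured
    # in the serialized function scope.
    tune.report(mean=float(np.mean(data)) * config["scale"])

data = np.random.rand(1_000_000)  # stand-in for a large dataset
tune.run(
    tune.with_parameters(train_fn, data=data),
    config={"scale": tune.uniform(0.5, 2.0)},
)
```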
Serve
🎉 New Features:
- You can start backends in different conda environments! See more in the dependency management doc. (#11743)
- You can add an optional `reconfigure` method to your Servable to allow reconfiguring backend replicas at runtime (see the sketch below). (#11709)
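A hedged sketch of the new hook: a backend class exposing `reconfigure`, which Serve calls when the backend's user config changes (the thresholding logic is purely illustrative):

```python
class ThresholdModel:
    def __init__(self):
        self.threshold = 0.5

    def reconfigure(self, config: dict):
        # Called by Serve when the backend's user config is updated,
        # so replicas can pick up new settings without restarting.
        self.threshold = config.get("threshold", 0.5)

    def __call__(self, request):
        score = 0.7  # stand-in for running a model on the request
        return {"positive": score > self.threshold}
```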
🔨 Fixes:
- Set serve.start(http_host=None) to disable HTTP servers. If you are only using ServeHandle, this option lowers resource usage. (#11627)
- Flask requests will no longer create reference cycles. This means peak memory usage should be lower for high traffic scenarios. (#12560)
🏗 Architecture refactoring:
- Progress towards a goal state driven Serve controller. (#12369, #11792, #12211, #12275, #11533, #11822, #11579, #12281)
- Progress towards faster and more efficient ServeHandles. (#11905, #12019, #12093)
Ray Cluster Launcher (Autoscaler)
🎉 New Features:
- A new Kubernetes operator: https://docs.ray.io/en/master/cluster/k8s-operator.html
💫 Enhancements
- Containers no longer run as the root user by default (#11407)
- SHM size is auto-populated when using containers (#11953)
🔨 Fixes:
- Many autoscaler bug fixes (#11677, #12222, #11458, #11896, #12123, #11820, #12513, #11714, #12512, #11758, #11615, #12106, #11961, #11674, #12028, #12020, #12316, #11802, #12131, #11543, #11517, #11777, #11810, #11751, #12465, #11422)
SGD
🎉 New Features:
- Easily customize your torch.DistributedDataParallel configuration by passing a `ddp_args` field into `TrainingOperator.register` (see the sketch below) (#11771).
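A minimal sketch, assuming a custom TrainingOperator; `ddp_args` is forwarded to torch.nn.parallel.DistributedDataParallel, and the model/optimizer construction is illustrative:

```python
import torch
import torch.nn as nn
from ray.util.sgd.torch import TrainingOperator

class MyOperator(TrainingOperator):
    def setup(self, config):
        model = nn.Linear(4, 1)
        optimizer = torch.optim.SGD(model.parameters(), lr=config.get("lr", 0.01))
        # ddp_args is passed through to DistributedDataParallel's constructor.
        self.model, self.optimizer = self.register(
            models=model,
            optimizers=optimizer,
            ddp_args={"find_unused_parameters": True},
        )
```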
🔨 Fixes:
- `TorchTrainer` now properly scales up to more workers if more resources become available (#12562)
📖 Documentation:
- The new callback API for using Ray SGD with Tune is now documented (#11479)
- PyTorch Lightning + Ray SGD integration is now documented (#12440)
Dashboard
🔨 Fixes:
- Fixed a bug that prevented viewing the logs for cluster workers
- Fixed a bug that caused the "Logical View" page to crash when opening a list of actors for a given class.
🏗 Architecture refactoring:
- The dashboard runs on a new backend architecture that is more scalable and well-tested. It should work on clusters of ~100 nodes now, and we're working on lifting scalability constraints to support even larger clusters.
Thanks
Many thanks to all those who contributed to this release:
@bartbroere, @SongGuyang, @gramhagen, @richardliaw, @ConeyLiu, @weepingwillowben, @zhongchun, @ericl, @dHannasch, @timurlenk07, @kaushikb11, @krfricke, @desktable, @bcahlit, @rkooo567, @amogkam, @micahtyong, @edoakes, @stephanie-wang, @clay4444, @ffbin, @mfitton, @barakmich, @pcmoritz, @AmeerHajAli, @DmitriGekhtman, @iamhatesz, @raulchen, @ingambe, @allenyin55, @sven1977, @huyz-git, @yutaizhou, @suquark, @ashione, @simon-mo, @raoul-khour-ts, @Leemoonsoo, @maximsmol, @alanwguo, @kishansagathiya, @wuisawesome, @acxz, @gabrieleoliaro, @clarkzinzow, @jparkerholder, @kingsleykuan, @InnovativeInventor, @ijrsvt, @lasagnaphil, @lcodeca, @jiajiexiao, @heng2j, @wumuzi520, @mvindiola1, @aaronhmiller, @robertnishihara, @WangTaoTheTonic, @chaokunyang, @nikitavemuri, @kfstorm, @roireshef, @fyrestone, @viotemp1, @yncxcw, @karstenddwx, @hartikainen, @sumanthratna, @architkulkarni, @michaelzhiluo, @UWFrankGu, @oliverhu, @danuo, @lixin-wei