ray-1.1.0

Released by @mfitton on 24 Dec 10:07

Ray 1.1.0

Ray Core

🎉 New Features:

  • Progress towards supporting a Ray client
  • Descendant tasks are cancelled when the calling task is cancelled (see the sketch below)
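
A rough sketch of the cancellation behavior (assuming a local ray.init(); the exact exception surfaced by ray.get may vary):

```python
import time

import ray
from ray.exceptions import RayTaskError, TaskCancelledError

ray.init()

@ray.remote
def child():
    time.sleep(600)  # long-running descendant task

@ray.remote
def parent():
    # The parent blocks on a descendant task it spawned.
    return ray.get(child.remote())

ref = parent.remote()
time.sleep(1)    # let the parent start and submit its child
ray.cancel(ref)  # cancelling the parent now also cancels its descendants

try:
    ray.get(ref)
except (TaskCancelledError, RayTaskError):
    print("parent and its descendant were cancelled")
```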

🔨 Fixes:

  • Improved object broadcast robustness
  • Improved placement group support

🏗 Architecture refactoring:

  • Progress towards the new scheduler backend

RLlib

🎉 New Features:

  • SUMO simulator integration (rllib/examples/simulators/sumo/). Huge thanks to Lara Codeca! (#11710)
  • SlateQ Algorithm added for PyTorch. Huge thanks to Henry Chen! (#11450)
  • MAML extension for all Models, except recurrent ones. (#11337)
  • Curiosity Exploration Module for tf1.x/2.x/eager; see the configuration sketch after this list. (#11945)
  • Minimal JAXModelV2 example. (#12502)
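
A minimal configuration sketch for the Curiosity module (only the "type" key comes from the release notes; any further hyperparameters and the PPO/CartPole setup are illustrative):

```python
import ray
from ray.rllib.agents import ppo

ray.init()

config = ppo.DEFAULT_CONFIG.copy()
config["framework"] = "tf"
config["exploration_config"] = {
    "type": "Curiosity",
    # Curiosity-specific hyperparameters (e.g. feature_dim, eta) can be
    # added here; see the RLlib examples folder for a tested configuration.
}

trainer = ppo.PPOTrainer(config=config, env="CartPole-v0")
print(trainer.train())
```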

🔨 Fixes:

  • Fixed RNN learning for tf2.x/eager. (#11720)
  • LSTM prev-action/prev-reward are now settable separately, and prev-actions are one-hot encoded. (#12397)
  • Fixed the PyTorch LR schedule not taking effect. (#12396)
  • Various PyTorch GPU bug fixes. (#11609)
  • Fixed the SAC loss not using prioritized replay weights in the critic’s loss term. (#12394)
  • Fixed epsilon-greedy exploration for nested action spaces. (#11453)

🏗 Architecture refactoring:

  • The Trajectory View API is now on by default, making PG-type algorithms (e.g. PPO on Atari) ~20% faster. (#11717, #11826, #11747, and #11827)

Tune

🎉 New Features:

  • Loggers can now be passed as objects to tune.run. The new ExperimentLogger abstraction was introduced for all loggers, making it much easier to configure logging behavior. (#11984, #11746, #11748, #11749)
  • Tune verbosity was refactored into four levels: 0: silent, 1: experiment-level logs only, 2: general trial-level logs, 3: detailed trial-level logs (default); see the example after this list (#11767, #12132, #12571)
  • Docker and Kubernetes autoscaling environments are detected automatically, and the correct checkpoint/log syncing tools are used accordingly (#12108)
  • Trainables can now easily leverage TensorFlow DistributionStrategy! (#11876)
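
A small example of the new verbosity levels (the trainable here is illustrative):

```python
from ray import tune

def trainable(config):
    for step in range(10):
        tune.report(score=config["x"] * step)

analysis = tune.run(
    trainable,
    config={"x": tune.uniform(0, 1)},
    num_samples=4,
    # 0: silent, 1: experiment-level only, 2: trial-level, 3: detailed trial-level (default)
    verbose=1,
)
```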

💫 Enhancements:

  • Introduced a new serialization debugging utility (#12142)
  • Added a new lightweight PyTorch Lightning example (#11497, #11585)
  • The BOHB search algorithm can be seeded with a random state (#12160)
  • The default anonymous metrics can be used automatically if a mode is set in tune.run (#12159)
  • Added HDFS as a cloud sync client (#11524)
  • Added xgboost_ray integration (#12572)
  • Tune search spaces can now be passed to search algorithms on initialization, not only via tune.run; see the sketch after this list (#11503)
  • Refactored and added examples (#11931)
  • Callable accepted for register_env (#12618)
  • Tune search algorithms can handle/ignore infinite and NaN numbers (#11835)
  • Improved scalability for experiment checkpointing (#12064)
  • Nevergrad now supports points_to_evaluate (#12207)
  • Placement group support for distributed training (#11934)
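
A sketch of passing a Tune search space to a searcher at construction time, shown here with HyperOptSearch (hyperopt must be installed; the space/metric/mode parameter names are assumed from the 1.1-era API):

```python
from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch

def trainable(config):
    tune.report(loss=(config["lr"] - 0.01) ** 2)

# The Tune search space is handed to the searcher itself,
# not (only) to tune.run via the config argument.
search_space = {"lr": tune.loguniform(1e-4, 1e-1)}
searcher = HyperOptSearch(space=search_space, metric="loss", mode="min")

tune.run(trainable, search_alg=searcher, num_samples=8)
```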

🔨 Fixes:

  • Fixed with_parameters behavior to avoid serializing large data in scope; see the sketch after this list (#12522)
  • TBX logger supports None (#12262)
  • Better error when metric or mode unset in search algorithms (#11646)
  • Better warnings/exceptions for fail_fast='raise' (#11842)
  • Removed some bottlenecks in the TrialRunner (#12476)
  • Fixed file descriptor leaks caused by the syncer and TensorBoard (#12590, #12425)
  • Fixed validation for search metrics (#11583)
  • Fixed hyperopt randint limits (#11946)
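
A sketch of the intended tune.with_parameters usage, where large objects are shipped through the Ray object store instead of being serialized with the trainable:

```python
import numpy as np
from ray import tune

def train_fn(config, data=None):
    # `data` arrives via the object store rather than being pickled
    # into the function's closure.
    tune.report(mean=float(np.mean(data)) * config["scale"])

large_data = np.random.rand(10_000_000)

tune.run(
    tune.with_parameters(train_fn, data=large_data),
    config={"scale": tune.grid_search([1.0, 2.0])},
)
```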

Serve

🔨 Fixes:

  • Set serve.start(http_host=None) to disable HTTP servers. If you are only using ServeHandle, this option lowers resource usage; see the sketch after this list. (#11627)
  • Flask requests will no longer create reference cycles. This means peak memory usage should be lower for high traffic scenarios. (#12560)
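
A minimal handle-only sketch with the HTTP server disabled (the create_backend/create_endpoint/get_handle calls follow the 1.1-era Serve API; their exact signatures are assumptions here):

```python
import ray
from ray import serve

ray.init()

# http_host=None skips starting the HTTP proxy, lowering resource usage
# for apps that are only reached through ServeHandles.
client = serve.start(http_host=None)

def hello(request):
    return "hello"

client.create_backend("hello_backend", hello)
client.create_endpoint("hello_endpoint", backend="hello_backend")

handle = client.get_handle("hello_endpoint")
print(ray.get(handle.remote()))
```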

Ray Cluster Launcher (Autoscaler)

💫 Enhancements:

  • Containers no longer run as the root user by default (#11407)
  • The shared-memory size (--shm-size) is auto-populated when using containers (#11953)

SGD

🎉 New Features:

  • Easily customize your torch.DistributedDataParallel configuration by passing a ddp_args field to TrainingOperator.register; see the sketch after this list (#11771).
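
A minimal sketch of passing ddp_args through TrainingOperator.register (the surrounding TrainingOperator/TorchTrainer usage follows the 1.1-era RaySGD API; the return-value unpacking and toy data are assumptions):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

import ray
from ray.util.sgd.torch import TorchTrainer, TrainingOperator


class MyOperator(TrainingOperator):
    def setup(self, config):
        model = nn.Linear(1, 1)
        optimizer = torch.optim.SGD(model.parameters(), lr=config.get("lr", 1e-2))
        criterion = nn.MSELoss()

        # ddp_args is forwarded to torch.nn.parallel.DistributedDataParallel
        # when the model is wrapped for distributed training.
        self.model, self.optimizer, self.criterion = self.register(
            models=model,
            optimizers=optimizer,
            criterion=criterion,
            ddp_args={"find_unused_parameters": True},
        )

        data = TensorDataset(torch.randn(64, 1), torch.randn(64, 1))
        self.register_data(
            train_loader=DataLoader(data, batch_size=8), validation_loader=None)


ray.init()
trainer = TorchTrainer(training_operator_cls=MyOperator, num_workers=2)
print(trainer.train())
```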

🔨 Fixes:

  • TorchTrainer now properly scales up to more workers if more resources become available (#12562)

📖 Documentation:

  • The new callback API for using Ray SGD with Tune is now documented (#11479)
  • The PyTorch Lightning + Ray SGD integration is now documented (#12440)

Dashboard

🔨 Fixes:

  • Fixed a bug that prevented viewing the logs for cluster workers
  • Fixed a bug that caused the "Logical View" page to crash when opening a list of actors for a given class.

🏗 Architecture refactoring:

  • The dashboard runs on a new backend architecture that is more scalable and well-tested. It should work on ~100-node clusters now, and we're working on lifting scalability constraints to support even larger clusters.

Thanks

Many thanks to all those who contributed to this release:
@bartbroere, @SongGuyang, @gramhagen, @richardliaw, @ConeyLiu, @weepingwillowben, @zhongchun, @ericl, @dHannasch, @timurlenk07, @kaushikb11, @krfricke, @desktable, @bcahlit, @rkooo567, @amogkam, @micahtyong, @edoakes, @stephanie-wang, @clay4444, @ffbin, @mfitton, @barakmich, @pcmoritz, @AmeerHajAli, @DmitriGekhtman, @iamhatesz, @raulchen, @ingambe, @allenyin55, @sven1977, @huyz-git, @yutaizhou, @suquark, @ashione, @simon-mo, @raoul-khour-ts, @Leemoonsoo, @maximsmol, @alanwguo, @kishansagathiya, @wuisawesome, @acxz, @gabrieleoliaro, @clarkzinzow, @jparkerholder, @kingsleykuan, @InnovativeInventor, @ijrsvt, @lasagnaphil, @lcodeca, @jiajiexiao, @heng2j, @wumuzi520, @mvindiola1, @aaronhmiller, @robertnishihara, @WangTaoTheTonic, @chaokunyang, @nikitavemuri, @kfstorm, @roireshef, @fyrestone, @viotemp1, @yncxcw, @karstenddwx, @hartikainen, @sumanthratna, @architkulkarni, @michaelzhiluo, @UWFrankGu, @oliverhu, @danuo, @lixin-wei