Releases: ray-project/ray
Ray 0.8.3
Highlights
- Autoscaler has added Azure support. (#7080, #7515, #7558, #7494)
  - The Ray autoscaler helps you launch a distributed Ray cluster with a single command line call!
  - It works on Azure, AWS, GCP, Kubernetes, YARN, Slurm, and local nodes.
- Distributed reference counting is turned on by default. (#7628, #7337)
  - This means all Ray objects are tracked and garbage collected only when all references go out of scope. It can be turned off with `ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 0}))`.
  - When the object store is full of objects that are still in scope, you can turn on least-recently-used eviction to force-remove objects using `ray.init(lru_evict=True)`. A minimal usage sketch follows the `ray memory` output below.
- A new command `ray memory` is added to help debug memory usage. (#7589)
  - It shows all object IDs that are in scope, their reference types, sizes, and creation sites.
  - Read more in the docs: https://ray.readthedocs.io/en/latest/memory-management.html.
```
> ray memory
-----------------------------------------------------------------------------------------------------
 Object ID                                 Reference Type        Object Size  Reference Creation Site
=====================================================================================================
; worker pid=51230
ffffffffffffffffffffffff0100008801000000  PINNED_IN_MEMORY      8231         (deserialize task arg) __main__..sum_task
; driver pid=51174
45b95b1c8bd3a9c4ffffffff010000c801000000  USED_BY_PENDING_TASK  ?            (task call) memory_demo.py:<module>:13
ffffffffffffffffffffffff0100008801000000  USED_BY_PENDING_TASK  8231         (put object) memory_demo.py:<module>:6
ef0a6c221819881cffffffff010000c801000000  LOCAL_REFERENCE       ?            (task call) memory_demo.py:<module>:14
-----------------------------------------------------------------------------------------------------
```
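Below is a minimal sketch of how the new reference counting behaves in practice; the `make_array` task and the array size are illustrative, not taken from the release notes.

```python
import numpy as np
import ray

ray.init()

@ray.remote
def make_array():
    # Roughly 8 MB, large enough to live in the object store.
    return np.zeros(1024 * 1024)

x = make_array.remote()   # the returned ObjectID keeps the result in scope
print(ray.get(x).shape)   # the object stays pinned while `x` is referenced

del x                     # once the last reference goes out of scope, the
                          # object becomes eligible for garbage collection
```

While `x` is still referenced, running `ray memory` in a separate shell should list the corresponding object ID and its reference type.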
API changes
- Change `actor.__ray_kill__()` to `ray.kill(actor)`. (#7360)
- Deprecate the `use_pickle` flag for serialization. (#7474)
- Remove `experimental.NoReturn`. (#7475)
- Remove the `experimental.signal` API. (#7477)
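A minimal sketch of the renamed termination call; the `Worker` actor here is illustrative.

```python
import ray

ray.init()

@ray.remote
class Worker:
    def ping(self):
        return "pong"

worker = Worker.remote()
print(ray.get(worker.ping.remote()))

# 0.8.3: terminate the actor immediately.
ray.kill(worker)
# Pre-0.8.3 equivalent (now removed): worker.__ray_kill__()
```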
Core
- Add Apache 2 license header to C++ files. (#7520)
- Reduce per worker memory usage to 50MB. (#7573)
- Option to fallback to LRU on OutOfMemory. (#7410)
- Reference counting for actor handles. (#7434)
- Reference counting for returning object IDs created by a different process. (#7221)
- Use `prctl(PR_SET_PDEATHSIG)` on Linux instead of reaper. (#7150)
- Route asyncio plasma through the raylet instead of a direct plasma connection. (#7234)
- Remove static concurrency limit from gRPC server. (#7544)
- Remove `get_global_worker()`, `RuntimeContext`. (#7638)
- Fix known issues from the 0.8.2 release.
RLlib
- New features:
- Bug fix highlights:
Tune
- Integrate Dragonfly optimizer. (#5955)
- Fix HyperBand errors. (#7563)
- Access Trial Name, Trial ID inside trainable. (#7378)
- Add a new `repeater` class for high variance trials. (#7366)
- Prevent deletion of checkpoint from user-initiated restoration. (#7501)
Libraries
- [Parallel Iterators] Allow for operator chaining after repartition. (#7268)
- [Parallel Iterators] Repartition functionality. (#7163)
- [Serve] `@serve.route` returns a handle; add `handle.scale`, `handle.set_max_batch_size`. (#7569)
- [RaySGD] PyTorchTrainer --> TorchTrainer. (#7425)
- [RaySGD] Custom training API. (#7211)
- [RaySGD] Breaking user API changes (#7384); a minimal sketch of the updated API follows this list:
  - `data_creator` fed to TorchTrainer now must return a DataLoader rather than datasets.
  - TorchTrainer automatically sets `DistributedSampler` if a DataLoader is returned.
  - `data_loader_config` and `batch_size` are no longer parameters for TorchTrainer.
  - TorchTrainer parallelism is now set by `num_workers`.
  - All TorchTrainer args now must be named parameters.
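The sketch below illustrates a TorchTrainer call under these breaking changes, assuming a toy model and dataset; the import path and the `loss_creator` argument are assumptions beyond what the notes above state.

```python
import ray
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

from ray.util.sgd import TorchTrainer

def model_creator(config):
    return nn.Linear(1, 1)

def optimizer_creator(model, config):
    return torch.optim.SGD(model.parameters(), lr=config["lr"])

def data_creator(config):
    # Must now return a DataLoader, not raw datasets; TorchTrainer adds a
    # DistributedSampler automatically when a DataLoader is returned.
    dataset = TensorDataset(torch.randn(64, 1), torch.randn(64, 1))
    return DataLoader(dataset, batch_size=16)

ray.init()

# Parallelism is now controlled by num_workers, and all args are named.
trainer = TorchTrainer(
    model_creator=model_creator,
    data_creator=data_creator,
    optimizer_creator=optimizer_creator,
    loss_creator=nn.MSELoss,
    config={"lr": 1e-2},
    num_workers=2)
trainer.train()
```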
Java
- New Java actor API. (#7414)
  - The `@RayRemote` annotation is removed.
  - Instead of `Ray.call(ActorClass::method, actor)`, the new API is `actor.call(ActorClass::method)`.
- Allow passing internal config from raylet to Java worker. (#7532)
- Enable direct call by default. (#7408)
- Pass large object by reference. (#7595)
Others
- Progress towards Ray Streaming, including a Python API. (#7070, #6755, #7152, #7582)
- Progress towards GCS Service for GCS fault tolerance. (#7292, #7592, #7601, #7166)
- Progress towards cross language call between Java and Python. (#7614, #7634)
- Progress towards Windows compatibility. (#7529, #7509, #7658, #7315)
- Improvement in K8s Operator. (#7521, #7621, #7498, #7459, #7622)
- New documentation for Ray Dashboard. (#7304)
Known issues
- Ray currently doesn't work on Python 3.5.0, but works on 3.5.3 and above.
Thanks
We thank the following contributors for their work on this release:
@rkooo567, @maximsmol, @suquark, @mitchellstern, @micafan, @clarkzinzow, @Jimpachnet, @mwbrulhardt, @ujvl, @chaokunyang, @robertnishihara, @jovany-wang, @hyeonjames, @zhijunfu, @datayjz, @fyrestone, @eisber, @stephanie-wang, @allenyin55, @BalaBalaYi, @simon-mo, @thedrow, @ffbin, @amogkam, @tisonkun, @richardliaw, @ijrsvt, @wumuzi520, @mehrdadn, @raulchen, @landcold7, @ericl, @edoakes, @sven1977, @ashione, @jorenretel, @gramhagen, @kfstorm, @anthonyhsyu, @pcmoritz
Ray 0.8.2
Highlights
- Pyarrow is no longer vendored. Ray directly uses the C++ Arrow API, so you can use any version of pyarrow with Ray. (#7233)
- The dashboard is turned on by default. It shows node and process information, actor information, and Ray Tune trial information. You can also use `ray.show_in_webui` to display custom messages for actors. Please try it out and send us feedback! (#6705, #6820, #6822, #6911, #6932, #6955, #7028, #7034)
- We have made progress on distributed reference counting (behind a feature flag). You can try it out with `ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 1}))`. It is designed to help manage memory using precise distributed garbage collection. (#6945, #6946, #7029, #7075, #7218, #7220, #7222, #7235, #7249)
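A minimal sketch of showing a custom actor message in the dashboard; the actor and the message are illustrative, and any arguments to `ray.show_in_webui` beyond a single message string are assumptions.

```python
import ray

ray.init()  # the dashboard now starts by default

@ray.remote
class Downloader:
    def fetch(self, url):
        # Display a custom status message for this actor in the dashboard.
        ray.show_in_webui("downloading {}".format(url))
        return url

d = Downloader.remote()
print(ray.get(d.fetch.remote("https://example.com")))
```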
Breaking changes
- Many experimental Ray libraries are moved to the `ray.util` namespace (see the import sketch after this list). (#7100)
  - `ray.experimental.multiprocessing` => `ray.util.multiprocessing`
  - `ray.experimental.joblib` => `ray.util.joblib`
  - `ray.experimental.iter` => `ray.util.iter`
  - `ray.experimental.serve` => `ray.serve`
  - `ray.experimental.sgd` => `ray.util.sgd`
- Tasks and actors are cleaned up if their owner process dies. (#6818)
- The `OMP_NUM_THREADS` environment variable defaults to 1 if unset. This improves training performance and reduces resource contention. (#6998)
- We now vendor `psutil` and `setproctitle` to support turning the dashboard on by default. Running `import psutil` after `import ray` will use the version of psutil that ships with Ray. (#7031)
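A minimal before/after sketch of the namespace move for the multiprocessing integration; the squaring workload is illustrative.

```python
# Before 0.8.2:
# from ray.experimental.multiprocessing import Pool
# From 0.8.2 on:
from ray.util.multiprocessing import Pool

pool = Pool(4)  # starts Ray workers behind the scenes
print(pool.map(lambda x: x * x, range(8)))
```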
Core
- The Python raylet client is removed. All raylet communication now goes through the core worker. (#6018)
- Calling `delete()` will not delete objects in the in-memory store. (#7117)
- Removed vanilla pickle serialization for task arguments. (#6948)
- Fix bug passing empty bytes into Python tasks. (#7045)
- Progress toward next generation ray scheduler. (#6913)
- Progress toward service based global control store (GCS). (#6686, #7041)
RLlib
- Improved PyTorch support, including a PyTorch version of PPO. (#6826, #6770)
- Added distributed SGD for PPO. (#6918, #7084)
- Added an exploration API for controlling epsilon greedy and stochastic exploration. (#6974, #7155)
- Fixed schedule values going negative past the end of the schedule. (#6971, #6973)
- Added support for histogram outputs in TensorBoard. (#6942)
- Added support for parallel and customizable evaluation step. (#6981)
Tune
- Improved Ax Example. (#7012)
- Process saves asynchronously. (#6912)
- Default to tensorboardx and include it in requirements. (#6836)
- Added experiment stopping API. (#6886)
- Expose progress reporter to users. (#6915)
- Fix directory naming regression. (#6839)
- Handle the NaN case for async HyperBand. (#6916)
- Prevent memory checkpoints from breaking trial fault tolerance. (#6691)
- Remove keras dependency. (#6827)
- Remove unused tf loggers. (#7090)
- Set correct path when deleting checkpoint folder. (#6758)
- Support callable objects in variant generation. (#6849)
Autoscaler
- Ray nodes now respect docker limits. (#7039)
- Add `--all-nodes` option to rsync-up. (#7065)
- Add port-forwarding support for attach. (#7145)
- For AWS, default to latest deep learning AMI. (#6922)
- Added `ray dashboard` command to proxy the Ray dashboard on a remote machine. (#6959)
Utility libraries
- Support for scikit-learn with the Ray joblib backend. (#6925)
- Parallel iterators support local shuffle. (#6921)
- [Serve] Support headless services (no HTTP). (#7010)
- [Serve] Refactor the router to use Ray asyncio support. (#6873)
- [Serve] Support composing arbitrary DAGs. (#7015)
- [RaySGD] Support fp16 via PyTorch Apex. (#7061)
- [RaySGD] Refactor PyTorch SGD documentation. (#6910)
- Improvement in Ray Streaming. (#7043, #6666, #7071)
Other improvements
- Progress toward Windows compatibility. (#6882, #6823)
- Ray Kubernetes operator improvements. (#6852, #6851, #7091)
- Java support for concurrent actor calls API. (#7022)
- Java support for direct call for normal tasks. (#7193)
- Java support for cross language Python invocation. (#6709)
- Java support for cross language serialization for actor handles. (#7134)
Known issues
- Passing the same ObjectID multiple times as an argument currently doesn't work. (#7296)
- Tasks can exceed gRPC max message size. (#7263)
Thanks
We thank the following contributors for their work on this release:
@mitchellstern, @hugwi, @deanwampler, @alindkhare, @ericl, @ashione, @fyrestone, @robertnishihara, @pcmoritz, @richardliaw, @yutaizhou, @istoica, @edoakes, @ls-daniel, @BalaBalaYi, @raulchen, @justinkterry, @roireshef, @elpollouk, @kfstorm, @Bassstring, @hhbyyh, @Qstar, @mehrdadn, @chaokunyang, @flying-mojo, @ujvl, @AnanthHari, @rkooo567, @simon-mo, @jovany-wang, @ijrsvt, @ffbin, @AmeerHajAli, @gaocegege, @suquark, @MissiontoMars, @zzyunzhi, @sven1977, @stephanie-wang, @amogkam, @wuisawesome, @aannadi, @maximsmol
ray-0.8.1
Ray 0.8.1 Release Notes
Highlights
- `ObjectID`s corresponding to `ray.put()` objects and task returns are now reference counted locally in Python and when passed into a remote task as an argument. `ObjectID`s that have a nonzero reference count will not be evicted from the object store. Note that references for `ObjectID`s passed into remote tasks inside of other objects (e.g., `f.remote((ObjectID,))` or `f.remote([ObjectID])`) are not currently accounted for. (#6554)
- `asyncio` actor support: actors can now define `async def` methods, and Ray will run multiple method invocations in the same event loop. The maximum concurrency level can be adjusted with `ActorClass.options(max_concurrency=2000).remote()`.
- `asyncio` `ObjectID` support: Ray ObjectIDs can now be directly awaited using the Python API. `await my_object_id` is similar to `ray.get(my_object_id)`, but allows context switching to make the operation non-blocking. You can also convert an `ObjectID` to an `asyncio.Future` using `ObjectID.as_future()`. A minimal sketch of both asyncio features follows this list.
- Added experimental parallel iterators API (#6644, #6726): `ParallelIterator`s can be used to more conveniently load and process data into Ray actors. See the documentation for details.
- Added multiprocessing.Pool API (#6194): Ray now supports the `multiprocessing.Pool` API out of the box, so you can scale existing programs up from a single node to a cluster by only changing the import statement. See the documentation for details.
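A minimal sketch of the asyncio features above, assuming a local `ray.init()`; the `AsyncCounter` actor and its method are illustrative.

```python
import asyncio
import ray

ray.init()

@ray.remote
class AsyncCounter:
    def __init__(self):
        self.value = 0

    async def add(self, delta):
        # Multiple invocations of this method can interleave in one event loop.
        await asyncio.sleep(0.1)
        self.value += delta
        return self.value

# Allow up to 4 concurrent method invocations in the actor's event loop.
counter = AsyncCounter.options(max_concurrency=4).remote()

async def main():
    object_id = counter.add.remote(1)
    # Awaiting the ObjectID is similar to ray.get(), but does not block the loop.
    result = await object_id
    print(result)

asyncio.get_event_loop().run_until_complete(main())
```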
Core
- Deprecated Python 2 (#6581, #6601, #6624, #6665)
- Fixed bug when failing to import remote functions or actors with args and kwargs (#6577)
- Many improvements to the dashboard (#6493, #6516, #6521, #6574, #6590, #6652, #6671, #6683, #6810)
- Progress towards Windows compatibility (#6446, #6548, #6653, #6706)
- Redis now binds to localhost and has a password set by default (#6481)
- Added
actor.__ray_kill__()
to terminate actors immediately (#6523) - Added 'ray stat' command for debugging (#6622)
- Added documentation for fault tolerance behavior (#6698)
- Treat static methods as class methods instead of instance methods in actors (#6756)
RLlib
- DQN distributional model: Replace all legacy tf.contrib imports with tf.keras.layers.xyz or tf.initializers.xyz (#6772)
- SAC site changes (#6759)
- PG unify/cleanup tf vs torch and PG functionality test cases (tf + torch) (#6650)
- SAC for Mujoco Environments (#6642)
- Tuple action dist tensors not reduced properly in eager mode (#6615)
- Changed foreach_policy to foreach_trainable_policy (#6564)
- Wrapper for the dm_env interface (#6468)
Tune
- Get checkpoints paths for a trial after tuning (#6643)
- Async restores and S3/GCP-capable trial FT (#6376)
- Usability errors PBT (#5972)
- Demo exporting trained models in pbt examples (#6533)
- Avoid duplication in TrialRunner execution (#6598)
- Update params for optimizer in reset_config (#6522)
- Support Type Hinting for py3 (#6571)
Other Libraries
- [serve] Pluggable Queueing Policy (#6492)
- [serve] Added BackendConfig (#6541)
- [sgd] Fault tolerance support for pytorch + revamp documentation (#6465)
Thanks
We thank the following contributors for their work on this release:
@chaokunyang, @Qstar, @simon-mo, @wlx65003, @stephanie-wang, @alindkhare, @ashione, @harrisonfeng, @JingGe, @pcmoritz, @zhijunfu, @BalaBalaYi, @kfstorm, @richardliaw, @mitchellstern, @michaelzhiluo, @ziyadedher, @istoica, @EyalSel, @ffbin, @raulchen, @edoakes, @chenk008, @frthjf, @mslapek, @gehring, @hhbyyh, @zzyunzhi, @zhu-eric, @MissiontoMars, @sven1977, @walterddr, @micafan, @inventormc, @robertnishihara, @ericl, @ZhongxiaYan, @mehrdadn, @jovany-wang, @ujvl, @bharatpn
Ray 0.8.0 Release Notes
This is the first release with gRPC direct calls enabled by default for both tasks and actors, which substantially improves task submission performance.
Highlights
- Enable gRPC direct calls by default (#6367). In this mode, actor tasks are sent directly from actor to actor over gRPC; the Raylet only coordinates actor creation. Similarly, with tasks, tasks are submitted directly from worker to worker over gRPC; the Raylet only coordinates the scheduling decisions. In addition, small objects (<100KB in size) are no longer placed in the object store. They are inlined into task submissions and returns when possible.
Note: in some cases, reconstruction of large evicted objects is not possible with direct calls. To revert to the 0.7.7 behaviour, you can set the environment variable `RAY_FORCE_DIRECT=0`.
Core
- [Dashboard] Add remaining features from old dashboard (#6489)
- Ray Kubernetes Operator Part 1: readme, structure, config and CRD related files (#6332)
- Make sure numpy >= 1.16.0 is installed for fast pickling support (#6486)
- Avoid workers starting with the same random seed (#6471)
- Properly handle a forwarded task that gets forwarded back (#6271)
RLlib
- (Bug Fix): Remove the extra 0.5 in the Diagonal Gaussian entropy (#6475)
- AlphaZero and Ranked reward implementation (#6385)
Tune
- Add example and tutorial for DCGAN (#6400)
- Report trials by state fairly (#6395)
- Fixed bug in PBT where initial trial result is empty. (#6351)
Other Libraries
- [sgd] Add support for multi-model multi-optimizer training (#6317)
- [serve] Added deadline awareness (#6442)
- [projects] Return parameters for a command (#6409)
- [streaming] Streaming data transfer and python integration (#6185)
Thanks
We thank the following contributors for their work on this release:
@zplizzi, @istoica, @ericl, @mehrdadn, @walterddr, @ujvl, @alindkhare, @timgates42, @chaokunyang, @eugenevinitsky, @kfstorm, @Maltimore, @visatish, @simon-mo, @AmeerHajAli, @wumuzi520, @robertnishihara, @micafan, @pcmoritz, @zhijunfu, @edoakes, @sytelus, @ffbin, @richardliaw, @Qstar, @stephanie-wang, @Coac, @mitchellstern, @MissiontoMars, @deanwampler, @hhbyyh, @raulchen
ray-0.7.7
Ray 0.7.7 Release Notes
Highlights
- Remote functions and actors now support kwargs and positionals (#5606).
- `ray.get` now supports a `timeout` argument (#6107). If the object isn't available before the timeout passes, a `RayTimeoutError` is raised. A minimal sketch follows this list.
- Ray now supports detached actors (#6036), which persist beyond the lifetime of the script that creates them and can be referred to by a user-defined name.
- Added documentation for how to deploy Ray on YARN clusters using Skein (#6119, #6173).
- The Ray scheduler now attempts to schedule tasks fairly to avoid starvation (#5851).
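A minimal sketch of the kwargs and timeout features above; it assumes `RayTimeoutError` is importable from `ray.exceptions`, and the `add` task is illustrative.

```python
import ray
from ray.exceptions import RayTimeoutError

ray.init()

@ray.remote
def add(x, y=0):
    return x + y

# Remote functions now accept positional and keyword arguments.
result_id = add.remote(1, y=2)

try:
    # ray.get() now takes a timeout (in seconds).
    print(ray.get(result_id, timeout=5))
except RayTimeoutError:
    print("object was not ready within the timeout")
```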
Core
- Progress towards a new backend architecture where tasks and actor tasks are submitted directly between workers. #5783, #5991, #6040, #6054, #6075, #6088, #6122, #6147, #6171, #6177, #6118, #6188, #6259, #6277
- Progress towards Windows compatibility. #6071, #6204, #6205, #6282
- Now using cloudpickle_fast for serialization by default, which supports more types of Python objects without sacrificing performance. #5658, #5805, #5960, #5978
- Various bugfixes. #5946, #6175, #6176, #6231, #6253, #6257, #6276,
RLlib
- Now using PyTorch's function to check whether a GPU is available. #5890
- Fixed APEX priorities returning zero all the time. #5980
- Fixed leak of TensorFlow assign operations in DQN/DDPG. #5979
- Fixed choosing the wrong neural network model for Atari in 0.7.5. #6087
- Added large scale regression test for RLlib. #6093
- Fixed and added test for LR annealing config. #6101
- Reduced log verbosity. #6154
- Added a microbatch optimizer with an A2C example. #6161
Tune
- Search algorithms now use early stopped trials for optimization. #5651
- Metrics are now outputted via a tabular format. Errors are outputted on a separate table. #5822
- In the distributed setting, checkpoints are now deleted automatically post-sync using an rsync flag. Checkpoints on the driver are garbage collected according to the policy defined by the user. #5877
- A much faster ExperimentAnalysis tool. #5962
- Trial executor callbacks now take in a “Runner” parameter. #5868
- Fixed `queue_trials` to enable cluster autoscaling with a CPU-only head node. #5900
- Added a TensorBoardX logger. #6133
Other Libraries
- Serving: Progress towards a new Ray serving library. #5854, #5886, #5894, #5929, #5937, #5961, #6051
Thanks
We thank the following contributors for their amazing contributions:
@zhuohan123, @jovany-wang, @micafan, @richardliaw, @waldroje, @mitchellstern, @visatish, @mehrdadn, @istoica, @ericl, @adizim, @simon-mo, @lsklyut, @zhu-eric, @pcmoritz, @hhbyyh, @suquark, @sotte, @hershg, @pschafhalter, @stackedsax, @edoakes, @mawright, @stephanie-wang, @ujvl, @ashione, @couturierc, @AdamGleave, @robertnishihara, @DaveyBiggers, @daiyaanarfeen, @danyangz, @AmeerHajAli, @mimoralea
ray-0.7.6
Ray 0.7.6 Release Notes
Highlights
- The Ray autoscaler now supports Kubernetes as a backend (#5492). This makes it possible to start a Ray cluster on top of your existing Kubernetes cluster with a simple shell command.
  - Please see the Kubernetes section of the autoscaler documentation to get started.
  - This is a new feature and may be rough around the edges. If you run into problems or have suggestions for how to improve Ray on Kubernetes, please file an issue.
- The Ray cluster dashboard has been revamped (#5730, #5857) to improve the UI and include logs and error messages. More improvements will be coming in the near future.
  - You can try out the dashboard by starting Ray with `ray.init(include_webui=True)` or `ray start --include-webui`.
  - Please let us know if you have suggestions for what would be most useful to you in the new dashboard.
Core
- Progress towards refactoring the Python worker on top of the core worker. #5750, #5771, #5752
- Fix an issue in local mode where multiple actors didn't work properly. #5863
- Fix class attributes and methods for actor classes. #5802
- Improvements in error messages and handling. #5782, #5746, #5799
- Serialization improvements. #5841, #5725
- Various documentation improvements. #5801, #5792, #5414, #5747, #5780, #5582
RLlib
- Added a link to BAIR blog posts in the documentation. #5762
- Tracing for eager TensorFlow policies with `tf.function`. #5705
Tune
- Improved MedianStoppingRule. #5402
- Add PBT + Memnn example. #5723
- Add support for function-based stopping condition. #5754
- Save/Restore for Suggestion Algorithms. #5719
- TensorBoard HParams for TF2.0. #5678
Other Libraries
Thanks
We thank the following contributors for their amazing contributions:
@hershg, @JasonWayne, @kfstorm, @richardliaw, @batzner, @vakker, @robertnishihara, @stephanie-wang, @gehring, @edoakes, @zhijunfu, @pcmoritz, @mitchellstern, @ujvl, @simon-mo, @ecederstrand, @mawright, @ericl, @anthonyhsyu, @suquark, @waldroje
ray-0.7.5
Ray 0.7.5 Release Notes
Ray API
- Objects created with `ray.put()` are now reference counted. #5590
- Add internal `pin_object_data()` API. #5637
- Initial support for pickle5. #5611
- Warm up Ray on `ray.init()`. #5685
- `redis_address` passed to `ray.init` is now just `address` (see the sketch after this list). #5602
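A minimal sketch of the renamed argument; the cluster address shown is a placeholder.

```python
import ray

# Before 0.7.5:
# ray.init(redis_address="10.0.0.1:6379")

# 0.7.5 and later:
ray.init(address="10.0.0.1:6379")
```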
Core
- Progress towards a common C++ core worker. #5516, #5272, #5566, #5664
- Fix log monitor stall with many log files. #5569
- Print warnings when tasks are unschedulable. #5555
- Take into account resource queue lengths when autoscaling #5702, #5684
Tune
RLlib
Other Libraries
ray-0.7.4
Ray 0.7.4 Release Notes
Highlights
- There were many documentation improvements (#5391, #5389, #5175). As we continue to improve the documentation we value your feedback through the “Doc suggestion?” link at the top of the documentation. Notable improvements:
  - We’ve added guides for best practices using TensorFlow and PyTorch.
  - We’ve revamped the Walkthrough page for Ray users, providing a better experience for beginners.
  - We’ve revamped guides for using Actors and inspecting internal state.
- Ray now supports memory limits to ensure memory-intensive applications run predictably and reliably. You can activate them through the `ray.remote` decorator:

  ```python
  @ray.remote(
      memory=2000 * 1024 * 1024,
      object_store_memory=200 * 1024 * 1024)
  class SomeActor(object):
      def __init__(self, a, b):
          pass
  ```

  You can set limits for the heap and the object store; see the documentation.
- There is now preliminary support for projects, see the project documentation. Projects allow you to package your code and easily share it with others, ensuring a reproducible cluster setup. To get started, you can run:

  ```shell
  # Create a new project.
  ray project create <project-name>
  # Launch a session for the project in the current directory.
  ray session start
  # Open a console for the given session.
  ray session attach
  # Stop the given session and all of its worker nodes.
  ray session stop
  ```

  Check out the examples. This is an actively developed new feature, so we appreciate your feedback!
Breaking change: The `redis_address` parameter was renamed to `address` (#5412, #5602) and the former will be removed in the future.
Core
- Move Java bindings on top of the core worker #5370
- Improve log file discoverability #5580
- Clean up and improve error messages #5368, #5351
RLlib
- Support custom action space distributions #5164
- Add TensorFlow eager support #5436
- Add autoregressive KL #5469
- Autoregressive Action Distributions #5304
- Implement MADDPG agent #5348
- Port Soft Actor-Critic on Model v2 API #5328
- More examples: Add CARLA community example #5333 and rock paper scissors multi-agent example #5336
- Moved RLlib to top level directory #5324
Tune
- Experimental Implementation of the BOHB algorithm #5382
- Breaking change: Nested dictionary results are now flattened for CSV writing: `{"a": {"b": 1}} => {"a/b": 1}` #5346
- Add Logger for MLFlow #5438
- TensorBoard support for TensorFlow 2.0 #5547
- Added examples for XGBoost and LightGBM #5500
- HyperOptSearch now has warmstarting #5372
Other Libraries
- SGD: Tune interface for Pytorch MultiNode SGD #5350
- Serving: The old version of ray.serve was deprecated #5541
- Autoscaler: Fix ssh control path limit #5476
- Dev experience: Ray CI tracker online at https://ray-travis-tracker.herokuapp.com/
- Various fixes: Fix log monitor issues #4382 #5221 #5569, the top-level ray directory was cleaned up #5404
Thanks
We thank the following contributors for their amazing contributions:
@jon-chuang, @lufol, @adamochayon, @idthanm, @RehanSD, @ericl, @michaelzhiluo, @nflu, @pengzhenghao, @hartikainen, @wsjeon, @raulchen, @TomVeniat, @layssi, @jovany-wang, @llan-ml, @ConeyLiu, @mitchellstern, @gregSchwartz18, @jiangzihao2009, @jichan3751, @mhgump, @zhijunfu, @micafan, @simon-mo, @richardliaw, @stephanie-wang, @edoakes, @akharitonov, @mawright, @robertnishihara, @lisadunlap, @flying-mojo, @pcmoritz, @jredondopizarro, @gehring, @holli, @kfstorm
ray-0.7.3
Ray 0.7.3 Release Notes
Highlights
- RLlib ModelV2 API is ready to use. It improves support for Keras and RNN models, as well as allowing object-oriented reuse of variables. The ModelV1 API is deprecated. No migration is needed.
- `ray.experimental.sgd.pytorch.PyTorchTrainer` is ready for early adopters. Check out the documentation here. We welcome your feedback!

  ```python
  model_creator = lambda config: YourPyTorchModel()
  data_creator = lambda config: (YourTrainingSet(), YourValidationSet())

  trainer = PyTorchTrainer(
      model_creator,
      data_creator,
      optimizer_creator=utils.sgd_mse_optimizer,
      config={"lr": 1e-4},
      num_replicas=2,
      resources_per_replica=Resources(num_gpus=1),
      batch_size=16,
      backend="auto")
  for i in range(NUM_EPOCHS):
      trainer.train()
  ```
- You can query all the clients that have performed `ray.init` to connect to the current cluster with `ray.jobs()`. #5076

  ```python
  >>> ray.jobs()
  [{'JobID': '02000000',
    'NodeManagerAddress': '10.99.88.77',
    'DriverPid': 74949,
    'StartTime': 1564168784,
    'StopTime': 1564168798},
   {'JobID': '01000000',
    'NodeManagerAddress': '10.99.88.77',
    'DriverPid': 74871,
    'StartTime': 1564168742}]
  ```
Core
RLlib
- Finished port of all major RLlib algorithms to builder pattern #5277, #5258, #5249
- `learner_queue_timeout` can be configured for the async sample optimizer. #5270
- `reproducible_seed` can be used for reproducible experiments. #5197
- Added entropy coefficient decay to IMPALA, APPO, and PPO #5043
Tune:
- Breaking: `ExperimentAnalysis` is now returned by default from `tune.run`. To obtain a list of trials, use `analysis.trials` (a minimal sketch follows this list). #5115
- Breaking: Syncing behavior between head and workers can now be customized (`sync_to_driver`). Syncing behavior (`upload_dir`) between cluster and cloud is now separately customizable (`sync_to_cloud`). This changes the structure of the uploaded directory - now `local_dir` is synced with `upload_dir`. #4450
- Introduce `Analysis` and `ExperimentAnalysis` objects. The `Analysis` object will now return all trials in a folder; `ExperimentAnalysis` is a subclass that returns all trials of an experiment. #5115
- Add missing argument `tune.run(keep_checkpoints_num=...)`. Enables only keeping the last N checkpoints. #5117
- Trials on failed nodes will be prioritized in processing. #5053
- Trial checkpointing is now more flexible. #4728
- Add system performance tracking for GPU, RAM, VRAM, and CPU usage statistics; toggle with `tune.run(log_sys_usage=True)`. #4924
- Experiment checkpointing is now less frequent and can be controlled with `tune.run(global_checkpoint_period=...)`. #4859
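A minimal sketch of the new `tune.run` return value, using the old reporter-based function API; the trainable and the `score` metric are illustrative.

```python
from ray import tune

def trainable(config, reporter):
    # Report a single result for this trial; "score" is a made-up metric.
    reporter(score=config["x"] ** 2, done=True)

analysis = tune.run(trainable, config={"x": tune.grid_search([1, 2, 3])})
# tune.run now returns an ExperimentAnalysis; .trials lists the Trial objects.
print(analysis.trials)
```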
Autoscaler
-
Add a `request_cores` function for manual autoscaling. You can now manually request resources for the autoscaler. #4754
Local cluster:
-
Improved logging with AWS NodeProvider.
create_instance
call will be logged. #4998
Others Libraries:
- SGD:
- Kuberentes: Ray namespace added for k8s. #4111
- Dev experience: Add linting pre-push hook. #5154
Thanks:
We thank the following contributors for their amazing contributions:
@joneswong, @1beb, @richardliaw, @pcmoritz, @raulchen, @stephanie-wang, @jiangzihao2009, @LorenzoCevolani, @kfstorm, @pschafhalter, @micafan, @simon-mo, @vipulharsh, @haje01, @ls-daniel, @hartikainen, @stefanpantic, @edoakes, @llan-ml, @alex-petrenko, @ztangent, @gravitywp, @MQQ, @Dulex123, @morgangiraud, @antoine-galataud, @robertnishihara, @qxcv, @vakker, @jovany-wang, @zhijunfu, @ericl
ray-0.7.2
Core
- Improvements
- Python
- Java
- Allow users to set JVM options at actor creation time. #4970
- Internal
- Peformance
Tune
- Add directional metrics for components. #4120, #4915
- Disallow setting
resources_per_trial
when it is already configured. #4880 - Make PBT Quantile fraction configurable. #4912
RLlib
- Add QMIX mixer parameters to optimizer param list. #5014
- Allow Torch policies access to full action input dict in
extra_action_out_fn
. #4894 - Allow access to batches prior to postprocessing. #4871
- Throw error if
sample_async
is used with pytorch for A3C. #5000 - Patterns & User Experience
- Documentation
Other Libraries
- Add support for distributed training with PyTorch. #4797, #4933
- Autoscaler will kill workers on exception. #4997
- Fix handling of non-integral timeout values in
signal.receive
. #5002
Thanks
We thank the following contributors for their amazing contributions: @jiangzihao2009, @raulchen, @ericl, @hershg, @kfstorm, @kiddyboots216, @jovany-wang, @pschafhalter, @richardliaw, @robertnishihara, @stephanie-wang, @simon-mo, @zhijunfu, @ls-daniel, @ajgokhale, @rueberger, @suquark, @guoyuhong, @jovany-wang, @pcmoritz, @hartikainen, @timonbimon, @TianhongDai