Added a new policy distribution, ReparameterizedBetaPolicyDistribution. It is a reparameterized version of the Beta distribution (who could have guessed) that predicts the mean and spread of the PDF instead of alpha and beta. This allows the spread to be controlled independently of the input, which improves stability.
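To illustrate the idea, here is a minimal sketch of one common mean/concentration reparameterization of the Beta distribution. It is an assumption that the release uses something along these lines; the function names and the choice of the concentration alpha + beta as the spread parameter are hypothetical and not taken from the library.

```python
def beta_params_from_mean_and_concentration(mean, concentration):
    """Map (mean, concentration) to the usual (alpha, beta) parameters.

    Assumed parameterization, not necessarily the library's:
    mean = alpha / (alpha + beta), concentration = alpha + beta.
    A larger concentration gives a narrower distribution.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if concentration <= 0.0:
        raise ValueError("concentration must be positive")
    alpha = mean * concentration
    beta = (1.0 - mean) * concentration
    return alpha, beta


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# The network would predict only the mean; the concentration can be a
# separate parameter that does not depend on the input.
alpha, beta = beta_params_from_mean_and_concentration(0.25, 8.0)
variance = beta_variance(alpha, beta)
```

Under this parameterization the variance equals mean * (1 - mean) / (concentration + 1), so fixing the concentration bounds the spread regardless of the predicted mean.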
Improved customizability of the Gatherer class. Gatherer now has a .postprocess() method that is called during data collection to postprocess the data in the buffer. By default, this method only normalizes advantages, but custom gatherers can apply further postprocessing and even add data to or filter data from the buffer.
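The following sketch shows what such a postprocessing hook could look like. The classes, the dictionary-of-lists buffer layout, and the max_abs parameter are hypothetical stand-ins for illustration; the actual Gatherer class and its buffer type may differ.

```python
import statistics


class SketchGatherer:
    """Hypothetical stand-in for a gatherer; not the library's class."""

    def postprocess(self, buffer):
        # Default behaviour in this sketch: normalize advantages to
        # zero mean and (approximately) unit standard deviation.
        advantages = buffer["advantages"]
        mean = statistics.fmean(advantages)
        std = statistics.pstdev(advantages)
        result = dict(buffer)
        result["advantages"] = [(a - mean) / (std + 1e-8) for a in advantages]
        return result


class FilteringGatherer(SketchGatherer):
    """Custom gatherer that also drops samples with extreme advantages."""

    def __init__(self, max_abs=1.0):
        self.max_abs = max_abs

    def postprocess(self, buffer):
        buffer = super().postprocess(buffer)
        keep = [i for i, a in enumerate(buffer["advantages"])
                if abs(a) <= self.max_abs]
        # Apply the same selection to every field so they stay aligned.
        return {key: [values[i] for i in keep]
                for key, values in buffer.items()}


data = {"advantages": [1.0, 2.0, 3.0], "rewards": [0.1, 0.2, 0.3]}
normalized = SketchGatherer().postprocess(data)
filtered = FilteringGatherer(max_abs=1.0).postprocess(data)
```

Overriding the method and calling the parent implementation first keeps the default normalization while layering custom filtering on top.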
Monitoring
New group view in the web monitor. Experiments can now be assigned an optional group name, and the new view shows the mean reward progression of grouped experiments. More functionality will follow in future updates.
Better filtering of experiments in the web monitor.
The hyperparameter view now also shows Gatherer information.
Improved robustness against corrupted JSON files.
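As an illustration of the kind of robustness meant here, the sketch below loads a JSON file and falls back to a default when the file is missing, truncated, or otherwise unparsable. The helper name load_json_or_default is hypothetical, and this is not necessarily how the monitor handles such files.

```python
import json


def load_json_or_default(path, default=None):
    """Return the parsed JSON content, or `default` if loading fails."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        # json.JSONDecodeError and UnicodeDecodeError are both
        # subclasses of ValueError; OSError covers missing files.
        return default
```

A monitor that lists many experiments could use such a helper to skip a single broken progress file instead of failing for the whole page.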
Other Changes
Upgraded from TensorFlow 2.4.2 to 2.9.1. At this point, the code should still be backward-compatible.
Several assertions have been added throughout optimization to simplify debugging when NaN/Inf values occur.
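A finiteness check of this kind can be sketched in plain Python as follows. The helper and its name argument are hypothetical; in TensorFlow code the same role is typically played by tf.debugging.assert_all_finite, and the release notes do not say which mechanism is used.

```python
import math


def assert_all_finite(values, name="values"):
    """Raise an AssertionError naming the first NaN/Inf entry."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            raise AssertionError(
                f"{name}[{index}] is not finite: {value!r}")
    return values


# Passes silently and returns its input, so it can wrap intermediate
# results such as losses or gradients.
losses = assert_all_finite([0.3, 0.1, 0.05], name="losses")
```

Failing early with the name and index of the offending value points directly at the step where the numbers first went wrong, rather than at a later symptom.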