- trainer:
    - Custom scoring is now supported for selecting the best model. #1202
- data:
    - stats:
        - `InfoStats` has a new non-optional field `best_score`, which is used for selecting the best model. #1202
This release introduces a new package `evaluation` that integrates best
practices for running experiments (seeding test and train environments) and for
evaluating them using the rliable library. This should be especially useful for
algorithm developers for comparing performances and creating meaningful
visualizations. This functionality is currently in alpha state and will be
further improved in the next releases.
You will need to install tianshou with the extra `eval` to use it.
The creation of multiple experiments with varying random seeds has been greatly
facilitated. Moreover, the `ExpLauncher` interface has been introduced and
implemented with several backends to support the execution of multiple
experiments in parallel.
An example for this using the high-level interfaces can be found here; examples that use low-level interfaces will follow soon.
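
As a rough illustration of the idea behind running one experiment with several seeds, the following plain-Python sketch executes a stand-in experiment once per seed and aggregates the results. It does not use Tianshou's API: `run_experiment`, `run_seeded_collection`, and the toy return computation are hypothetical names introduced only for this example.

```python
import random
import statistics


def run_experiment(seed: int, num_episodes: int = 20) -> float:
    """Hypothetical stand-in for one training run; returns a mean episode return."""
    rng = random.Random(seed)  # private RNG, so runs do not influence each other
    returns = [rng.gauss(100.0, 15.0) for _ in range(num_episodes)]
    return statistics.fmean(returns)


def run_seeded_collection(base_seed: int, num_experiments: int) -> dict[int, float]:
    """Run the same experiment with consecutive seeds and collect the results."""
    seeds = range(base_seed, base_seed + num_experiments)
    return {seed: run_experiment(seed) for seed in seeds}


results = run_seeded_collection(base_seed=42, num_experiments=5)
values = list(results.values())
mean_return = statistics.fmean(values)
std_return = statistics.stdev(values)
print(f"{len(results)} runs: mean={mean_return:.2f}, std={std_return:.2f}")
```

Because each run derives all of its randomness from its own seed, repeating a run with the same seed reproduces its result, which is what makes the aggregate statistics comparable across algorithms.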
Apart from that, several important
extensions have been added to internal data structures, most notably to `Batch`.
Batches now implement `__eq__` and can be meaningfully compared. Applying
operations in a nested fashion has been significantly simplified, and checking
for NaNs and dropping them is now possible.
One more notable change is that torch `Distribution` objects are now sliced when
slicing a batch. Previously, when a Batch with, say, 10 actions and a `dist`
corresponding to them was sliced to `[:3]`, the `dist` in the result would still
correspond to all 10 actions. Now, the `dist` is also "sliced" to be the
distribution of the first 3 actions.
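
The change in slicing semantics can be illustrated without Tianshou or torch. The sketch below uses two hypothetical toy classes, `ToyNormal` and `ToyBatch`, which only mimic the behavior described above and are not part of any library.

```python
from dataclasses import dataclass


@dataclass
class ToyNormal:
    """Hypothetical stand-in for a batched distribution: one (mean, std) per action."""

    means: list[float]
    stds: list[float]

    def __getitem__(self, index: slice) -> "ToyNormal":
        # Slicing the distribution slices its per-action parameters.
        return ToyNormal(self.means[index], self.stds[index])

    def __len__(self) -> int:
        return len(self.means)


@dataclass
class ToyBatch:
    """Hypothetical stand-in for a batch holding actions and their distribution."""

    act: list[float]
    dist: ToyNormal

    def __getitem__(self, index: slice) -> "ToyBatch":
        # New behavior: the distribution is sliced together with the actions.
        # (Previously, `dist` would have been passed through unchanged.)
        return ToyBatch(self.act[index], self.dist[index])


batch = ToyBatch(
    act=[0.1 * i for i in range(10)],
    dist=ToyNormal(means=[float(i) for i in range(10)], stds=[1.0] * 10),
)
sliced = batch[:3]
print(len(sliced.act), len(sliced.dist))  # both lengths are 3
```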
A detailed list of changes can be found below.
- `evaluation`: New package for repeating the same experiment with multiple seeds and aggregating the results. #1074 #1141 #1183
- `data`:
    - `Batch`:
        - Add methods `to_dict` and `to_list_of_dicts`. #1063 #1098
        - Add methods `to_numpy_` and `to_torch_`. #1098, #1117
        - Add `__eq__` (semantic equality check). #1098
        - `keys()` deprecated in favor of `get_keys()` (needed to make iteration consistent with naming) #1105.
        - Major: new methods for applying functions to values, to check for NaNs and drop them, and to set values. #1181
        - Slicing a batch with a torch distribution now also slices the distribution. #1181
    - `data.collector`:
        - `Collector`:
            - Introduced `BaseCollector` as a base class for all collectors. #1123
            - Add method `close` #1063
            - Method `reset` is now more granular (new flags controlling behavior). #1063
        - `CollectStats`: Add convenience constructor `with_autogenerated_stats`. #1063
- `trainer`:
    - Trainers can now control whether collectors should be reset prior to training. #1063
- `policy`:
    - Introduced attribute `in_training_step` that is controlled by the trainer. #1123
    - Policy is automatically set to `eval` mode when collecting and to `train` mode when updating. #1123
    - Extended interface of `compute_action` to also support array-like inputs #1169
- `highlevel`:
    - `SamplingConfig`:
        - Add support for `batch_size=None`. #1077
        - Add `training_seed` for explicit seeding of training and test environments; the `test_seed` is inferred from `training_seed`. #1074
    - `experiment`:
        - `Experiment` now has a `name` attribute, which can be set using `ExperimentBuilder.with_name` and which determines the default run name and therefore the persistence subdirectory. It can still be overridden in `Experiment.run()`, the new parameter name being `run_name` rather than `experiment_name` (although the latter will still be interpreted correctly). #1074 #1131
        - Add class `ExperimentCollection` for the convenient execution of multiple experiment runs #1131
        - The `World` object, containing all low-level objects needed for experimentation, can now be extracted from an `Experiment` instance. This enables customizing the experiment prior to its execution, bridging the low and high-level interfaces. #1187
        - `ExperimentBuilder`:
            - Add method `build_seeded_collection` for the sound creation of multiple experiments with varying random seeds #1131
            - Add method `copy` to facilitate the creation of multiple experiments from a single builder #1131
- `env`:
    - Added new `VectorEnvType` called `SUBPROC_SHARED_MEM_AUTO`, which is now used for Atari and Mujoco venv creation. #1141
- `utils`:
    - `logger`:
        - Loggers can now restore the logged data into Python by using the new `restore_logged_data` method. #1074
        - Wandb logger extended #1183
    - `net.continuous.Critic`:
        - Add flag `apply_preprocess_net_to_obs_only` to allow the preprocessing network to be applied to the observations only (without the actions concatenated), which is essential for the case where we want to reuse the actor's preprocessing network #1128
    - `torch_utils` (new module)
        - Added context managers `torch_train_mode` and `policy_within_training_step` #1123
    - `print`
        - `DataclassPPrintMixin` now supports outputting a string, not just printing the pretty repr. #1141
- `highlevel`:
    - `CriticFactoryReuseActor`: Enable the Critic flag `apply_preprocess_net_to_obs_only` for continuous critics, fixing the case where we want to reuse an actor's preprocessing network for the critic (affects usages of the experiment builder method `with_critic_factory_use_actor` with continuous environments) #1128
    - Policy parameter `action_scaling` value `"default"` was not correctly transformed to a Boolean value for algorithms SAC, DDPG, TD3 and REDQ. The value `"default"` being truthy caused action scaling to be enabled even for discrete action spaces. #1191
- `atari_network.DQN`:
    - Fix constructor input validation #1128
    - Fix `output_dim` not being set if `features_only`=True and `output_dim_added_layer` is not None #1128
- `PPOPolicy`:
    - Fix `max_batchsize` not being used in `logp_old` computation inside `process_fn` #1168
- Fix `Batch.__eq__` to allow comparing Batches with scalar array values #1185
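
The `action_scaling` fix listed above (#1191) comes down to string truthiness in Python. The sketch below shows the pitfall and one way to resolve the sentinel. The helper `resolve_action_scaling` is hypothetical and not Tianshou's implementation, and the rule that `"default"` means "scale only for continuous action spaces" is an assumption made for this example.

```python
from typing import Literal, Union


def resolve_action_scaling(
    action_scaling: Union[bool, Literal["default"]],
    is_continuous: bool,
) -> bool:
    """Hypothetical helper: map the "default" sentinel to a real Boolean.

    Assumption for this sketch: "default" means "scale actions only when the
    action space is continuous".
    """
    if action_scaling == "default":
        return is_continuous
    return bool(action_scaling)


# The pitfall: a non-empty string is truthy, so using it directly as a flag
# enables scaling regardless of the action space.
assert bool("default") is True

# Resolving the sentinel first yields the intended behavior.
print(resolve_action_scaling("default", is_continuous=False))  # False
print(resolve_action_scaling("default", is_continuous=True))   # True
print(resolve_action_scaling(False, is_continuous=True))       # False
```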
- `Collector`s rely less on state; the few stateful things are stored explicitly instead of through a `.data` attribute. #1063
- Introduced a first iteration of a naming convention for vars in `Collector`s. #1063
- Generally improved readability of Collector code and associated tests (still quite some way to go). #1063
- Improved typing for `exploration_noise` and within Collector. #1063
- Better variable names related to model outputs (logits, dist input etc.). #1032
- Improved typing for actors and critics, using Tianshou classes like `Actor`, `ActorProb`, etc., instead of just `nn.Module`. #1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the presence of `forward` methods. #1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see associated breaking change). #1032
- Use `.mode` of distribution instead of relying on knowledge of the distribution type. #1032
- Exception no longer raised on `len` of empty `Batch`. #1084
- Tests and examples are covered by `mypy`. #1077
- `NetBase` is used more widely, with stricter typing achieved by making it generic. #1077
- Use explicit multiprocessing context for creating `Pipe` in `subproc.py`. #1102
- `data`:
    - `Collector`:
        - Removed `.data` attribute. #1063
        - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` explicitly or pass `reset_before_collect=True`. #1063
        - Removed `no_grad` argument from `collect` method (was unused in tianshou). #1123
    - `Batch`:
        - Fixed `iter(Batch(...))`, which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. #1063
        - The methods `to_numpy` and `to_torch` are not in-place anymore (use `to_numpy_` or `to_torch_` instead). #1098, #1117
        - The method `Batch.is_empty` has been removed. Instead, the user can simply check for emptiness of Batch by using `len` on dicts. #1144
        - Stricter `cat_`: only concatenation of batches with the same structure is allowed. #1181
        - `to_torch` and `to_numpy` are no longer static methods. So `Batch.to_numpy(batch)` should be replaced by `batch.to_numpy()`. #1200
- `utils`:
    - `logger`:
        - `BaseLogger.prepare_dict_for_logging` is now abstract. #1074
        - Removed deprecated and unused `BasicLogger` (only affects users who subclassed it). #1074
    - `utils.net`:
        - `Recurrent` now receives and returns a `RecurrentStateBatch` instead of a dict. #1077
    - Modules with code that was copied from sensAI have been replaced by imports from new dependency sensAI-utils:
        - `tianshou.utils.logging` is replaced with `sensai.util.logging`
        - `tianshou.utils.string` is replaced with `sensai.util.string`
        - `tianshou.utils.pickle` is replaced with `sensai.util.pickle`
- `env`:
    - All VectorEnvs now return a numpy array of info-dicts on reset instead of a list. #1063
- `policy`:
    - Changed interface of `dist_fn` in `PGPolicy` and all subclasses to take a single argument in both continuous and discrete cases. #1032
- `AtariEnvFactory` constructor (in examples, so not really breaking) now requires explicit train and test seeds. #1074
- `EnvFactoryRegistered` now requires an explicit `test_seed` in the constructor. #1074
- `highlevel`:
    - `params`: The parameter `dist_fn` has been removed from the parameter objects (`PGParams`, `A2CParams`, `PPOParams`, `NPGParams`, `TRPOParams`). The correct distribution is now determined automatically based on the actor factory being used, avoiding the possibility of misspecification. Persisted configurations/policies continue to work as expected, but code must not specify the `dist_fn` parameter. #1194 #1195
    - `env`:
        - `EnvFactoryRegistered`: parameter `seed` has been replaced by the pair of parameters `train_seed` and `test_seed`. Persisted instances will continue to work correctly. Subclasses such as `AtariEnvFactory` are also affected and require explicit train and test seeds. #1074
        - `VectorEnvType`: `SUBPROC_SHARED_MEM` has been replaced by `SUBPROC_SHARED_MEM_DEFAULT`. It is recommended to use `SUBPROC_SHARED_MEM_AUTO` instead. However, persisted configs will continue working. #1141
- Fixed env seeding in `test_sac_with_il.py` so that the test doesn't fail randomly. #1081
- Improved CI triggers and added telemetry (if requested by user) #1177
- Improved environment used in tests.
- Improved batch equality tests to also check scalar values #1185
- DeepDiff added to help with diffs of batches in tests. #1098
- Bumped black, idna, pillow
- New extra "eval"
- Bumped numba to >=0.60.0, permitting installation on python 3.12 #1177
- New dependency sensai-utils
Started after v1.0.0