[Feature] batched trajectories - SliceSampler compatibility #1775

Merged 34 commits on Feb 22, 2024

Commits (34):
ffcdf51: init (vmoens, Jan 8, 2024)
d567665: Merge remote-tracking branch 'origin/main' into extend-rb-dim1 (vmoens, Feb 6, 2024)
a05eaae: amend (vmoens, Feb 6, 2024)
34fadac: Merge remote-tracking branch 'origin/main' into extend-rb-dim1 (vmoens, Feb 7, 2024)
fe960fa: amend (vmoens, Feb 7, 2024)
91941ea: amend (vmoens, Feb 7, 2024)
5e06cd4: amend (vmoens, Feb 7, 2024)
2523ab0: amend (vmoens, Feb 8, 2024)
d1e8878: amend (vmoens, Feb 12, 2024)
238e9b3: amend (vmoens, Feb 12, 2024)
eb731a4: Merge remote-tracking branch 'origin/main' into extend-rb-dim1 (vmoens, Feb 13, 2024)
cdd5f9a: amend (vmoens, Feb 14, 2024)
775ffb6: amend (vmoens, Feb 14, 2024)
01bcc79: amend (vmoens, Feb 15, 2024)
136ad70: amend (vmoens, Feb 15, 2024)
2fbb08b: Merge remote-tracking branch 'origin/main' into extend-rb-dim1 (vmoens, Feb 15, 2024)
2c717c8: amend (vmoens, Feb 15, 2024)
94a9b61: amend (vmoens, Feb 16, 2024)
358a15f: amend (vmoens, Feb 16, 2024)
1495c37: amend (vmoens, Feb 16, 2024)
08e4084: Merge remote-tracking branch 'origin/main' into extend-rb-dim1 (vmoens, Feb 20, 2024)
b049c49: Merge remote-tracking branch 'origin/main' into extend-rb-dim1 (vmoens, Feb 21, 2024)
253565d: fix (vmoens, Feb 21, 2024)
fb9e562: fix (vmoens, Feb 21, 2024)
b349086: fix (vmoens, Feb 21, 2024)
9094798: fix (vmoens, Feb 21, 2024)
49f32b8: fix (vmoens, Feb 21, 2024)
9aa80ee: fix (vmoens, Feb 21, 2024)
ff535a6: fix (vmoens, Feb 21, 2024)
ee61ed5: Merge remote-tracking branch 'origin/main' into extend-rb-dim1 (vmoens, Feb 21, 2024)
15f3d53: amend (vmoens, Feb 21, 2024)
d7118d0: amend (vmoens, Feb 22, 2024)
9612977: amend (vmoens, Feb 22, 2024)
36ea29f: amend (vmoens, Feb 22, 2024)
108 changes: 106 additions & 2 deletions docs/source/reference/data.rst
@@ -218,24 +218,128 @@ Storing trajectories
~~~~~~~~~~~~~~~~~~~~

It is not too difficult to store trajectories in the replay buffer.
One element to pay attention to is that the size of the replay buffer is by default
the size of the leading dimension of the storage: in other words, creating a
replay buffer with a storage of size 1M when storing multidimensional data
does not mean storing 1M frames but 1M trajectories. However, if trajectories
(or episodes/rollouts) are flattened before being stored, the capacity will still
be 1M steps.
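
As a rough sketch of this accounting (the shapes and capacity below are
illustrative assumptions rather than values used elsewhere in this section),
extending a buffer with a ``[B, T]``-shaped tensordict registers ``B`` entries,
whereas flattening it first registers ``B * T`` entries:

>>> import torch
>>> from tensordict import TensorDict
>>> from torchrl.data import ReplayBuffer, LazyTensorStorage
>>> # a hypothetical batch of 4 trajectories of 5 steps each
>>> data = TensorDict({"obs": torch.randn(4, 5, 3)}, batch_size=[4, 5])
>>> rb = ReplayBuffer(storage=LazyTensorStorage(1000))
>>> rb.extend(data)
>>> print(len(rb))  # 4: one entry per trajectory
>>> rb_flat = ReplayBuffer(storage=LazyTensorStorage(1000))
>>> rb_flat.extend(data.reshape(-1))
>>> print(len(rb_flat))  # 20: one entry per step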

There is a way to circumvent this by telling the storage how many dimensions
it should take into account when saving data. This can be done through the ``ndim``
keyword argument which is accepted by all contiguous storages such as
:class:`~torchrl.data.replay_buffers.TensorStorage` and the like. When a
multidimensional storage is passed to a buffer, the buffer will automatically
consider the last dimension as the "time" dimension, as is conventional in
TorchRL. This can be overridden through the ``dim_extend`` keyword argument
in :class:`~torchrl.data.ReplayBuffer`.
This is the recommended way to save trajectories that are obtained through
:class:`~torchrl.envs.ParallelEnv` or its serial counterpart, as we will see
below.
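
As a minimal sketch of what ``ndim`` changes (again with illustrative,
hypothetical shapes), a two-dimensional storage counts its capacity in
individual steps rather than in trajectories, without requiring the data
to be flattened first:

>>> import torch
>>> from tensordict import TensorDict
>>> from torchrl.data import ReplayBuffer, LazyTensorStorage
>>> # 4 environments (or trajectories), 5 steps each, kept in [B, T] layout
>>> data = TensorDict({"obs": torch.randn(4, 5, 3)}, batch_size=[4, 5])
>>> rb = ReplayBuffer(storage=LazyTensorStorage(1000, ndim=2))
>>> rb.extend(data)
>>> print(len(rb))  # 20: with ndim=2, capacity and length are counted in steps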

When sampling trajectories, it may be desirable to sample sub-trajectories
to diversify learning or make the sampling more efficient.
TorchRL offers two distinct ways of accomplishing this:

- The :class:`~torchrl.data.replay_buffers.samplers.SliceSampler` allows sampling
a given number of slices of trajectories stored one after another
along the leading dimension of the :class:`~torchrl.data.replay_buffers.TensorStorage`.
This is the recommended way of sampling sub-trajectories in TorchRL, **especially**
when using offline datasets (which are stored using that convention).
This strategy requires flattening the trajectories before extending the replay
buffer and reshaping them after sampling.
The :class:`~torchrl.data.replay_buffers.samplers.SliceSampler` class docstring
gives extensive details about this storage and sampling strategy.
Note that :class:`~torchrl.data.replay_buffers.samplers.SliceSampler`
is compatible with multidimensional storages. The following examples show
how to use this feature with and without flattening of the tensordict.
In the first scenario, we are collecting data from a single environment. In
that case, we are happy with a storage that concatenates the data coming in
along the first dimension, since there will be no interruption introduced
by the collection schedule:

>>> from torchrl.envs import TransformedEnv, StepCounter, GymEnv
>>> from torchrl.collectors import SyncDataCollector, RandomPolicy
>>> from torchrl.data import ReplayBuffer, LazyTensorStorage, SliceSampler
>>> env = TransformedEnv(GymEnv("CartPole-v1"), StepCounter())
>>> collector = SyncDataCollector(env,
... RandomPolicy(env.action_spec),
... frames_per_batch=10, total_frames=-1)
>>> rb = ReplayBuffer(
... storage=LazyTensorStorage(100),
... sampler=SliceSampler(num_slices=8, traj_key=("collector", "traj_ids"),
... truncated_key=None, strict_length=False),
... batch_size=64)
>>> for i, data in enumerate(collector):
... rb.extend(data)
... if i == 10:
... break
>>> assert len(rb) == 100, len(rb)
>>> print(rb[:]["next", "step_count"])
tensor([[32],
[33],
[34],
[35],
[36],
[37],
[38],
[39],
[40],
[41],
[11],
[12],
[13],
[14],
[15],
[16],
[17],
[...

If more than one environment is run in a batch, we could still store the data
in the same buffer as before by calling ``data.reshape(-1)``, which flattens
the ``[B, T]`` shape into ``[B * T]``. However, that means that the
trajectories of, say, the first environment of the batch will be interleaved
with trajectories of the other environments, a scenario that ``SliceSampler``
cannot handle. To solve this, we suggest using the ``ndim`` argument in the
storage constructor:

>>> from torchrl.envs import SerialEnv
>>> env = TransformedEnv(SerialEnv(2,
... lambda: GymEnv("CartPole-v1")), StepCounter())
>>> collector = SyncDataCollector(env,
... RandomPolicy(env.action_spec),
... frames_per_batch=1, total_frames=-1)
>>> rb = ReplayBuffer(
... storage=LazyTensorStorage(100, ndim=2),
... sampler=SliceSampler(num_slices=8, traj_key=("collector", "traj_ids"),
... truncated_key=None, strict_length=False),
... batch_size=64)
>>> for i, data in enumerate(collector):
... rb.extend(data)
... if i == 100:
... break
>>> assert len(rb) == 100, len(rb)
>>> print(rb[:]["next", "step_count"].squeeze())
tensor([[ 6, 5],
[ 2, 2],
[ 3, 3],
[ 4, 4],
[ 5, 5],
[ 6, 6],
[ 7, 7],
[ 8, 8],
[ 9, 9],
[10, 10],
[11, 11],
[12, 12],
[13, 13],
[14, 14],
[15, 15],
[16, 16],
[17, 17],
[18, 1],
[19, 2],
[...


- Trajectories can also be stored independently, with each element of the
leading dimension pointing to a different trajectory. This requires