diff --git a/steps/25_pytorch_forecasting_dsipts/step.md b/steps/25_pytorch_forecasting_dsipts/step.md
new file mode 100644
index 0000000..b5e38d5
--- /dev/null
+++ b/steps/25_pytorch_forecasting_dsipts/step.md
@@ -0,0 +1,381 @@
+# API rework of pytorch-forecasting and dsipts merge (pytorch-forecasting version 2)
+
+Contributors: @fkiraly, @agobbifbk
+
+## High-level summary
+
+### The Aim
+
+To create a unified interface for `torch` forecasters in `pytorch-forecasting` version 2,
+suitable for the entire ecosystem of `torch` based forecasters, including `dsipts` and
+various foundation models, with a design inspired by `dsipts` and `time-series-library`.
+
+### Context
+
+Over 2024, interfaces to a variety of deep learning and foundation models were added to `sktime`,
+among them `pytorch-forecasting` and various foundation models.
+
+This exercise showed that the `pytorch-forecasting` design does not generalize to foundation models,
+and made some limitations of the package apparent, such as strong coupling to `pandas` and `scikit-learn`,
+which prevents large-scale use.
+
+At the same time, the `dsipts` package (by @agobbifbk) emerged, contributing interesting ideas
+on API uniformity, and a simple API.
+
+It was decided that both packages - `pytorch-forecasting` and `dsipts` - would merge,
+with the aim of creating `pytorch-forecasting` v2 as the "sktime" of `torch` forecasting models.
+
+References:
+
+* `pytorch-forecasting` handover plan (to `sktime`): https://github.com/sktime/pytorch-forecasting/issues/1592
+* re-design thread for `pytorch-forecasting` 2.0 with `dsipts`: https://github.com/sktime/pytorch-forecasting/issues/1736
+  * this thread also contains a summary of technical planning meetings and sync design discussions
+* umbrella issue in `sktime` on foundation models: https://github.com/sktime/sktime/issues/6177
+* early coordination discussion on deep learning, foundation models, and `pytorch-forecasting`: https://github.com/sktime/sktime/issues/6381
+* 2025 `sktime` roadmap with a focus on deep learning and `torch`: https://github.com/sktime/sktime/issues/7707
+
+### Requirements
+
+Requirements are prioritized in MoSCoW notation: M = must have, S = should have, C = could have.
+
+* M: unified model API which is easily extensible and composable, similar to `sktime` and `dsipts`, but as close to the `pytorch` level as possible. The API need not cover forecasters in general, only `torch` based forecasters.
+  * M: unified monitoring and logging API, see also https://github.com/sktime/pytorch-forecasting/issues/1700
+  * M: extension templates need to be created
+  * S: `skbase` can be used to curate the forecasters as records, with tags, etc.
+  * S: model persistence
+  * C: third-party extension patterns, so that new models can "live" in other repositories or packages, for instance `thuml`
+* M: reworked and unified data input API
+  * M: support for static variables and categoricals
+  * S: support for multiple data input locations and formats - `pandas`, `polars`, hard drive, distributed, etc.
+* M: MLOps and benchmarking features as in `dsipts`
+* S: support for pre-training, model hubs, foundation models - this could be post-2.0
+
+### The proposed solution
+
+Our proposed solution consists of the following components (a minimal composition sketch follows below):
+
+* a two-layered `DataSet` input layer: the first layer (layer D1) is unified and model-independent, the second layer (layer D2) is model or model class specific.
+* a two-layered model layer: the inner layer (layer T) is pure `torch`; the outer layer (layer M) provides a unified interface and is a composite of metadata, references to layer D2, and to layer T.
+* downwards compatible migration and refactoring strategies for `pytorch-forecasting`, `dsipts`, and `thuml`, towards a unified whole that also leaves current structures intact.
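+
+As an orientation, a minimal sketch of how the four layers could compose in user code,
+using class names proposed later in this document (all names and signatures are
+illustrative and subject to the design discussion below):
+
+```python
+from torch.utils.data import DataLoader
+
+# layer D1: unified, model-independent DataSet, here over a pandas DataFrame
+d1 = PandasTSDataSet(df, **metadata)
+
+# layer D2: model class specific DataSet, built on top of the D1 object
+d2 = DecoderEncoderData(d1, **params_without_metadata)
+
+# default assumption: the standard torch DataLoader is used on top of D2
+loader = DataLoader(d2, batch_size=32)
+
+# layer M: unified model object, referencing the raw nn.Module (layer T) internally
+model = MyNetwork.from_dataset(d2)
+model.train(d2)  # training; inference via model.predict, see the layer M design
+```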
+
+## Design: `pytorch-forecasting` 2.0
+
+### Conceptual model and layers
+
+Following discussions collected in the linked issue [1736](https://github.com/sktime/pytorch-forecasting/issues/1736),
+the design consists of four layers as mentioned above:
+
+* layer D1: unified `DataSet` interface
+* layer D2: model specific `DataSet` and `DataLoader` interface
+* layer T: raw `torch` models
+* layer M: unified model layer: models with metadata and reference to the D2 layer
+
+Reasoning for the layers:
+
+* both `DataSet` and `torch` model input/output tend to be specific to the implementation.
+From examples seen, it is unlikely that they can be sensibly unified.
+* therefore, additional unification layers are needed, one for data and one for models.
+* discussions in the issue (also motivated by notes of janbeitner in the original code)
+converged on two `DataSet` based layers, using the standard `DataLoader`.
+* this implies a second unification layer on the other side, for models - given that
+unifying model interfaces is the primary goal.
+
+Conceptually, the layers align with concepts as follows:
+
+* layer D1: the abstract data type "collection of time series", a `Panel` in `sktime`
+parlance. The implementation can be arbitrary, e.g., `pandas`, `polars`, or hard-drive files.
+* layer M: an abstract model taking in the `Panel` data for training or inference.
+This *includes* data pre-processing, re-sampling, and batching.
+* layer T: a concrete neural network with free parameters, *excluding* data pre-processing, re-sampling, and batching.
+This starts at data that is already pre-processed, re-sampled, and batched.
+* layer D2: M minus T, i.e., the data pre-processing, re-sampling, and batching that turns D1 output into layer T input.
+
+### Alignment of current packages with layers
+
+`pytorch-forecasting`:
+
+* currently has two layers, a data layer and a model layer
+* data layer = D1 plus D2 plus M in a single class (lasagna) = `TimeSeriesDataSet`
+* model layer = T
+* `BaseModel` is similar to M, but assumes the data layer
+* in particular, there is no uniformization layer for data or models that would cover, e.g., foundation models
+* this also makes the design of very limited extensibility beyond certain decoder/encoder models
+
+`dsipts`:
+
+* currently has three layers
+* pre-processing functions, prior to use of `DataSet` - corresponds to D1
+* data set and data loader - corresponds to D2 plus M
+* model layer = T
+* an improvement compared to `pytorch-forecasting`, because there is a data uniformization layer
+  * but unfortunately, D1 is not in the form of a `DataSet`, which would allow scaling
+  * there is a model uniformization layer from layer D2 onwards, but not from D1
+
+### Mid-level interface designs
+
+#### layer D1
+
+Aim: model `Panel` data as closely as possible, while satisfying the data requirements.
+
+Data requirements:
+
+* agnostic towards data location - `pandas`, `polars`, hard drive
+* capturing metadata: numeric/categorical, past/future known, dynamic/static
+
+Design:
+
+* `DataSet` extension API with unified `__getitem__` output, defined by `BaseTSDataSet`
+* `__init__` captures input that can vary
+  * for downwards compatibility, the current inputs of `pytorch-forecasting` and `dsipts` are supported
+* inheritance pattern and strategy pattern
+* simplest-possible `__getitem__` return
+
+##### Interface: proposed `__getitem__` return of `BaseTSDataSet`
+
+As implemented in draft [PR 1757](https://github.com/sktime/pytorch-forecasting/pull/1757).
+
+Precise specs to be discussed.
+
+```
+Sampling via ``__getitem__`` returns a dictionary,
+which always has the following str-keyed entries:
+
+* t: tensor of shape (n_timepoints)
+  Time index for each time point in the past or present. Aligned with ``y``,
+  and with ``x`` (entries not ending in ``_f``).
+* y: tensor of shape (n_timepoints, n_targets)
+  Target values for each time point. Rows are time points, aligned with ``t``.
+  Columns are targets, aligned with ``y_cols``.
+* x: tensor of shape (n_timepoints, n_features)
+  Features for each time point. Rows are time points, aligned with ``t``.
+* group: tensor of shape (n_groups)
+  Group ids for the time series instance.
+* st: tensor of shape (n_static_features)
+  Static features.
+* y_cols: list of str of length (n_targets)
+  Names of columns of ``y``, in same order as columns in ``y``.
+* x_cols: list of str of length (n_features)
+  Names of columns of ``x``, in same order as columns in ``x``.
+* st_cols: list of str of length (n_static_features)
+  Names of entries of ``st``, in same order as entries in ``st``.
+* y_types: list of str of length (n_targets)
+  Types of columns of ``y``, in same order as columns in ``y``.
+  Types can be "c" for categorical, "n" for numerical.
+* x_types: list of str of length (n_features)
+  Types of columns of ``x``, in same order as columns in ``x``.
+  Types can be "c" for categorical, "n" for numerical.
+* st_types: list of str of length (n_static_features)
+  Types of entries of ``st``, in same order as entries in ``st``.
+* x_k: list of int of length (n_features)
+  Whether the feature is known in the future, encoded by 0 or 1,
+  in same order as columns in ``x``.
+  0 means the feature is not known in the future, 1 means it is known.
+
+Optionally, the following str-keyed entries can be included:
+
+* t_f: tensor of shape (n_timepoints_future)
+  Time index for each time point in the future.
+  Aligned with ``x_f``.
+* x_f: tensor of shape (n_timepoints_future, n_features)
+  Known features for each time point in the future.
+  Rows are time points, aligned with ``t_f``.
+* weight: tensor of shape (n_timepoints), only present if weight is not None
+* weight_f: tensor of shape (n_timepoints_future), only present if weight is not None
+```
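+
+For concreteness, a hypothetical example of a single item returned by `__getitem__`
+under this spec - for a series with 3 past time points, 2 future time points,
+2 targets, 2 features (both known in the future), and 1 static feature; all names
+and values are invented for illustration:
+
+```python
+import torch
+
+sample = {
+    "t": torch.tensor([0, 1, 2]),                                # (n_timepoints,)
+    "y": torch.tensor([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]),  # (3, n_targets=2)
+    "x": torch.tensor([[0.5, 1.0], [0.6, 0.0], [0.7, 1.0]]),     # (3, n_features=2)
+    "group": torch.tensor([0]),                                  # (n_groups,)
+    "st": torch.tensor([42.0]),                                  # (n_static_features,)
+    "y_cols": ["sales", "visits"],
+    "x_cols": ["price", "promo"],
+    "st_cols": ["store_size"],
+    "y_types": ["n", "n"],
+    "x_types": ["n", "c"],  # "promo" is categorical, 0/1-encoded in the tensor
+    "st_types": ["n"],
+    "x_k": [1, 1],          # both features known in the future; 0 would mean unknown
+    # optional entries, present here since future covariates are available:
+    "t_f": torch.tensor([3, 4]),                                 # (n_timepoints_future,)
+    "x_f": torch.tensor([[0.8, 1.0], [0.9, 0.0]]),               # (2, n_features)
+}
+```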
+
+##### Extension pattern
+
+* inherit from `BaseTSDataSet`
+* custom `__init__` input - this can be anything, including file locations
+* dataclass-like
+* logic only needs to comply with the `__getitem__` expectation above
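+
+A minimal sketch of the extension pattern - a hypothetical in-memory D1 dataset over
+a dict of per-series arrays. `BaseTSDataSet` is the base class proposed in draft
+PR 1757 (a stand-in is used below so the sketch runs); all other names are illustrative:
+
+```python
+import torch
+from torch.utils.data import Dataset as BaseTSDataSet  # stand-in for the PR 1757 base class
+
+class DictTSDataSet(BaseTSDataSet):
+    """Hypothetical D1 dataset over a dict {group_id: (t, y, x)} of arrays."""
+
+    def __init__(self, data, y_cols, x_cols, x_k):
+        self.data = data
+        self.y_cols = y_cols
+        self.x_cols = x_cols
+        self.x_k = x_k
+        self._keys = list(data)
+
+    def __len__(self):
+        return len(self._keys)
+
+    def __getitem__(self, i):
+        t, y, x = self.data[self._keys[i]]
+        # the only requirement: return a dict complying with the D1 contract above
+        return {
+            "t": torch.as_tensor(t),
+            "y": torch.as_tensor(y),
+            "x": torch.as_tensor(x),
+            "group": torch.tensor([i]),
+            "st": torch.empty(0),  # no static features in this example
+            "y_cols": self.y_cols,
+            "x_cols": self.x_cols,
+            "st_cols": [],
+            "y_types": ["n"] * len(self.y_cols),
+            "x_types": ["n"] * len(self.x_cols),
+            "st_types": [],
+            "x_k": self.x_k,
+        }
+```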
+
+#### layer D2
+
+Aim: prepare the unified data input from layer D1 for a `torch` model.
+
+Design:
+
+* `DataSet` extension API with unified `__init__` input, expecting a `BaseTSDataSet`
+* further `__init__` fields may be arbitrarily present, dataclass-like
+* `__getitem__` return is specific to a limited range of `torch` models
+* default assumption is the standard `DataLoader`
+* optionally, a custom `DataLoader` may be supplied
+
+##### Example, based on current `pytorch-forecasting` models
+
+The current `TimeSeriesDataSet(data, **params)` is to be replaced with
+
+```python
+tsd = PandasTSDataSet(df, **metadata)  # layer D1
+DecoderEncoderData(tsd, **params_without_metadata)  # layer D2
+```
+
+* where `metadata` is as above in layer D1
+* and `params_without_metadata` contains decoder/encoder specific variables:
+  * `max_encoder_length`
+  * `min_encoder_length`
+  * `max_decoder_length`
+  * `min_decoder_length`
+  * `constant_fill_strategy`
+  * `allow_missing_timesteps`
+  * `lags`
+  * and so on
+
+The return of the `DecoderEncoderData` instance should be exactly the same
+as that of the current `TimeSeriesDataSet`, when invoked with equivalent parameters and data.
+
+##### Example, based on current `dsipts` models
+
+For custom data, this should work:
+
+```python
+tsd = PandasTSDataSet(df, **metadata)  # layer D1
+DsiptsPipeline(tsd, **params_without_metadata)  # layer D2
+```
+
+For pre-defined datasets, this should work:
+
+```python
+tsd = BenchmarkDataSet(name, config)  # layer D1; name is a str identifying the dataset
+DsiptsPipeline(tsd, **params_without_metadata)  # layer D2
+```
+
+#### layer T
+
+The model layer contains layers and full models, using `pytorch-lightning` interfaces.
+
+These are simple, loose classes, as currently present in all packages, i.e., `nn.Module` subclasses.
+
+#### layer M
+
+Some unknowns remain here; this is work in progress.
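+
+A speculative usage vignette, as requested above - a hypothetical sketch using the
+class names from this document; whether training is driven by `lightning` directly
+is left open, per the question above:
+
+```python
+# layer D1: unified datasets, training and validation separately
+tsd_train = PandasTSDataSet(df_train, **metadata)
+tsd_val = PandasTSDataSet(df_val, **metadata)
+
+# layer D2: model class specific datasets for training and validation
+train_data = DecoderEncoderData(tsd_train, **params_without_metadata)
+val_data = DecoderEncoderData(tsd_val, **params_without_metadata)
+
+# layer M: construct the unified model from the dataset
+model = MyNetwork.from_dataset(train_data)
+
+# training; the standard DataLoader is assumed to be constructed internally,
+# e.g., via ref_dataloader - whether this forwards to lightning is an open question
+model.train(train_data)
+
+# inference, e.g., on the validation data
+y_pred = model.predict(val_data)
+```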
+
+Suggested design:
+
+* class design: a metadata class with pointers to layer T and layer D2, plus metadata
+* `scikit-base` compatible collection of parameters
+  * neural network parameters (`torch.nn`)
+  * training and inference parameters
+* switch between training and inference mode
+* directly interfaces with layer D1 on the outside
+  * possibility to construct `from_dataset` or similar, as in `pytorch-forecasting`
+
+```python
+class MyNetwork(BasePtfNetwork):
+
+    _tags = {
+        "capability:categorical": True,
+        "capability:futureknown": False,
+        "capability:static": False,
+        # etc.
+    }
+
+    def __init__(
+        self,
+        network_params=None,  # parameters of the torch.nn network
+        network_configs=None,  # further network configuration
+        loader_params=None,  # parameters passed on to the D2 loader
+    ):
+        ...
+
+    def ref_network(self):
+        # pointer to the torch network class (layer T); could be more complicated
+        from somewhere import MyTorchNetwork
+
+        return MyTorchNetwork
+
+    def ref_dataloader(self):
+        # pointer to the D2 dataloader class
+        from somewhere import D2LoaderForMyTorchNetwork
+
+        return D2LoaderForMyTorchNetwork
+
+    @classmethod
+    def from_dataset(cls, dataset):
+        # sets parameters from the dataset
+        return cls(**get_params_from(dataset))
+
+    def should_we_forward_lightning_methods(self, **kwargs):  # open question
+        ...
+
+    def train(self, dataset):
+        # logic related to training
+        ...
+
+    def predict(self, dataset):
+        # logic related to inference
+        ...
+```
+
+#### Usage vignette
+
+This should maybe center more on the data loader:
+
+```python
+data_loader = my_class(configs).get_dataloader(more_configs)
+
+# training and validation data loaders are needed separately
+data_loader_validation = my_class(configs).get_dataloader(more_configs)
+```
+
+Action AG - can you write a speculative usage vignette?
+
+Open question: should we use `lightning` as much as possible?
+
+Change the class as necessary.
+
+## Change and deprecation
+
+### `pytorch-forecasting`
+
+* networks can mostly be left as-is, for downwards compatibility
+* `TimeSeriesDataSet` should alias the following composite, see above:
+
+```python
+tsd = PandasTSDataSet(df, **metadata)  # layer D1
+DecoderEncoderData(tsd, **params_without_metadata)  # layer D2
+```
+
+It should be possible to keep interfaces as-is with this aliasing.
+
+### `dsipts`
+
+* need to introduce a D1-to-D2 `DataSet`
+* the current pipeline can still be used
+
+## Implementation phases
+
+### Phase 0 - design
+
+Agreement on this design document and the target state.
+
+### Phase 1 - `DataSet` layer
+
+Suggested to use `pytorch-forecasting` and introduce the D1/D2 separation as an API
+preserving refactor of `TimeSeriesDataSet`, as follows:
+
+1. add the D1 `BaseTSDataSet` and a `PandasTSDataSet` child class, and tests
+2. add `DecoderEncoderData` to obtain an interface on par with `TimeSeriesDataSet`
+3. change `TimeSeriesDataSet` to alias the D1/D2 composite (see the sketch at the end of this document)
+4. add one or two further `BaseTSDataSet` children as proof-of-concept: `polars` or hard-drive files
+    * use these to improve `DecoderEncoderData` to avoid too high in-memory usage
+
+### Phase 2a - `dsipts` `DataSet` layer integration
+
+Can start in the middle of phase 1, at a stage where `BaseTSDataSet` is consolidated.
+
+1. refactor the current data pipeline to be a single `DataSet` class
+2. rebase the pipeline on the `BaseTSDataSet` interface, ensure refactor and API consistency
+
+### Phase 2b - Model layer
+
+1. `BasePtfNetwork` experimental design and full API tests, using phase 1 objects
+2. refactor at least two `pytorch-forecasting` models to this design, with design iteration
+
+### Phase 3 - ecosystem
+
+* `dsipts` models
+* `pytorch-forecasting` models
+* `thuml` models
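+
+To make the aliasing in the deprecation strategy and Phase 1, step 3 concrete:
+a hypothetical sketch of the backwards-compatibility shim; `split_params` is an
+invented helper, and all details are subject to design iteration:
+
+```python
+def TimeSeriesDataSet(data, **params):
+    """Backwards-compatible alias composing the D1 and D2 layers (sketch)."""
+    # split the current TimeSeriesDataSet arguments into D1 metadata
+    # and D2 decoder/encoder parameters (hypothetical helper)
+    metadata, params_without_metadata = split_params(params)
+    tsd = PandasTSDataSet(data, **metadata)  # layer D1
+    return DecoderEncoderData(tsd, **params_without_metadata)  # layer D2
+```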