[ENH] TimeSeriesDataSet inference mode (?) #1711

grudloff · 2024-11-09T22:49:14Z

Currently, TimeSeriesDataSet has a predict_mode flag. Setting it to True uses the whole sequence except for its last portion, which is held out for testing purposes and predicted by the model.
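To make the limitation concrete, here is a minimal pandas sketch of the window that predict_mode=True conceptually selects for a single series. It is an illustration under stated assumptions, not pytorch-forecasting's implementation: the helper name `last_window` is hypothetical, and it assumes the series is long enough to fill both windows.

```python
import pandas as pd


def last_window(series, time_idx, max_encoder_length, prediction_length):
    # Hypothetical helper: a conceptual illustration of the last window
    # used in predict mode, NOT pytorch-forecasting's own code.
    if prediction_length <= 0:
        raise ValueError("prediction_length must be positive")
    series = series.sort_values(time_idx)
    # The final `prediction_length` rows form the decoder window
    decoder = series.iloc[-prediction_length:]
    # Up to `max_encoder_length` rows immediately before them form the encoder
    encoder = series.iloc[:-prediction_length].iloc[-max_encoder_length:]
    return encoder, decoder


# 13 fully observed time steps for one series
series = pd.DataFrame({"time_idx": range(13), "target": range(100, 113)})
encoder, decoder = last_window(series, "time_idx", max_encoder_length=10, prediction_length=3)
print(encoder["time_idx"].tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(decoder["time_idx"].tolist())  # [10, 11, 12]
```

Because the decoder rows (time_idx 10 to 12 here) are real observations, this mode scores the model on the last known values rather than forecasting past them.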

However, I haven't found a way to predict using the whole sequence as input (think, for instance, of a Kaggle competition where you have to submit predictions for the next x months given the data you have). An easy workaround could be to append dummy data at the end so that the effective sequence is the whole observed sequence (i.e. the number of appended dummy rows matches the prediction length).

Is there a way to do this currently? If not, I think a flag similar to predict_mode would be a nice way to enable this behavior.
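The dummy-data workaround proposed above can be generalized to several series of different lengths. The sketch below assumes a long-format DataFrame with one row per (group, time_idx); `append_future_rows` and its `fill` argument are hypothetical names, not part of pytorch-forecasting. Columns not listed in `fill` are left as NaN, so any other column the dataset uses would likely need a value too (real future values for known covariates, placeholders for unknown ones).

```python
import pandas as pd


def append_future_rows(data, time_idx, group_ids, horizon, fill):
    """Hypothetical helper: add `horizon` placeholder rows after the last
    time index of every group. Columns named in `fill` get the given
    placeholder value; all other non-key columns are left as NaN."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # Last observed time index per group
    last = data.groupby(group_ids, as_index=False)[time_idx].max()
    future = []
    for step in range(1, horizon + 1):
        rows = last.copy()
        rows[time_idx] = rows[time_idx] + step
        future.append(rows)
    future = pd.concat(future, ignore_index=True).assign(**fill)
    out = pd.concat([data, future], ignore_index=True)
    return out.sort_values(group_ids + [time_idx]).reset_index(drop=True)


# Two series of different lengths
data = pd.DataFrame({
    "time_idx": [0, 1, 2, 3, 4, 0, 1, 2],
    "target": [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 8.0, 9.0],
    "group": ["A"] * 5 + ["B"] * 3,
})
extended = append_future_rows(data, "time_idx", ["group"], horizon=2, fill={"target": 0.0})
print(extended)
```

Each group gets its own future window starting right after its own last observation, which is what predict_mode then picks up as the decoder.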


grudloff · 2024-11-10T11:21:32Z

Minimal example of the workaround:

```python
import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet

# Define the window sizes
max_encoder_length = 10
prediction_length = 3

# Create a single toy series with 10 observed values
data = pd.DataFrame({
    "time_idx": list(range(max_encoder_length)),
    "target": list(range(100, 100 + max_encoder_length)),
    "group": ["A"] * max_encoder_length,
})

print("Data")
print(data)

# Append `prediction_length` dummy rows after the last observation so that
# the decoder window falls on these placeholder rows
dummy_data = pd.DataFrame({
    "time_idx": list(range(max_encoder_length, max_encoder_length + prediction_length)),
    "target": [0] * prediction_length,
    "group": ["A"] * prediction_length,
})
data = pd.concat([data, dummy_data], ignore_index=True)

# Create TimeSeriesDataSet; predict_mode=True keeps only the last window per group
dataset = TimeSeriesDataSet(
    data,
    time_idx="time_idx",
    target="target",
    group_ids=["group"],
    min_encoder_length=max_encoder_length // 2,
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=prediction_length,
    predict_mode=True,
    target_normalizer=None,
)

# Create a dataloader
dataloader = dataset.to_dataloader(train=False, batch_size=1)

# Print the batches (predict_mode yields one sample per group, so a single batch here)
for x, y in dataloader:
    print("Encoder input")
    print(x["encoder_target"].numpy())
    print("Decoder input")
    print(x["decoder_target"].numpy())
    print("Encoder lengths")
    print(x["encoder_lengths"].numpy())
    print("Dummy target")
    print(y)
```

Output:

```
Data
   time_idx  target group
0         0     100     A
1         1     101     A
2         2     102     A
3         3     103     A
4         4     104     A
5         5     105     A
6         6     106     A
7         7     107     A
8         8     108     A
9         9     109     A
Encoder input
[[100. 101. 102. 103. 104. 105. 106. 107. 108. 109.]]
Decoder input
[[0. 0. 0.]]
Encoder lengths
[10]
```
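For the Kaggle-style use case from the opening post, the forecasts then have to be mapped back onto the dummy rows. The sketch below is pure pandas post-processing under loud assumptions: `attach_forecasts` is a hypothetical helper, the `preds` values are made up as a stand-in for model output, and it assumes you already know which group each row of the prediction array belongs to (the `group_order` list); check the ordering your model's prediction call actually returns.

```python
import numpy as np
import pandas as pd


def attach_forecasts(data, time_idx, group_col, horizon, preds, group_order):
    """Hypothetical post-processing step: write a (n_groups, horizon)
    forecast array onto the last `horizon` rows (the dummy rows) of each
    group. Assumes row i of `preds` belongs to group_order[i]."""
    preds = np.asarray(preds, dtype=float)
    expected = (len(group_order), horizon)
    if preds.shape != expected:
        raise ValueError(f"expected shape {expected}, got {preds.shape}")
    frames = []
    for i, group in enumerate(group_order):
        # The dummy rows are the last `horizon` time steps of the group
        future = (
            data[data[group_col] == group]
            .sort_values(time_idx)
            .tail(horizon)[[group_col, time_idx]]
            .copy()
        )
        future["prediction"] = preds[i]
        frames.append(future)
    return pd.concat(frames, ignore_index=True)


# Data shaped as in the example above: 10 observations plus 3 dummy rows
data = pd.DataFrame({
    "time_idx": list(range(13)),
    "target": list(range(100, 110)) + [0, 0, 0],
    "group": ["A"] * 13,
})
# Made-up stand-in for model output
preds = [[110.0, 111.0, 112.0]]
submission = attach_forecasts(data, "time_idx", "group", 3, preds, ["A"])
print(submission)
```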

fkiraly · 2024-11-13T17:54:32Z

Hm, I think this is a deeper design issue. I agree that this should be easily possible. I also think TimeSeriesDataSet has too many arguments and is too specific.

I have opened a new issue to redesign the data handling layer, since there are multiple related problems that one may want to address here: #1716

fkiraly added the bug Something isn't working label Nov 13, 2024

github-project-automation bot added this to Bugfixing - pytorch-forecasting Nov 13, 2024

github-project-automation bot moved this to Needs triage & validation in Bugfixing - pytorch-forecasting Nov 13, 2024

fkiraly mentioned this issue in [API] (re-)design of data loader mechanism #1716 (open) Nov 13, 2024

fkiraly added feature request New feature or request and removed bug Something isn't working labels Nov 13, 2024

fkiraly changed the title from TimeSeriesDataSet inference mode (?) to [ENH] TimeSeriesDataSet inference mode (?) Nov 13, 2024

fkiraly removed this from Bugfixing - pytorch-forecasting Nov 13, 2024
