[ENH] TimeSeriesDataSet inference mode (?) #1711

Open
grudloff opened this issue Nov 9, 2024 · 2 comments
Labels
feature request New feature or request


grudloff commented Nov 9, 2024

Currently, TimeSeriesDataSet has a predict_mode flag. Setting it to True uses the whole sequence as encoder input, except for the last portion, which is held out for testing purposes and predicted by the model.

However, I haven't found a way to predict using the whole sequence (think, for instance, of a Kaggle competition where you have to submit predictions for the following x months using all the data you have). I think an easy workaround would be to append dummy data at the end, with as many rows as the prediction length, so that the effective encoder sequence is the whole original sequence.

Is there a way to do this currently? If not, I believe a flag similar to predict_mode would be a nice way to activate this behavior.
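
To make the difference concrete, here is a minimal pandas-only sketch of the two windowings described above. It does not call pytorch_forecasting and is not how the library is implemented; the function names `holdout_split` and `full_history_split` are hypothetical and only illustrate which rows end up as encoder input in each case.

```python
import pandas as pd


def holdout_split(series: pd.DataFrame, prediction_length: int):
    """Illustrative only: the last `prediction_length` rows are held out."""
    encoder = series.iloc[:-prediction_length]
    decoder = series.iloc[-prediction_length:]
    return encoder, decoder


def full_history_split(series: pd.DataFrame, prediction_length: int):
    """Illustrative only: every observed row is encoder input, and the
    decoder covers future time steps for which no target exists yet."""
    last_idx = int(series["time_idx"].max())
    future_idx = list(range(last_idx + 1, last_idx + 1 + prediction_length))
    return series, future_idx


data = pd.DataFrame({"time_idx": range(10), "target": range(100, 110)})

# Behavior described for predict_mode=True: the tail is held out.
enc, dec = holdout_split(data, prediction_length=3)
print(enc["target"].tolist())  # [100, 101, 102, 103, 104, 105, 106]
print(dec["target"].tolist())  # [107, 108, 109]

# Requested behavior: encode everything, predict unseen time steps.
enc_full, future = full_history_split(data, prediction_length=3)
print(enc_full["target"].tolist())  # all ten values, 100 through 109
print(future)                       # [10, 11, 12]
```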

Author

grudloff commented Nov 10, 2024

Minimal example of the workaround:

import pandas as pd
from pytorch_forecasting import TimeSeriesDataSet

# Define the dataset
max_encoder_length = 10
prediction_length = 3

# Create a dummy dataset
data = pd.DataFrame({
    "time_idx": list(range(max_encoder_length)),
    "target": list(range(100,100+max_encoder_length)),
    "group": ["A"] * max_encoder_length,
})

print("Data")
print(data)

# Append dummy data to the end
dummy_data = pd.DataFrame({
    "time_idx": list(range(max_encoder_length, max_encoder_length+prediction_length)),
    "target": [0] * prediction_length,
    "group": ["A"] * prediction_length,
})
data = pd.concat([data, dummy_data], ignore_index=True)

# Create TimeSeriesDataSet
dataset = TimeSeriesDataSet(
    data,
    time_idx="time_idx",
    target="target",
    group_ids=["group"],
    min_encoder_length=max_encoder_length // 2,
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=prediction_length,
    predict_mode=True,
    target_normalizer=None
)

# Create a dataloader
dataloader = dataset.to_dataloader(train=False, batch_size=1)

# Print each batch (a single batch here: one group in predict mode)
for x, y in dataloader:
    print("Encoder input")
    print(x["encoder_target"].numpy())
    print("Decoder input")
    print(x["decoder_target"].numpy())
    print("Encoder lengths")
    print(x["encoder_lengths"].numpy())
    print("Dummy target")
    print(y)

Output:

>>> Data
>>>    time_idx  target group
>>>   0         0     100     A
>>>   1         1     101     A
>>>   2         2     102     A
>>>   3         3     103     A
>>>   4         4     104     A
>>>   5         5     105     A
>>>   6         6     106     A
>>>   7         7     107     A
>>>   8         8     108     A
>>>   9         9     109     A
>>> Encoder input
>>> [[100. 101. 102. 103. 104. 105. 106. 107. 108. 109.]]
>>> Decoder input
>>> [[0. 0. 0.]]
>>> Encoder lengths
>>> [10]
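
For data with several groups of different lengths, the same idea can be wrapped in a small helper. This is a sketch under stated assumptions, not part of pytorch_forecasting: the name `append_placeholder_rows` and its parameters are hypothetical, and it assumes that copying each group's last row forward is acceptable for all non-target columns (real future values would be needed for any known time-varying covariates).

```python
import pandas as pd


def append_placeholder_rows(data, time_idx, target, group_ids,
                            prediction_length, fill_value=0.0):
    """Hypothetical helper: append `prediction_length` placeholder rows
    after the last observed time step of every group."""
    # Last observed row of each group; other columns are copied forward.
    last = data.sort_values(time_idx).groupby(group_ids, sort=False).tail(1)
    pieces = [data]
    for step in range(1, prediction_length + 1):
        future = last.copy()
        future[time_idx] = last[time_idx] + step
        future[target] = fill_value  # dummy value, not a real observation
        pieces.append(future)
    extended = pd.concat(pieces, ignore_index=True)
    return extended.sort_values(group_ids + [time_idx]).reset_index(drop=True)


data = pd.DataFrame({
    "time_idx": [0, 1, 2, 3, 4, 0, 1, 2],
    "target": [100, 101, 102, 103, 104, 200, 201, 202],
    "group": ["A"] * 5 + ["B"] * 3,
})

extended = append_placeholder_rows(
    data, time_idx="time_idx", target="target",
    group_ids=["group"], prediction_length=2,
)
print(extended)
```

The extended frame can then be passed to TimeSeriesDataSet with predict_mode=True, as in the example above, so that the placeholder rows fall into the decoder window. One caveat: if a target normalizer is fitted on the extended frame, the placeholder values may affect its statistics. Building the inference dataset with `TimeSeriesDataSet.from_dataset` from a dataset fitted on the real data may avoid this, though I have not verified it for this workaround.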

@fkiraly fkiraly added the bug Something isn't working label Nov 13, 2024
@github-project-automation github-project-automation bot moved this to Needs triage & validation in Bugfixing - pytorch-forecasting Nov 13, 2024
@fkiraly fkiraly added feature request New feature or request and removed bug Something isn't working labels Nov 13, 2024
@fkiraly fkiraly changed the title TimeSeriesDataSet inference mode (?) [ENH] TimeSeriesDataSet inference mode (?) Nov 13, 2024
Collaborator

fkiraly commented Nov 13, 2024

Hm, I think this is a deeper design issue. I agree that this should be easily possible. I also think that TimeSeriesDataSet has too many arguments and is too specific.

I have opened a new issue to redesign the data handling layer, since there are multiple related problems that one may want to address here: #1716
