More details on input shapes? #112

iamchrisearle · 2024-06-10T20:22:05Z

From the docs:

# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
# forecast shape: [num_series, num_samples, prediction_length]

Where can I find additional details on how the inputs are interpreted and what their limitations are? Tangentially, replying to @abdulfatir in #13:

If you have specific multivariate use cases/datasets to share with us, please do. It will be helpful for us to understand the types of practical multivariate problems.

Suppose I want to model raster time-series for rainfall with this toy example:

import matplotlib.pyplot as plt
import numpy as np

# Create three 3x3 rasters to mimic rainfall over time
raster1 = np.array([[0, 2, 5], [7, 8, 1], [3, 6, 4]])
raster2 = np.array([[1, 4, 7], [5, 2, 3], [8, 6, 0]])
raster3 = np.array([[2, 5, 8], [6, 3, 7], [1, 4, 9]])

# Set up the subplots with shared axes
fig, axs = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)

# Plot the rasters
cax1 = axs[0].matshow(raster1, cmap="Blues")
cax2 = axs[1].matshow(raster2, cmap="Blues")
cax3 = axs[2].matshow(raster3, cmap="Blues")

# Add colorbars
fig.colorbar(cax1, ax=axs[0])
fig.colorbar(cax2, ax=axs[1])
fig.colorbar(cax3, ax=axs[2])

# Set titles
axs[0].set_title("Rainfall Day 1")
axs[1].set_title("Rainfall Day 2")
axs[2].set_title("Rainfall Day 3")

# Display the plot
plt.tight_layout()
plt.show()

Could these be modeled with the "list of 1D tensors" option from the docs, by flattening? If so, how should each 1D tensor in the list be interpreted? Or is this not a valid use case? So far I have tried:

Flattening the list of tensors into one 1D tensor

import torch

# `pipeline` is assumed to have been created beforehand, as in the docs
# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
my_data = np.array([raster1, raster2, raster3]).flatten()
context = torch.tensor(my_data)
prediction_length = 9
forecast = pipeline.predict(
    context, prediction_length
)  # shape [num_series, num_samples, prediction_length]
# visualize the forecast
forecast_index = range(len(my_data), len(my_data) + prediction_length)
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
plt.figure(figsize=(8, 4))
plt.plot(my_data, color="royalblue", label="historical data")
plt.plot(forecast_index, median, color="tomato", label="median forecast")
plt.fill_between(
    forecast_index,
    low,
    high,
    color="tomato",
    alpha=0.3,
    label="80% prediction interval",
)
plt.legend()
plt.grid()
plt.show()

Or is flattening each raster into its own 1D tensor a more appropriate representation to model? Visually, the result looks incorrect to me.

# context must be either a 1D tensor, a list of 1D tensors,
# or a left-padded 2D tensor with batch as the first dimension
my_data = np.array([raster1.flatten(), raster2.flatten(), raster3.flatten()])  # <-- the only difference is here
context = torch.tensor(my_data)
prediction_length = 9
forecast = pipeline.predict(
    context, prediction_length
)  # shape [num_series, num_samples, prediction_length]
# visualize the forecast
forecast_index = range(len(my_data), len(my_data) + prediction_length)
low, median, high = np.quantile(forecast[0].numpy(), [0.1, 0.5, 0.9], axis=0)
plt.figure(figsize=(8, 4))
plt.plot(my_data, color="royalblue", label="historical data")
plt.plot(forecast_index, median, color="tomato", label="median forecast")
plt.fill_between(
    forecast_index,
    low,
    high,
    color="tomato",
    alpha=0.3,
    label="80% prediction interval",
)
plt.legend()
plt.grid()
plt.show()

lostella · 2024-06-11T07:46:32Z

Maintainer

@iamchrisearle here are some details on the three options:

1. a 1D tensor: this is only really useful if you have a single time series that you want to predict. The only tensor axis is time.
2. a list of 1D tensors: similar to case 1, the tensors in the list are individual univariate time series, their only axis being time. The difference from the previous case is that all time series will be predicted in parallel. So if you have several time series to predict, providing them as a list is usually better (faster). Note that doing so increases memory consumption, so there is a limit (which is machine dependent) to how big such a "batch" can be.
3. a left-padded 2D tensor with batch as the first dimension: this is similar to case 2, but assumes you have already taken care of stacking the series into a single 2D tensor. In this case the first axis (dimension 0) is the "batch dimension", i.e. it indexes different series, while the second axis (dimension 1) is time. Stacking series of different lengths requires some padding on the time axis (dimension 1): padding should be applied on the left, so that the original data is aligned to the right (all series end at the forecast instant).
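
To make the left-padded 2D layout described above concrete, here is a minimal NumPy sketch that stacks series of different lengths into a [batch, time] array. The helper name `left_pad_and_stack` is hypothetical, and using NaN as the padding value is an assumption made for illustration; check which padding value the pipeline expects before relying on it.

```python
import numpy as np


def left_pad_and_stack(series, pad_value=np.nan):
    """Stack 1D series of different lengths into a [batch, time] array.

    Shorter series are padded on the left so that every series ends
    at the last column (the forecast instant).
    NOTE: pad_value=NaN is an assumption, not confirmed by the docs quoted here.
    """
    max_len = max(len(s) for s in series)
    batch = np.full((len(series), max_len), pad_value, dtype=float)
    for i, s in enumerate(series):
        # Write each series into the rightmost len(s) columns of its row.
        batch[i, max_len - len(s):] = np.asarray(s, dtype=float)
    return batch


batch = left_pad_and_stack([[1, 2, 3, 4], [5, 6], [7, 8, 9]])
print(batch)
```

The resulting array can be converted with `torch.tensor(batch)` before being passed as `context`.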

Note: whatever layout you use for your input, series will be predicted independently of each other (the model is univariate; the batch dimension only makes processing happen in parallel).
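
Since series are predicted independently, a large collection can be split into smaller batches to stay within memory limits without changing how any individual series is treated. A minimal sketch, where `iter_batches` is a hypothetical helper and the batch size is an arbitrary illustrative value:

```python
def iter_batches(series_list, batch_size):
    """Yield consecutive chunks of at most `batch_size` series."""
    for start in range(0, len(series_list), batch_size):
        yield series_list[start:start + batch_size]


# Toy stand-ins for real series: 5 series, processed 2 at a time.
all_series = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10], [11, 12]]
chunks = list(iter_batches(all_series, batch_size=2))
print([len(chunk) for chunk in chunks])  # [2, 2, 1]
```

Each chunk would then be passed to the prediction call in turn, and the per-chunk forecasts concatenated along the batch dimension.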

For your example: my understanding is that each raster is a different point in time (a different day), so you have 9 time series of length 3 in your toy example. In this case you should do something like

>>> np.stack([raster1, raster2, raster3], axis=-1).reshape((-1, 3))

array([[0, 1, 2],
       [2, 4, 5],
       [5, 7, 8],
       [7, 5, 6],
       [8, 2, 3],
       [1, 3, 7],
       [3, 8, 1],
       [6, 6, 4],
       [4, 0, 9]])

which will have the required 2D layout (first dimension is batch, second dimension is time).
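
Going the other way, per-pixel forecasts can be folded back into rasters by reversing the same reshape. The sketch below uses a random placeholder array with the documented forecast shape `[num_series, num_samples, prediction_length]` in place of real model output; `num_samples` and `prediction_length` are arbitrary values chosen for illustration.

```python
import numpy as np

raster1 = np.array([[0, 2, 5], [7, 8, 1], [3, 6, 4]])
raster2 = np.array([[1, 4, 7], [5, 2, 3], [8, 6, 0]])
raster3 = np.array([[2, 5, 8], [6, 3, 7], [1, 4, 9]])

rasters = np.stack([raster1, raster2, raster3], axis=-1)  # [rows, cols, time]
rows, cols, num_steps = rasters.shape
context = rasters.reshape(rows * cols, num_steps)  # [batch, time]

# Placeholder with the documented forecast shape; NOT real model output.
num_samples, prediction_length = 20, 2
rng = np.random.default_rng(0)
placeholder_forecast = rng.normal(size=(rows * cols, num_samples, prediction_length))

# Median over the sample axis gives one value per pixel and forecast step.
median = np.median(placeholder_forecast, axis=1)  # [batch, prediction_length]
forecast_rasters = median.reshape(rows, cols, prediction_length)

print(context.shape, forecast_rasters.shape)  # (9, 3) (3, 3, 2)
```

Row `i * cols + j` of the batch corresponds to pixel `(i, j)`, so `forecast_rasters[i, j]` holds the median forecast for that pixel.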

1 reply

iamchrisearle Jun 20, 2024
Author

Thanks for the response @lostella
Yes, each raster is a different point in time, so it's 9 time series of length 3. I was mistaken to flatten everything, and found that this worked best with option 2, where each 1D tensor in the list corresponds to one pixel of the raster, so that each pixel is treated as a univariate time series. I created another toy example with an arbitrary number of "pixels" that all behave the same, and it seems to capture a basic linear trend as expected:

# Create a single 1D numpy array of length 9 that increments from 1 to 9
array = np.arange(1, 10)

# Create a list of 9 identical arrays (pixels)
list_of_arrays = [array.copy() for _ in range(9)]

# After running this data through the same code as above, changing only:
my_data = np.array(list_of_arrays)

# Expect 9 predictions with values around 10
for i in range(forecast.shape[0]):
    print(forecast[i].median())

>>> 
tensor(9.0543)
tensor(9.3109)
tensor(9.2009)
tensor(8.9809)
tensor(9.4208)
tensor(9.0543)
tensor(9.6041)
tensor(9.3109)
tensor(9.0176)

Longer time frames looked to capture the linear trend more clearly, as expected. I can work through this with something like rasterized rainfall data and can add some findings here in case this use case is helpful for others.
