Significant Difference Between Batched Prediction and Looped Prediction in timesfm #124

volcanomao · 2024-08-16T03:56:24Z

Hi everyone,

I’m working on a time series forecasting problem using forecast_with_covariates, which accepts input series plus various covariates. I’ve run into a puzzling issue: the predictions differ significantly when I pass all series together in one batch versus when I pass each series individually in a loop.

Here’s what I’ve tried:

1.	Batched Prediction: I passed the data in batches to the model, and here’s the output shape and first few values:
•	Test forecast shape: (2, 20)
•	First batch test forecast first 5 values: [0.7577893, 1.023695, 0.91585237, 0.89688027, 0.65517604]
•	Second batch test forecast first 5 values: [0.20441657, 0.2393372, -0.22837491, -0.38262987, -0.5851944]
2.	Looped Prediction: I passed the data one batch at a time in a loop. The results are significantly different:
•	Looped forecast results shape: (2, 20)
•	First batch looped forecast first 5 values: [0.6246872, 0.92845935, 0.92428553, 1.041631, 0.36359525]
•	Second batch looped forecast first 5 values: [0.14961839, 0.43704915, -0.04414537, -0.20000435, -0.66028154]

Could anyone help explain why there is such a noticeable difference between these two approaches? Is there a known issue with batching, or could it be related to the covariates handling in the model?


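# Simple test case
# Assumes `model` is an already-initialized timesfm model with a checkpoint loaded.
import numpy as np

# Create test data: two input series of length 100; dynamic covariates have
# length 120 so that they also cover the 20-step forecast horizon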
forecast_input = [
    np.sin(np.linspace(0, 20, 100)),
    np.cos(np.linspace(0, 20, 100))
]
dynamic_numerical_covariates = {
    "temperature": [np.random.rand(120), np.random.rand(120)]
}
dynamic_categorical_covariates = {
    "weekday": [np.random.randint(0, 7, 120), np.random.randint(0, 7, 120)]
}
static_numerical_covariates = {
    "base_price": [10.5, 15.0]
}
static_categorical_covariates = {
    "category": ["food", "beverage"]
}

# Call the forecast_with_covariates function
test_forecast, _ = model.forecast_with_covariates(
    forecast_input,
    dynamic_numerical_covariates=dynamic_numerical_covariates,
    dynamic_categorical_covariates=dynamic_categorical_covariates,
    static_numerical_covariates=static_numerical_covariates,
    static_categorical_covariates=static_categorical_covariates,
    freq=[0, 0],
    xreg_mode="xreg + timesfm",
    ridge=0.0,
    force_on_cpu=False,
    normalize_xreg_target_per_input=True
)

test_forecast = np.array(test_forecast)  # Convert list to NumPy array

print("Test forecast shape:", test_forecast.shape)
print("First batch test forecast first 5 values:", test_forecast[0][:5])
print("Second batch test forecast first 5 values:", test_forecast[1][:5])


# Use a loop to call forecast_with_covariates for each series individually (batch size 1)
num_batches = len(forecast_input)
single_batch_forecast_list = []

for i in range(num_batches):
    single_batch_forecast, _ = model.forecast_with_covariates(
        [forecast_input[i]],  # Input one series at a time
        dynamic_numerical_covariates={k: [v[i]] for k, v in dynamic_numerical_covariates.items()},
        dynamic_categorical_covariates={k: [v[i]] for k, v in dynamic_categorical_covariates.items()},
        static_numerical_covariates={k: [v[i]] for k, v in static_numerical_covariates.items()},
        static_categorical_covariates={k: [v[i]] for k, v in static_categorical_covariates.items()},
        freq=[0],  # Input one frequency at a time
        xreg_mode="xreg + timesfm",
        ridge=0.0,
        force_on_cpu=False,
        normalize_xreg_target_per_input=True
    )
    single_batch_forecast_list.append(single_batch_forecast[0])  # Add forecast result to the list

# Combine all batch forecast results
looped_forecast_results = np.array(single_batch_forecast_list)

print("Looped forecast results shape:", looped_forecast_results.shape)
print("First batch looped forecast first 5 values:", looped_forecast_results[0][:5])
print("Second batch looped forecast first 5 values:", looped_forecast_results[1][:5])

# Compare looped forecast results with the original forecast results
print("\nDifferences between looped forecast results and original forecast results:")
print("Maximum absolute error:", np.max(np.abs(looped_forecast_results - test_forecast)))
print("Mean absolute error:", np.mean(np.abs(looped_forecast_results - test_forecast)))

Output:

Test forecast shape: (2, 20)
First batch test forecast first 5 values: [0.7577893  1.023695   0.91585237 0.89688027 0.65517604]
Second batch test forecast first 5 values: [ 0.20441657  0.2393372  -0.22837491 -0.38262987 -0.5851944 ]
Looped forecast results shape: (2, 20)
First batch looped forecast first 5 values: [0.6246872  0.92845935 0.92428553 1.041631   0.36359525]
Second batch looped forecast first 5 values: [ 0.14961839  0.43704915 -0.04414537 -0.20000435 -0.66028154]

Differences between looped forecast results and original forecast results:
Maximum absolute error: 0.36772624
Mean absolute error: 0.14028119

siriuz42 · 2024-08-27T18:23:47Z

It's caused by the covariate handling. Under the hood, the function fits a linear model on the time points of the whole batch, so the fitted coefficients depend on which series are batched together. It is working as intended (WAI) that you'll see different results if the batching is different.
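To illustrate the effect, here is a minimal sketch using plain NumPy. It is not the timesfm implementation; the helper `fit_linear` and the synthetic data are assumptions made up for this example. It only shows that a linear model fitted on points pooled from two series generally has different coefficients than models fitted on each series separately, which is enough to change the covariate contribution to the forecast.

```python
import numpy as np


def fit_linear(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept (hypothetical helper)."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # array([slope, intercept])


rng = np.random.default_rng(0)

# Two synthetic "series" whose target depends on the covariate in opposite ways.
x1, x2 = rng.random(100), rng.random(100)
y1 = 2.0 * x1 + 0.1 * rng.standard_normal(100)
y2 = -1.0 * x2 + 0.1 * rng.standard_normal(100)

# "Batched": one model fitted on the time points of both series together.
pooled = fit_linear(np.concatenate([x1, x2]), np.concatenate([y1, y2]))

# "Looped": one model per series.
separate_1 = fit_linear(x1, y1)
separate_2 = fit_linear(x2, y2)

print("pooled slope:    ", pooled[0])
print("series 1 slope:  ", separate_1[0])
print("series 2 slope:  ", separate_2[0])
```

The pooled slope lands between the two per-series slopes, so neither series gets the same covariate adjustment it would get when fitted alone. If per-series behavior is what you need, calling the function once per series (as in the loop above) is one way to get it.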
