I’m working on a time series forecasting problem using `forecast_with_covariates`, which accepts the input series along with various covariates. I’ve run into a puzzling issue: the predictions differ significantly when I pass all series in one batch versus when I process each series individually in a loop.
Here’s what I’ve tried:
1. Batched prediction: I passed both series to the model in a single batch. Here are the output shape and the first few values:
• Test forecast shape: (2, 20)
• First batch test forecast first 5 values: [0.7577893, 1.023695, 0.91585237, 0.89688027, 0.65517604]
• Second batch test forecast first 5 values: [0.20441657, 0.2393372, -0.22837491, -0.38262987, -0.5851944]
2. Looped prediction: I passed one series at a time in a loop. The results are significantly different:
• Looped forecast results shape: (2, 20)
• First batch looped forecast first 5 values: [0.6246872, 0.92845935, 0.92428553, 1.041631, 0.36359525]
• Second batch looped forecast first 5 values: [0.14961839, 0.43704915, -0.04414537, -0.20000435, -0.66028154]
Could anyone help explain why there is such a noticeable difference between these two approaches? Is there a known issue with batching, or could it be related to the covariates handling in the model?
```python
# Simple test case
import numpy as np

# Assumes `model` is an already-loaded TimesFM model (its setup is not shown here).

# Create test data: two series, each with a context length of 100
forecast_input = [
    np.sin(np.linspace(0, 20, 100)),
    np.cos(np.linspace(0, 20, 100)),
]
# Dynamic covariates span context + horizon (100 + 20 = 120 time points)
dynamic_numerical_covariates = {
    "temperature": [np.random.rand(120), np.random.rand(120)]
}
dynamic_categorical_covariates = {
    "weekday": [np.random.randint(0, 7, 120), np.random.randint(0, 7, 120)]
}
# Static covariates: one value per series
static_numerical_covariates = {
    "base_price": [10.5, 15.0]
}
static_categorical_covariates = {
    "category": ["food", "beverage"]
}

# Call forecast_with_covariates on both series in a single batch
test_forecast, _ = model.forecast_with_covariates(
    forecast_input,
    dynamic_numerical_covariates=dynamic_numerical_covariates,
    dynamic_categorical_covariates=dynamic_categorical_covariates,
    static_numerical_covariates=static_numerical_covariates,
    static_categorical_covariates=static_categorical_covariates,
    freq=[0, 0],
    xreg_mode="xreg + timesfm",
    ridge=0.0,
    force_on_cpu=False,
    normalize_xreg_target_per_input=True,
)
test_forecast = np.array(test_forecast)  # Convert list to NumPy array
print("Test forecast shape:", test_forecast.shape)
print("First batch test forecast first 5 values:", test_forecast[0][:5])
print("Second batch test forecast first 5 values:", test_forecast[1][:5])

# Call forecast_with_covariates once per series in a loop
num_batches = len(forecast_input)  # Number of series
single_batch_forecast_list = []
for i in range(num_batches):
    single_batch_forecast, _ = model.forecast_with_covariates(
        [forecast_input[i]],  # Input one series at a time
        dynamic_numerical_covariates={k: [v[i]] for k, v in dynamic_numerical_covariates.items()},
        dynamic_categorical_covariates={k: [v[i]] for k, v in dynamic_categorical_covariates.items()},
        static_numerical_covariates={k: [v[i]] for k, v in static_numerical_covariates.items()},
        static_categorical_covariates={k: [v[i]] for k, v in static_categorical_covariates.items()},
        freq=[0],  # One frequency entry for the single series
        xreg_mode="xreg + timesfm",
        ridge=0.0,
        force_on_cpu=False,
        normalize_xreg_target_per_input=True,
    )
    single_batch_forecast_list.append(single_batch_forecast[0])  # Keep this series' forecast

# Combine the per-series forecasts
looped_forecast_results = np.array(single_batch_forecast_list)
print("Looped forecast results shape:", looped_forecast_results.shape)
print("First batch looped forecast first 5 values:", looped_forecast_results[0][:5])
print("Second batch looped forecast first 5 values:", looped_forecast_results[1][:5])

# Compare the looped forecasts with the batched forecasts
print("\nDifferences between looped forecast results and original forecast results:")
print("Maximum absolute error:", np.max(np.abs(looped_forecast_results - test_forecast)))
print("Mean absolute error:", np.mean(np.abs(looped_forecast_results - test_forecast)))
```
Output:
```
Test forecast shape: (2, 20)
First batch test forecast first 5 values: [0.7577893 1.023695 0.91585237 0.89688027 0.65517604]
Second batch test forecast first 5 values: [ 0.20441657 0.2393372 -0.22837491 -0.38262987 -0.5851944 ]
Looped forecast results shape: (2, 20)
First batch looped forecast first 5 values: [0.6246872 0.92845935 0.92428553 1.041631 0.36359525]
Second batch looped forecast first 5 values: [ 0.14961839 0.43704915 -0.04414537 -0.20000435 -0.66028154]

Differences between looped forecast results and original forecast results:
Maximum absolute error: 0.36772624
Mean absolute error: 0.14028119
```
It's caused by the covariate handling. Under the hood, the function fits a single linear model on the time points of all series in the batch, so the fitted covariate coefficients depend on which series are batched together. It is WAI (working as intended) that you'll see different results when the batching is different.
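To illustrate the general effect, here is a minimal NumPy sketch. It is not TimesFM's actual implementation. It fits a plain least-squares regression (mirroring `ridge=0.0` in the report) once per series, and then once on the pooled time points of both series. The helpers `fit_linear` and `predict_linear`, the two synthetic series, and their slopes are all hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept column (no ridge penalty)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef  # [intercept, slope]


def predict_linear(coef, X):
    """Apply fitted [intercept, slope] coefficients to new covariate values."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


rng = np.random.default_rng(0)
context_len, horizon = 100, 20

# Two hypothetical series whose targets respond to the covariate with different slopes.
covs = [rng.random(context_len + horizon) for _ in range(2)]
true_slopes = [2.0, -1.0]
targets = [
    s * c[:context_len] + 0.1 * rng.standard_normal(context_len)
    for s, c in zip(true_slopes, covs)
]

# "Looped": one regression per series, fit only on that series' context window.
looped = [
    predict_linear(fit_linear(c[:context_len], y), c[context_len:])
    for c, y in zip(covs, targets)
]

# "Batched": a single regression fit on the stacked time points of both series.
X_pool = np.concatenate([c[:context_len] for c in covs])
y_pool = np.concatenate(targets)
pooled_coef = fit_linear(X_pool, y_pool)
batched = [predict_linear(pooled_coef, c[context_len:]) for c in covs]

print("pooled [intercept, slope]:", pooled_coef)
print("max |batched - looped|:", np.max(np.abs(np.array(batched) - np.array(looped))))
```

In this sketch, the pooled fit has to compromise between the two series' slopes, so each series' horizon values depend on which other series share its batch. A batch containing a single series reduces to the per-series fit, which corresponds to what the loop in the report computes.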