static_reals Parameter bug #1761

Open
Arjein opened this issue Jan 26, 2025 · 1 comment
Labels
bug Something isn't working

Comments

Arjein commented Jan 26, 2025

static_reals Parameter bug
TimeSeriesDataSet takes a static_reals parameter, which the documentation describes as accepting a list of variable names. However, when I pass a list to this parameter, I get an error. Interestingly, it works when I provide a single string instead of a list.

# create dataset and dataloaders
max_encoder_length = 36
max_prediction_length = 6
training_cutoff = df["time_idx"].max()  - max_prediction_length

print(f'Training Cutoff: {training_cutoff}')

# Target Labels
time_idx = 'time_idx'
target_labels = 'close_price'

group_ids = ['asset_id']

static_reals=['asset_id'],

time_varying_known_reals = ["time_idx", "month", "day", "day_of_week", "hour", "minute"]


time_varying_unknown_reals = [
    "open_price", 
    "high_price", 
    "low_price", 
    "close_price", 
    "volume", 
    "quote_asset_volume", 
    "number_of_trades", 
    "taker_buy_base_asset_volume", 
    "taker_buy_quote_asset_volume"
]
# 
training = TimeSeriesDataSet(
    df[lambda x: x['time_idx'] <= training_cutoff],
    time_idx=time_idx,  
    
    target=target_labels, # For single Data Prediction
    
    group_ids=group_ids,
    time_varying_known_reals=time_varying_known_reals,
    static_reals=static_reals,
    time_varying_unknown_reals=time_varying_unknown_reals,
    
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
)

TypeError

TypeError                                 Traceback (most recent call last)
Cell In[47], line 31
     19 time_varying_unknown_reals = [
     20     "open_price", 
     21     "high_price", 
   (...)
     28     "taker_buy_quote_asset_volume"
     29 ]
     30 # 
---> 31 training = TimeSeriesDataSet(
     32     df[lambda x: x['time_idx'] <= training_cutoff],
     33     time_idx=time_idx,  
     34     
     35     target=target_labels, # For single Data Prediction
     36     
     37     group_ids=group_ids,
     38     time_varying_known_reals=time_varying_known_reals,
     39     static_reals=static_reals,
     40     time_varying_unknown_reals=time_varying_unknown_reals,
     41     
     42     min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
     43     max_encoder_length=max_encoder_length,
     44     max_prediction_length=max_prediction_length,
     45 
...
    pytorch_forecasting/data/timeseries.py:833         # lagged variables are only transformed - not fitted
    pytorch_forecasting/data/timeseries.py:834         continue
    pytorch_forecasting/data/timeseries.py:835     elif name not in self._scalers:

TypeError: unhashable type: 'list'
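
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the assignment above ends with a trailing comma, so Python stores a one-element tuple containing the list instead of the list itself. The snippet below uses only plain Python; `scalers` and `lookup_all` are hypothetical stand-ins for a name-keyed dict and a `name not in dict` check like the one at the bottom of the traceback, not the library's actual code.

```python
# A trailing comma after an assignment creates a one-element tuple.
static_reals_list = ['asset_id'],   # same as (['asset_id'],)
static_reals_str = 'asset_id',      # same as ('asset_id',)

assert isinstance(static_reals_list, tuple)
assert isinstance(static_reals_str, tuple)

# Hypothetical stand-in for a dict keyed by variable name.
scalers = {}


def lookup_all(names, table):
    """Return the names that are not keys of table (uses `name not in table`)."""
    return [name for name in names if name not in table]


# Each element of the tuple is a str, which is hashable, so this works.
print(lookup_all(static_reals_str, scalers))

# The single element of this tuple is a list, which cannot be hashed.
try:
    lookup_all(static_reals_list, scalers)
except TypeError as exc:
    print(type(exc).__name__, exc)
```

If this is what happens here, removing the trailing comma so that static_reals is a plain list should avoid the TypeError.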

Expected behavior
It does work when I pass the string through a variable:

max_encoder_length = 36
max_prediction_length = 6
training_cutoff = df["time_idx"].max()  - max_prediction_length

print(f'Training Cutoff: {training_cutoff}')

# Target Labels
time_idx = 'time_idx'
target_labels = 'close_price'

group_ids = ['asset_id']

static_reals='asset_id',

time_varying_known_reals = ["time_idx", "month", "day", "day_of_week", "hour", "minute"]


time_varying_unknown_reals = [
    "open_price", 
    "high_price", 
    "low_price", 
    "close_price", 
    "volume", 
    "quote_asset_volume", 
    "number_of_trades", 
    "taker_buy_base_asset_volume", 
    "taker_buy_quote_asset_volume"
]
# 
training = TimeSeriesDataSet(
    df[lambda x: x['time_idx'] <= training_cutoff],
    time_idx=time_idx,  
    
    target=target_labels, # For single Data Prediction
    
    group_ids=group_ids,
    time_varying_known_reals=time_varying_known_reals,
    static_reals=static_reals,
    time_varying_unknown_reals=time_varying_unknown_reals,
    
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,

)

OUTPUT

Training Cutoff: 49993
Training Dataset Length: 49989 | Validation Dataset Length: 1
Train DataLoader Length: 1562 | Validation DataLoader Length: 1
Max time_idx: 49999, Training Cutoff: 49993
Data points in training: 49994
Data points in validation: 6

Additional context
It also does not work when I pass a string directly in the initializer.

# create dataset and dataloaders
max_encoder_length = 36
max_prediction_length = 6
training_cutoff = df["time_idx"].max()  - max_prediction_length

print(f'Training Cutoff: {training_cutoff}')

# Target Labels
time_idx = 'time_idx'
target_labels = 'close_price'

group_ids = ['asset_id']

#static_reals='asset_id',

time_varying_known_reals = ["time_idx", "month", "day", "day_of_week", "hour", "minute"]


time_varying_unknown_reals = [
    "open_price", 
    "high_price", 
    "low_price", 
    "close_price", 
    "volume", 
    "quote_asset_volume", 
    "number_of_trades", 
    "taker_buy_base_asset_volume", 
    "taker_buy_quote_asset_volume"
]
# 
training = TimeSeriesDataSet(
    df[lambda x: x['time_idx'] <= training_cutoff],
    time_idx=time_idx,  
    
    target=target_labels, # For single Data Prediction
    
    group_ids=group_ids,
    time_varying_known_reals=time_varying_known_reals,
    static_reals='asset_id',
    time_varying_unknown_reals=time_varying_unknown_reals,
    
    min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,

)

KeyError

KeyError                                  Traceback (most recent call last)
Cell In[56], line 31
     19 time_varying_unknown_reals = [
     20     "open_price", 
     21     "high_price", 
   (...)
     28     "taker_buy_quote_asset_volume"
     29 ]
     30 # 
---> 31 training = TimeSeriesDataSet(
     32     df[lambda x: x['time_idx'] <= training_cutoff],
     33     time_idx=time_idx,  
     34     
     35     target=target_labels, # For single Data Prediction
     36     
     37     group_ids=group_ids,
     38     time_varying_known_reals=time_varying_known_reals,
     39     static_reals='asset_id',
     40     time_varying_unknown_reals=time_varying_unknown_reals,
     41     
     42     min_encoder_length=max_encoder_length // 2,  # keep encoder length long (as it is in the validation set)
     43     max_encoder_length=max_encoder_length,
     44     max_prediction_length=max_prediction_length,
     45 
...
-> pandas/core/indexes/base.py:6249         raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   pandas/core/indexes/base.py:6251     not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
   pandas/core/indexes/base.py:6252     raise KeyError(f"{not_found} not in index")

KeyError: "None of [Index(['a'], dtype='object')] are in the [columns]"

and these are the columns:

df.columns

Index(['open_price', 'high_price', 'low_price', 'close_price', 'volume',
       'quote_asset_volume', 'number_of_trades', 'taker_buy_base_asset_volume',
       'taker_buy_quote_asset_volume', 'time_idx', 'asset_id', 'year', 'month',
       'day', 'day_of_week', 'hour', 'minute'],
      dtype='object')
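
The 'a' in the KeyError hints that the bare string may be iterated character by character, with each character treated as a column name. That is an inference from the error message, not something confirmed against the library source. A minimal plain-Python sketch of the effect, where `missing_columns` is a hypothetical helper and `columns` copies the list printed above:

```python
columns = [
    'open_price', 'high_price', 'low_price', 'close_price', 'volume',
    'quote_asset_volume', 'number_of_trades', 'taker_buy_base_asset_volume',
    'taker_buy_quote_asset_volume', 'time_idx', 'asset_id', 'year', 'month',
    'day', 'day_of_week', 'hour', 'minute',
]


def missing_columns(names, available):
    """Return every entry of names that is not in available (hypothetical helper)."""
    return [name for name in names if name not in available]


# Iterating a bare string yields single characters, starting with 'a'.
from_string = missing_columns('asset_id', columns)
print(from_string[0])

# Wrapping the name in a list yields the whole column name, which exists.
from_list = missing_columns(['asset_id'], columns)
print(from_list)
```

Under that assumption, the first character looked up is 'a', which matches the column name reported in the KeyError.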

Versions

System:
python: 3.12.8 (main, Dec 3 2024, 18:42:41) [Clang 16.0.0 (clang-1600.0.26.4)]
executable: /Users/arjein/Documents/GitHub/AI/coin_pred/.venv/bin/python
machine: macOS-15.1.1-arm64-arm-64bit

Python dependencies:
pip: 24.3.1
sktime: 0.35.0
sklearn: 1.5.2
skbase: 0.12.0
numpy: 2.1.3
scipy: 1.15.1
pandas: 2.2.3
matplotlib: 3.10.0
joblib: 1.4.2
numba: None
statsmodels: 0.14.4
pmdarima: None
statsforecast: None
tsfresh: None
tslearn: None
torch: 2.5.1
tensorflow: None

@Arjein Arjein added the bug Something isn't working label Jan 26, 2025
@github-project-automation github-project-automation bot moved this to Needs triage & validation in Bugfixing - pytorch-forecasting Jan 26, 2025
@Sohaib-Ahmed21
Contributor

@Arjein as far as I understand so far, a single static_reals variable should be passed as a string and multiple variables as a list, as done in this tutorial.
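
One defensive option on the caller's side is to normalize the value before building the dataset. The helper below is a hypothetical sketch in plain Python, not part of pytorch-forecasting: `as_name_list` is an assumed name, and it simply turns a single string or any iterable of strings into a list of column names, rejecting nested containers early.

```python
from typing import Iterable, List, Union


def as_name_list(names: Union[str, Iterable[str]]) -> List[str]:
    """Return column names as a list of str (hypothetical helper).

    A bare string becomes a one-element list, so it is never
    iterated character by character. Any non-string element,
    such as a list nested inside a tuple, raises TypeError.
    """
    if isinstance(names, str):
        return [names]
    result = list(names)
    for name in result:
        if not isinstance(name, str):
            raise TypeError(
                f"expected column names as str, got {type(name).__name__}: {name!r}"
            )
    return result


print(as_name_list('asset_id'))       # single name as a bare string
print(as_name_list(('asset_id',)))    # tuple created by a trailing comma
print(as_name_list(['asset_id']))     # already a list
```

With such a helper, the call could pass static_reals=as_name_list(static_reals), and an accidental tuple-wrapped list would fail with a readable message instead of deep inside the dataset constructor.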
