What are steps required to fine-tune using past_feat_dynamic_real and feat_dynamic_real? #176

Open
aravindcheruvu opened this issue Jan 22, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@aravindcheruvu

aravindcheruvu commented Jan 22, 2025

Bug Description

I am trying to fine-tune a Moirai model on a custom dataset that includes past_feat_dynamic_real and feat_dynamic_real covariates. However, the current implementation does not appear to support these additional covariates directly. Specifically:

  • Do we need to modify src/uni2ts/data/builder/simple.py to accommodate feat_dynamic_real and past_feat_dynamic_real?
  • Is the generator function written correctly to handle these covariates?
  • What additional changes are required in the Moirai architecture to fully support fine-tuning with these covariates?

Example Code to Reproduce

Here is a simplified preprocessing snippet:

```python
import pandas as pd
import numpy as np
from datasets import Dataset, Features, Sequence, Value
from typing import Any, Generator

# Example data
time_index = pd.date_range(start="2023-01-01", periods=10, freq="D")

df = pd.DataFrame(
    {
        "user_id": ["user1"] * 10 + ["user2"] * 10,
        "timestamp": list(time_index) * 2,
        "target": np.random.rand(20),  # Target time series for all users
        "past_feat_1": np.random.rand(20),  # Example past feature
        "past_feat_2": np.random.rand(20),  # Example past feature
        "feat_1": np.random.rand(20),  # Example full-range feature
        "feat_2": np.random.rand(20),  # Example full-range feature
    }
)

time_index = pd.date_range(start="2023-01-01", periods=5, freq="D")

df2 = pd.DataFrame(
    {
        "user_id": ["user3"] * 5 + ["user4"] * 5,
        "timestamp": list(time_index) * 2,
        "target": np.random.rand(10),  # Target time series for all users
        "past_feat_1": np.random.rand(10),  # Example past feature
        "past_feat_2": np.random.rand(10),  # Example past feature
        "feat_1": np.random.rand(10),  # Example full-range feature
        "feat_2": np.random.rand(10),  # Example full-range feature
    }
)

df = pd.concat([df, df2], ignore_index=True)

# Generator function
def batch_user_gen_func() -> Generator[dict[str, Any], None, None]:
    grouped = df.groupby("user_id")

    for user_id, user_data in grouped:
        # Sort user data by timestamp (if not already sorted)
        user_data = user_data.sort_values("timestamp")

        # Build the dictionary for this user's batch
        yield {
            "user_id": user_id,  # User identifier
            "start": user_data["timestamp"].iloc[0],  # First timestamp
            "freq": pd.infer_freq(user_data["timestamp"]),  # Infer frequency
            "target": user_data["target"].to_numpy(),  # Target time series
            # to_numpy() on a column subset returns (time, num_features); the .T
            # assumes the GluonTS-style (num_features, time) covariate layout.
            "past_feat_dynamic_real": user_data[["past_feat_1", "past_feat_2"]].to_numpy().T,  # Past-only features
            "feat_dynamic_real": user_data[["feat_1", "feat_2"]].to_numpy().T,  # Full-range features
        }

# Feature schema
features = Features(
    {
        "user_id": Value("string"),  # Unique user identifier
        "start": Value("timestamp[s]"),  # Start timestamp
        "freq": Value("string"),  # Time frequency
        "target": Sequence(Value("float32")),  # Target time series (1D array)
        "past_feat_dynamic_real": Sequence(Sequence(Value("float32"))),  # 2D past-only features (num_features, time)
        "feat_dynamic_real": Sequence(Sequence(Value("float32"))),  # 2D full-range features (num_features, time)
    }
)

# Create dataset
hf_dataset = Dataset.from_generator(
    batch_user_gen_func,
    features=features,
)

# Save dataset
hf_dataset.save_to_disk("user_batch_dataset")
print("Dataset created and saved to 'user_batch_dataset'.")
```
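Whether the covariates come out as (time, features) or (features, time) is easy to get wrong, because pandas' `to_numpy()` on a column subset returns time along the first axis. The sketch below is a quick sanity check, assuming the GluonTS-style convention of one row per covariate with time along the last axis, and that a training entry's covariates span the same time steps as its target. `check_entry_layout` is a hypothetical helper, not part of uni2ts.

```python
import numpy as np
import pandas as pd


def check_entry_layout(entry: dict) -> None:
    """Hypothetical helper: assert covariates are laid out as
    (num_features, time) and cover the same time steps as the target."""
    n_time = len(entry["target"])
    for key in ("past_feat_dynamic_real", "feat_dynamic_real"):
        arr = np.asarray(entry[key])
        assert arr.ndim == 2, f"{key} should be 2D, got shape {arr.shape}"
        assert arr.shape[1] == n_time, (
            f"{key} has shape {arr.shape}; expected (num_features, {n_time})"
        )


# One user's data, with the same columns as the DataFrame in the issue.
user_data = pd.DataFrame(
    {
        "target": np.random.rand(10),
        "past_feat_1": np.random.rand(10),
        "past_feat_2": np.random.rand(10),
        "feat_1": np.random.rand(10),
        "feat_2": np.random.rand(10),
    }
)

raw = user_data[["feat_1", "feat_2"]].to_numpy()
print(raw.shape)    # (10, 2): time first, as pandas returns it
print(raw.T.shape)  # (2, 10): features first

entry = {
    "target": user_data["target"].to_numpy(),
    "past_feat_dynamic_real": user_data[["past_feat_1", "past_feat_2"]].to_numpy().T,
    "feat_dynamic_real": raw.T,
}
check_entry_layout(entry)  # passes; with the untransposed arrays it raises AssertionError
```

The same check can be run on a few entries yielded by the generator before calling `Dataset.from_generator`.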

Questions

  1. Support for Covariates:

    • Does simple.py need modifications to handle feat_dynamic_real and past_feat_dynamic_real?
    • If yes, what changes should be made?
  2. Generator Function:

    • Is the batch_user_gen_func implementation correct for passing feat_dynamic_real and past_feat_dynamic_real to the dataset?
  3. Architecture Adjustments:

    • What changes are needed in the Moirai architecture to fully utilize feat_dynamic_real and past_feat_dynamic_real during fine-tuning?

Expected Behavior

  • The architecture should seamlessly handle feat_dynamic_real and past_feat_dynamic_real variates during training and inference.
  • The dataset should properly integrate these covariates and allow for their use in fine-tuning.
@aravindcheruvu aravindcheruvu added the bug Something isn't working label Jan 22, 2025
@chenghaoliu89
Contributor

Hi @aravindcheruvu, thanks for the good question. I think your code should work for past_feat_dynamic_real. For feat_dynamic_real, see example 2.4 in https://github.com/SalesforceAIResearch/uni2ts/blob/main/example/moirai_forecast_pandas.ipynb: feat_dynamic_real should include data for both the lookback window and the forecast window. Let me know if you run into any issues.
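The length relationship described above can be sketched with plain NumPy: target and past_feat_dynamic_real stop at the forecast start, while feat_dynamic_real extends prediction_length steps further. This is a minimal illustration, assuming the (num_features, time) layout; `make_inference_entry` is a hypothetical helper, not a uni2ts or GluonTS API, and whether you trim to `context_length` yourself or pass the full history and let the predictor do it, feat_dynamic_real stays longer than target by `prediction_length`.

```python
import numpy as np


def make_inference_entry(target, past_feats, future_feats, context_length, prediction_length):
    """Hypothetical helper: slice full-length arrays into one inference entry.

    target:       shape (T,), observed history
    past_feats:   shape (num_past_features, T), only known up to the forecast start
    future_feats: shape (num_features, T + prediction_length), known in advance
    """
    T = target.shape[-1]
    if future_feats.shape[-1] < T + prediction_length:
        raise ValueError("feat_dynamic_real must also cover the forecast window")
    start = T - context_length
    return {
        # Lookback window only.
        "target": target[start:],
        "past_feat_dynamic_real": past_feats[:, start:],
        # Lookback window plus forecast window.
        "feat_dynamic_real": future_feats[:, start:T + prediction_length],
    }


context_length, prediction_length = 8, 4
T = 10
target = np.random.rand(T)
past_feats = np.random.rand(2, T)
future_feats = np.random.rand(2, T + prediction_length)

entry = make_inference_entry(target, past_feats, future_feats, context_length, prediction_length)
for key, value in entry.items():
    print(key, value.shape)
# target (8,)
# past_feat_dynamic_real (2, 8)
# feat_dynamic_real (2, 12)
```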

We will enhance the fine-tuning module in the next version. @zqiao11
