[Improving Documentation] Contributing inspectable notebook for imputation on custom dataset #25

Open
b2jia opened this issue Apr 26, 2023 · 2 comments

b2jia commented Apr 26, 2023

Thank you for this amazing resource! As others have noted in other issues, it seems that:

  • documentation is either incomplete or out of date
  • users have difficulty adapting their own datasets to tsl

In addition to those two points, I've also noticed that:

  • the example scripts are not inspectable (i.e., objects are not easily unpacked to check dimensions, properties, etc.)
  • the examples sometimes do not illustrate how to accommodate multivariate data, only multiple sensors

As a complete outsider to GNNs, I am wondering whether the authors could give me feedback on creating an example for beginners. In this way, I hope to contribute to the documentation so that even a complete novice (such as myself) can get started with tsl.

For instance, say there is a dataset of car trajectories collected over time. How can we go from a dataframe (shown below) to training a model in tsl that predicts the missing positions x, y, z?

import numpy as np
import pandas as pd

# Define number of trajectories and time points
num_traj = 5
num_timepoints = 10

# Seed the generator so the example is reproducible
rng = np.random.default_rng(0)

# Generate random positions, one row per (trajectory, time point)
data = pd.DataFrame(rng.standard_normal((num_traj * num_timepoints, 3)), columns=['x', 'y', 'z'])

# Set time points to 0..num_timepoints-1, the same for every trajectory
data['t'] = np.tile(np.arange(num_timepoints), num_traj)

# Assign trajectory ID for each time point
data['trajectory'] = np.repeat(np.arange(num_traj), num_timepoints)

# Set the positions of 10 random rows to NaN to represent missing positions
missing_rows = rng.choice(len(data), size=10, replace=False)
data.loc[missing_rows, ['x', 'y', 'z']] = np.nan
Member

marshka commented May 2, 2023

Hi, thank you for your passionate interest in our project! Contributions are indeed very welcome.

Mind that tsl is meant to deal with spatiotemporal data, that is, in principle, data coming from sensor networks. Typically, such data have three dimensions: time, space (i.e., sensors/nodes), and features (thus accommodating multivariate sensor observations).
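To make the three-dimensional layout concrete, here is a minimal sketch that pivots a long-format dataframe like the one in the original post into a [time, nodes, features] array plus an observation mask. It uses only pandas and NumPy and assumes each trajectory is treated as one node observed at the same time steps; the names `values` and `mask` are illustrative and not part of the tsl API.

```python
import numpy as np
import pandas as pd

num_traj, num_timepoints = 5, 10
rng = np.random.default_rng(0)

# Long-format frame: one row per (trajectory, time step), as in the original post
data = pd.DataFrame(rng.standard_normal((num_traj * num_timepoints, 3)), columns=["x", "y", "z"])
data["t"] = np.tile(np.arange(num_timepoints), num_traj)
data["trajectory"] = np.repeat(np.arange(num_traj), num_timepoints)
data.loc[rng.choice(len(data), size=10, replace=False), ["x", "y", "z"]] = np.nan

# Order rows by time first, then node, so a plain reshape yields [time, nodes, features]
ordered = data.sort_values(["t", "trajectory"])
values = ordered[["x", "y", "z"]].to_numpy().reshape(num_timepoints, num_traj, 3)

# Boolean mask: True where a value was observed, False where it is missing
mask = ~np.isnan(values)

print(values.shape)        # (10, 5, 3)
print(int((~mask).sum()))  # 30 (10 missing rows x 3 features)
```

Here `values[t, n]` holds the (x, y, z) features of node `n` at time step `t`, so the multivariate case is simply a feature dimension larger than one.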

In your case, are the cars synchronously moving in the same space? Or do you rather have a collection of unrelated time series, each of which is a single-car trajectory constituting a single sample in the dataset?

tsl-like datasets are designed to model the first scenario, in which the different time series are synchronous and somehow connected. For the second case, we could think of another solution that avoids the overhead of the sliding-window functionality in TabularDataset. As far as I know, @LucaButera is working on something similar and may be able to help.
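For the second scenario, one possible layout (a sketch, not a tsl feature) is to keep each trajectory as a whole sample in a [samples, time, features] array, with no sliding windows. The snippet below also runs a simple non-graph baseline, linear interpolation within each trajectory, which can serve as a sanity check before any model is trained. The hand-picked missing rows and the names `samples`, `mask`, and `filled` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

num_traj, num_timepoints = 5, 10
rng = np.random.default_rng(0)

data = pd.DataFrame(rng.standard_normal((num_traj * num_timepoints, 3)), columns=["x", "y", "z"])
data["t"] = np.tile(np.arange(num_timepoints), num_traj)
data["trajectory"] = np.repeat(np.arange(num_traj), num_timepoints)
# Hypothetical missing rows, chosen by hand so the example is deterministic
data.loc[[3, 4, 17, 20, 49], ["x", "y", "z"]] = np.nan

# One sample per trajectory: stack into [samples, time, features]
samples = np.stack([
    group.sort_values("t")[["x", "y", "z"]].to_numpy()
    for _, group in data.groupby("trajectory")
])
mask = ~np.isnan(samples)  # True where observed

# Baseline: interpolate linearly in time, within each trajectory only;
# limit_direction="both" also fills gaps at the start or end of a trajectory
filled = (
    data.sort_values(["trajectory", "t"])
    .groupby("trajectory")[["x", "y", "z"]]
    .transform(lambda s: s.interpolate(limit_direction="both"))
)

print(samples.shape)  # (5, 10, 3)
```

Because the trajectories are unrelated, nothing is shared across the first axis; it plays the role of a batch dimension rather than a node dimension.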

Author

b2jia commented May 2, 2023

@marshka Thanks for this response! I see. Indeed, my problem is exactly as you put it: "a collection of unrelated time series, each of which is a single-car trajectory constituting a single sample in the dataset".

To start simply, I am wondering: is it possible to aid the imputation by passing in, as edge attributes, the original, complete distance matrix between positions over time? In other words, if I know how far the missing point should be from the known positions, can I recover the missing position? If this works, a harder problem would be to solve it without the aid of this distance matrix. I would love to know what @LucaButera is working on that might help with this type of problem!
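As a point of reference for the distance-matrix idea: with exact distances to at least four known, non-coplanar positions, a 3D point is determined uniquely and can be recovered by linear least squares, without any learning. The sketch below is a purely geometric check, not a GNN and not tsl code; the synthetic trajectory and the choice of which time step is missing are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complete trajectory: 10 time points in 3D
positions = rng.standard_normal((10, 3))

# Full pairwise Euclidean distance matrix, shape [time, time]
diff = positions[:, None, :] - positions[None, :, :]
dist = np.linalg.norm(diff, axis=-1)

# Pretend the position at time step 4 is missing
missing = 4
known = np.delete(np.arange(len(positions)), missing)
anchors = positions[known]  # known positions, shape [9, 3]
d = dist[missing, known]    # distances from the missing point to each anchor

# Linearize |p - a_i|^2 = d_i^2 by subtracting the equation for the first anchor:
#   2 (a_i - a_0) . p = |a_i|^2 - |a_0|^2 - d_i^2 + d_0^2
A = 2.0 * (anchors[1:] - anchors[0])
b = (anchors[1:] ** 2).sum(axis=1) - (anchors[0] ** 2).sum() - d[1:] ** 2 + d[0] ** 2
recovered, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(recovered, positions[missing]))  # True
```

Since the full distance matrix already pins down the missing position, a model given these distances as edge attributes is solving an easier problem than imputation from the observed positions alone.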
