
How to run my own dataset within Earthformer #62

Open
LivingRoomcode opened this issue Dec 8, 2023 · 11 comments

@LivingRoomcode

I want to use Earthformer to train on my own dataset and test it. What format should I process the data into, and which .py files should I prepare?

@gaozhihan
Contributor

Thanks for your question. You may want to refer to the simplest test case to verify if the shapes are aligned correctly. Please note that this test script is from my fork, which has not been merged into this repo.
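For context, a minimal shape check in the spirit of that test could look like the sketch below. The import path, the input_shape/target_shape keyword arguments, and the toy shapes are assumptions on my part; the exact constructor arguments should be copied from the test script or from one of the repo's YAML configs.

```python
# Minimal sketch (not the actual test script): feed a random batch through the
# model and check that the output shape matches the target shape.
# Assumption: CuboidTransformerModel is importable from this path and accepts
# input_shape / target_shape as (T, H, W, C); copy the real arguments from
# tests/test_cuboid.py or a config file.
import torch
from earthformer.cuboid_transformer.cuboid_transformer import CuboidTransformerModel

batch_size = 2
input_shape = (10, 64, 64, 1)    # (T_in, H, W, C) of your input sequence
target_shape = (10, 64, 64, 1)   # (T_out, H, W, C) of your prediction target

model = CuboidTransformerModel(input_shape=input_shape,
                               target_shape=target_shape)
x = torch.rand(batch_size, *input_shape)   # layout "NTHWC"
with torch.no_grad():
    y = model(x)
assert y.shape == (batch_size, *target_shape), f"unexpected output shape {y.shape}"
```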

@LivingRoomcode
Author

I ran the command from the README: python3 -m pytest.
It produced the following error. What does it mean?
[screenshot of the error]

@gaozhihan
Contributor

I'm not sure if you are using the correct script in my fork, but you don't need to run pytest. Please simply try python ROOT_DIR/tests/test_cuboid.py.

@LivingRoomcode
Author

LivingRoomcode commented Dec 14, 2023

I ran the test code as you suggested, and the result shows that the model is missing parameters. How do I specify these two model parameters?
[screenshot of the error]

@gaozhihan
Contributor

@LivingRoomcode
Author

Thank you very much for your patient reply! I successfully ran this code.

@LivingRoomcode
Author

The test_cuboid.py you provided is for testing the data. Do I need to write my own training code based on your train_cuboid_nbody?

@gaozhihan
Contributor

Yes, please feel free to refer to [train_cuboid_nbody.py](https://github.com/amazon-science/earth-forecasting-transformer/blob/7732b03bdb366110563516c3502315deab4c2026/scripts/cuboid_transformer/nbody/train_cuboid_nbody.py) and train_cuboid_sevir.py when implementing your own training script. The main task is to implement your own LightningDataModule to replace the original one:

```python
# Excerpt from train_cuboid_sevir.py; `np` is NumPy and SEVIRLightningDataModule
# is the SEVIR datamodule provided by the earthformer package.
@staticmethod
def get_sevir_datamodule(dataset_oc,
                         micro_batch_size: int = 1,
                         num_workers: int = 8):
    dm = SEVIRLightningDataModule(
        seq_len=dataset_oc["seq_len"],
        sample_mode=dataset_oc["sample_mode"],
        stride=dataset_oc["stride"],
        batch_size=micro_batch_size,
        layout=dataset_oc["layout"],
        output_type=np.float32,
        preprocess=True,
        rescale_method="01",
        verbose=False,
        # datamodule_only
        dataset_name=dataset_oc["dataset_name"],
        start_date=dataset_oc["start_date"],
        train_val_split_date=dataset_oc["train_val_split_date"],
        train_test_split_date=dataset_oc["train_test_split_date"],
        end_date=dataset_oc["end_date"],
        num_workers=num_workers,
    )
    return dm
```
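
For a custom dataset, a minimal replacement could look roughly like the sketch below. Everything here (MySequenceDataset, MyDataModule, the .npy path, the (num_sequences, T, H, W, C) layout) is a placeholder for illustration, not part of the repo; how each batch must be structured ultimately depends on how the training script unpacks it, so adapt __getitem__ accordingly.

```python
# Rough sketch of a replacement datamodule, assuming the dataset can be loaded
# as a single NumPy array shaped (num_sequences, T, H, W, C). All names here
# (MySequenceDataset, MyDataModule, data_path) are hypothetical.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader, random_split
from pytorch_lightning import LightningDataModule


class MySequenceDataset(Dataset):
    def __init__(self, data: np.ndarray, in_len: int, out_len: int):
        # data: (num_sequences, T, H, W, C) with T >= in_len + out_len
        self.data = torch.from_numpy(data.astype(np.float32))
        self.in_len, self.out_len = in_len, out_len

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        seq = self.data[idx]
        # Returns (input sequence, target sequence); adapt to whatever the
        # training script expects from a batch.
        return seq[:self.in_len], seq[self.in_len:self.in_len + self.out_len]


class MyDataModule(LightningDataModule):
    def __init__(self, data_path: str, in_len: int = 10, out_len: int = 10,
                 batch_size: int = 1, num_workers: int = 8):
        super().__init__()
        self.data_path = data_path
        self.in_len, self.out_len = in_len, out_len
        self.batch_size, self.num_workers = batch_size, num_workers

    def setup(self, stage=None):
        data = np.load(self.data_path)          # (num_sequences, T, H, W, C)
        full = MySequenceDataset(data, self.in_len, self.out_len)
        n_val = max(1, int(0.1 * len(full)))    # simple 90/10 train/val split
        self.train_set, self.val_set = random_split(full, [len(full) - n_val, n_val])

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size,
                          shuffle=True, num_workers=self.num_workers)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size,
                          shuffle=False, num_workers=self.num_workers)
```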

@LivingRoomcode
Author

LivingRoomcode commented Dec 24, 2023

My data is a CSV file with M rows and N columns. The columns are: time, latitude, longitude, several predictive factors, and the target output affected by those factors. Each row therefore gives the predictive factors and target at a particular time and latitude/longitude location. However, my latitude/longitude points do not lie on a regular grid like the ENSO example you provided, so I cannot arrange the data into an array of shape (time, lat, lon, number of predictive factors) as ENSO does. Does that mean I must resample the data onto a regular lat x lon grid before it can be fed into Earthformer?

@gaozhihan
Contributor

Earthformer is designed to handle regularly gridded data. For your case, you may want to use masks to indicate missing values, if the data is not too sparse.
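
For what it's worth, one rough way to do this (not from the repo) is to bin the CSV observations onto a regular lat/lon grid and keep a validity mask alongside. The column names, grid resolution, and file name below are assumptions about the data, and multiple observations falling into the same cell simply overwrite each other in this sketch.

```python
# Rough sketch (hypothetical, not part of the repo): rasterize irregular CSV
# observations onto a regular lat/lon grid and keep a mask of observed cells.
import numpy as np
import pandas as pd

df = pd.read_csv("observations.csv")   # assumed columns: time, lat, lon, factor_1, ..., target

res = 0.5                               # assumed grid resolution in degrees
lat_bins = np.arange(df["lat"].min(), df["lat"].max() + res, res)
lon_bins = np.arange(df["lon"].min(), df["lon"].max() + res, res)
times = np.sort(df["time"].unique())
variables = [c for c in df.columns if c not in ("time", "lat", "lon")]

grid = np.zeros((len(times), len(lat_bins), len(lon_bins), len(variables)), dtype=np.float32)
mask = np.zeros((len(times), len(lat_bins), len(lon_bins), 1), dtype=np.float32)

t_index = {t: i for i, t in enumerate(times)}
time_idx = df["time"].map(t_index).to_numpy()
lat_idx = np.clip(np.digitize(df["lat"], lat_bins) - 1, 0, len(lat_bins) - 1)
lon_idx = np.clip(np.digitize(df["lon"], lon_bins) - 1, 0, len(lon_bins) - 1)
values = df[variables].to_numpy(dtype=np.float32)

for i in range(len(df)):
    grid[time_idx[i], lat_idx[i], lon_idx[i], :] = values[i]   # last observation wins
    mask[time_idx[i], lat_idx[i], lon_idx[i], 0] = 1.0         # 1 = observed, 0 = missing

# grid has layout (T, lat, lon, C); the mask can be appended as an extra input
# channel or used to exclude missing cells from the loss.
```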

@LivingRoomcode
Author

Are there any examples for reference?
