
How to run my own dataset within Earthformer #62

Open
LivingRoomcode opened this issue Dec 8, 2023 · 11 comments

@LivingRoomcode

I want to use Earthformer to train on my own dataset and test it. What format should I process the data into, and which .py files should I prepare?

@gaozhihan
Contributor

Thanks for your question. You may want to refer to the simplest test case to verify if the shapes are aligned correctly. Please note that this test script is from my fork, which has not been merged into this repo.
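For context, a minimal shape check in the spirit of that test could look like the sketch below. The import path, the input_shape/target_shape keyword arguments, and the toy shapes are assumptions on my part; the exact constructor arguments should be copied from the test script or from one of the repo's YAML configs.

```python
# Minimal sketch (not the actual test script): feed a random batch through the
# model and check that the output shape matches the target shape.
# Assumption: CuboidTransformerModel is importable from this path and accepts
# input_shape / target_shape as (T, H, W, C); copy the real arguments from
# tests/test_cuboid.py or a config file.
import torch
from earthformer.cuboid_transformer.cuboid_transformer import CuboidTransformerModel

batch_size = 2
input_shape = (10, 64, 64, 1)    # (T_in, H, W, C) of your input sequence
target_shape = (10, 64, 64, 1)   # (T_out, H, W, C) of your prediction target

model = CuboidTransformerModel(input_shape=input_shape,
                               target_shape=target_shape)
x = torch.rand(batch_size, *input_shape)   # layout "NTHWC"
with torch.no_grad():
    y = model(x)
assert y.shape == (batch_size, *target_shape), f"unexpected output shape {y.shape}"
```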

@LivingRoomcode
Author

I ran the command from the README: python3 -m pytest.
It produced the following error. What does it mean?
[screenshot of the error]

@gaozhihan
Contributor

I'm not sure if you are using the correct script in my fork, but you don't need to run pytest. Please simply try python ROOT_DIR/tests/test_cuboid.py.

@LivingRoomcode
Author

LivingRoomcode commented Dec 14, 2023

I ran the test code as you suggested, and the result shows that the model is missing parameters. How do I specify these two model parameters?
[screenshot of the error]

@gaozhihan
Contributor

@LivingRoomcode
Author

Thank you very much for your patient reply! I successfully ran this code.

@LivingRoomcode
Author

The test_cuboid.py you provided is for testing the data. Do I need to write my own training code based on your train_cuboid_nbody?

@gaozhihan
Contributor

Yes, please feel free to refer to [train_cuboid_nbody.py](https://github.com/amazon-science/earth-forecasting-transformer/blob/7732b03bdb366110563516c3502315deab4c2026/scripts/cuboid_transformer/nbody/train_cuboid_nbody.py) and train_cuboid_sevir.py when implementing your own training script. The main task is to implement your own LightningDataModule to replace the original one:

```python
# Excerpt from train_cuboid_sevir.py; `np` is NumPy and SEVIRLightningDataModule
# is the SEVIR datamodule provided by the earthformer package.
@staticmethod
def get_sevir_datamodule(dataset_oc,
                         micro_batch_size: int = 1,
                         num_workers: int = 8):
    dm = SEVIRLightningDataModule(
        seq_len=dataset_oc["seq_len"],
        sample_mode=dataset_oc["sample_mode"],
        stride=dataset_oc["stride"],
        batch_size=micro_batch_size,
        layout=dataset_oc["layout"],
        output_type=np.float32,
        preprocess=True,
        rescale_method="01",
        verbose=False,
        # datamodule_only
        dataset_name=dataset_oc["dataset_name"],
        start_date=dataset_oc["start_date"],
        train_val_split_date=dataset_oc["train_val_split_date"],
        train_test_split_date=dataset_oc["train_test_split_date"],
        end_date=dataset_oc["end_date"],
        num_workers=num_workers,
    )
    return dm
```
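
For a custom dataset, a minimal replacement could look roughly like the sketch below. Everything here (MySequenceDataset, MyDataModule, the .npy path, the (num_sequences, T, H, W, C) layout) is a placeholder for illustration, not part of the repo; how each batch must be structured ultimately depends on how the training script unpacks it, so adapt __getitem__ accordingly.

```python
# Rough sketch of a replacement datamodule, assuming the dataset can be loaded
# as a single NumPy array shaped (num_sequences, T, H, W, C). All names here
# (MySequenceDataset, MyDataModule, data_path) are hypothetical.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader, random_split
from pytorch_lightning import LightningDataModule


class MySequenceDataset(Dataset):
    def __init__(self, data: np.ndarray, in_len: int, out_len: int):
        # data: (num_sequences, T, H, W, C) with T >= in_len + out_len
        self.data = torch.from_numpy(data.astype(np.float32))
        self.in_len, self.out_len = in_len, out_len

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        seq = self.data[idx]
        # Returns (input sequence, target sequence); adapt to whatever the
        # training script expects from a batch.
        return seq[:self.in_len], seq[self.in_len:self.in_len + self.out_len]


class MyDataModule(LightningDataModule):
    def __init__(self, data_path: str, in_len: int = 10, out_len: int = 10,
                 batch_size: int = 1, num_workers: int = 8):
        super().__init__()
        self.data_path = data_path
        self.in_len, self.out_len = in_len, out_len
        self.batch_size, self.num_workers = batch_size, num_workers

    def setup(self, stage=None):
        data = np.load(self.data_path)          # (num_sequences, T, H, W, C)
        full = MySequenceDataset(data, self.in_len, self.out_len)
        n_val = max(1, int(0.1 * len(full)))    # simple 90/10 train/val split
        self.train_set, self.val_set = random_split(full, [len(full) - n_val, n_val])

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size,
                          shuffle=True, num_workers=self.num_workers)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size,
                          shuffle=False, num_workers=self.num_workers)
```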

@LivingRoomcode
Author

LivingRoomcode commented Dec 24, 2023

My data is a CSV file with M rows and N columns. The columns are: time, latitude, longitude, several predictive factors, and the target output affected by those factors. Each row therefore gives the predictive factors and target at a particular time and latitude/longitude location. However, my latitude/longitude points do not lie on a regular grid like the ENSO example you provided, so I cannot arrange the data into an array of shape (time, lat, lon, number of predictive factors) as ENSO does. Does that mean I must resample the data onto a regular lat x lon grid before it can be fed into Earthformer?

@gaozhihan
Contributor

Earthformer is designed to handle regularly gridded data. For your case, you may want to use masks to indicate missing values, if the data is not too sparse.
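
For what it's worth, one rough way to do this (not from the repo) is to bin the CSV observations onto a regular lat/lon grid and keep a validity mask alongside. The column names, grid resolution, and file name below are assumptions about the data, and multiple observations falling into the same cell simply overwrite each other in this sketch.

```python
# Rough sketch (hypothetical, not part of the repo): rasterize irregular CSV
# observations onto a regular lat/lon grid and keep a mask of observed cells.
import numpy as np
import pandas as pd

df = pd.read_csv("observations.csv")   # assumed columns: time, lat, lon, factor_1, ..., target

res = 0.5                               # assumed grid resolution in degrees
lat_bins = np.arange(df["lat"].min(), df["lat"].max() + res, res)
lon_bins = np.arange(df["lon"].min(), df["lon"].max() + res, res)
times = np.sort(df["time"].unique())
variables = [c for c in df.columns if c not in ("time", "lat", "lon")]

grid = np.zeros((len(times), len(lat_bins), len(lon_bins), len(variables)), dtype=np.float32)
mask = np.zeros((len(times), len(lat_bins), len(lon_bins), 1), dtype=np.float32)

t_index = {t: i for i, t in enumerate(times)}
time_idx = df["time"].map(t_index).to_numpy()
lat_idx = np.clip(np.digitize(df["lat"], lat_bins) - 1, 0, len(lat_bins) - 1)
lon_idx = np.clip(np.digitize(df["lon"], lon_bins) - 1, 0, len(lon_bins) - 1)
values = df[variables].to_numpy(dtype=np.float32)

for i in range(len(df)):
    grid[time_idx[i], lat_idx[i], lon_idx[i], :] = values[i]   # last observation wins
    mask[time_idx[i], lat_idx[i], lon_idx[i], 0] = 1.0         # 1 = observed, 0 = missing

# grid has layout (T, lat, lon, C); the mask can be appended as an extra input
# channel or used to exclude missing cells from the loss.
```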

@LivingRoomcode
Author

Are there any examples for reference?
