Prediction accuracy using Earthformer is lower than with Rainformer · Issue #67

Open
Helomin opened this issue Jan 26, 2024 · 0 comments

Helomin commented Jan 26, 2024

Both models use the SEVIR dataset to predict two hours ahead.
Both are trained with the AdamW optimizer at a learning rate of 1e-4, and neither uses any learning rate optimization strategy.
Here are the Earthformer model parameter settings:
base_units=128,
block_units=None,
scale_alpha=1.0,
num_heads=4,
attn_drop=0.0,
proj_drop=0.0,
ffn_drop=0.0,

# inter-attn downsample/upsample
downsample=2,
downsample_type='patch_merge',
upsample_type="upsample",
upsample_kernel_size=3,

# encoder
enc_depth=[2, 2],
enc_attn_patterns=None,
enc_cuboid_size=[(4, 4, 4), (4, 4, 4)],
enc_cuboid_strategy=[('l', 'l', 'l'), ('d', 'd', 'd')],
enc_shift_size=[(0, 0, 0), (0, 0, 0)],
enc_use_inter_ffn=True,

# decoder
dec_depth=[2, 2],
dec_cross_start=0,
dec_self_attn_patterns=None,
dec_self_cuboid_size=[(4, 4, 4), (4, 4, 4)],
dec_self_cuboid_strategy=[('l', 'l', 'l'), ('d', 'd', 'd')],
dec_self_shift_size=[(1, 1, 1), (0, 0, 0)],
dec_cross_attn_patterns=None,
dec_cross_cuboid_hw=[(4, 4), (4, 4)],
dec_cross_cuboid_strategy=[('l', 'l', 'l'), ('d', 'l', 'l')],
dec_cross_shift_hw=[(0, 0), (0, 0)],
dec_cross_n_temporal=[1, 2],
dec_cross_last_n_frames=None,
dec_use_inter_ffn=True,
dec_hierarchical_pos_embed=False,

# global vectors
num_global_vectors=8,
use_dec_self_global=False,
dec_self_update_global=True,
use_dec_cross_global=False,
use_global_vector_ffn=False,
use_global_self_attn=True,
separate_global_qkv=True,
global_dim_ratio=1,
z_init_method='zeros',

# initial downsample and final upsample

initial_downsample_type="stack_conv",
initial_downsample_activation="leaky",

# initial_downsample_type=="conv"
initial_downsample_scale=1,
initial_downsample_conv_layers=2,
final_upsample_conv_layers=2,

# initial_downsample_type == "stack_conv"
initial_downsample_stack_conv_num_layers=3,
initial_downsample_stack_conv_dim_list=[16, 64, 128], # [96, 384, 768]
initial_downsample_stack_conv_downscale_list=[3, 2, 2],
initial_downsample_stack_conv_num_conv_list=[2, 2, 2],

# end of initial downsample and final upsample

ffn_activation='gelu',
gated_ffn=False,
norm_layer='layer_norm',
padding_type='ignore',
pos_embed_type='t+hw',
checkpoint_level=0,
use_relative_pos=True,
self_attn_use_final_proj=True,
dec_use_first_self_attn=False,

# initialization
attn_linear_init_mode="0",
ffn_linear_init_mode="0",
conv_init_mode="0",
down_up_linear_init_mode="0",
norm_init_mode="0",
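
For reference, here is a rough sketch of the spatial sizes these settings imply, in case the configuration itself is part of the problem. It is plain arithmetic, not Earthformer code. The 384 x 384 input size is an assumption (it is not stated above), as is reading each cuboid size as (T, H, W). The helper names `trace_spatial_sizes` and `cuboids_tile_evenly` are made up for illustration, and the sketch assumes every stage divides the height and width exactly, ignoring any padding the real model may apply.

```python
from math import prod

# Values copied from the settings above.
stack_conv_downscale_list = [3, 2, 2]
stack_conv_dim_list = [16, 64, 128]
enc_depth = [2, 2]
downsample = 2
enc_cuboid_size = [(4, 4, 4), (4, 4, 4)]
base_units = 128

# Assumption (not stated in the post): 384 x 384 input frames.
input_hw = (384, 384)


def trace_spatial_sizes(input_hw, downscale_list, num_levels, downsample):
    """Return the (H, W) feature-map size at each encoder level.

    Assumes every stage divides H and W exactly by its factor.
    """
    factor = prod(downscale_list)
    h, w = input_hw[0] // factor, input_hw[1] // factor
    sizes = [(h, w)]
    # One inter-level downsample between consecutive encoder levels.
    for _ in range(num_levels - 1):
        h, w = h // downsample, w // downsample
        sizes.append((h, w))
    return sizes


def cuboids_tile_evenly(hw, cuboid_sizes):
    """True if every cuboid's (H, W) extent divides the feature map."""
    return all(hw[0] % ch == 0 and hw[1] % cw == 0 for _, ch, cw in cuboid_sizes)


sizes = trace_spatial_sizes(
    input_hw, stack_conv_downscale_list, len(enc_depth), downsample
)
for level, hw in enumerate(sizes):
    ok = cuboids_tile_evenly(hw, enc_cuboid_size)
    print(f"level {level}: {hw}, cuboids tile evenly: {ok}")
print("last stack_conv dim equals base_units:", stack_conv_dim_list[-1] == base_units)
```

Under these assumptions the initial stack_conv stage reduces the frames by a factor of 12, and the 4 x 4 spatial cuboids tile both encoder levels without padding.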
