Confirm JSON config for FFHQ-1024? #103

Open · tin-sely opened this issue Apr 16, 2024 · 2 comments

@tin-sely

I'm planning on using the config for FFHQ-1024 and just wanted to double-check that it's correct.

  • Is the "Conditioning Dropout Rate", the same as mapping_dropout_rate, or something else?
  • Attention Heads (Width / Head Dim) seems like it's configured automatically based on the "widths", and "depths"?
  • For Levels (Local + Global Attention) 3+2, I assume I add three " {"type": "shifted-window", "d_head": 64, "window_size": 7}, ", and two {"type": "global", "d_head": 64}?
[Screenshot of the configuration table referenced above, 2024-04-16]
{
  "model": {
    "type": "image_transformer_v2",
    "input_channels": 3, 
    "input_size": [1024, 1024],
    "patch_size": [4, 4],
    "depths": [2, 2, 2, 2, 2], 
    "widths": [128, 256, 384, 768, 1024],
    "self_attns": [
      {"type": "shifted-window", "d_head": 64, "window_size": 7}, 
      {"type": "shifted-window", "d_head": 64, "window_size": 7},
      {"type": "shifted-window", "d_head": 64, "window_size": 7},
      {"type": "global", "d_head": 64},
      {"type": "global", "d_head": 64}
    ],
    "loss_config": "karras",
    "loss_weighting": "soft-min-snr", 
    "dropout_rate": [0.0, 0.0, 0.0, 0.0, 0.1], 
    "mapping_dropout_rate": 0.1,
    "augment_prob": 0.12, 
    "sigma_data": 0.5, 
    "sigma_min": 1e-3,
    "sigma_max": 1e3, 
    "sigma_sample_density": {
      "type": "cosine-interpolated" 
    }
  },
  "dataset": {
    "type": "huggingface", 
    "location": "nelorth/oxford-flowers", 
    "image_key": "image" 
  },
  "optimizer": {
    "type": "adamw",
    "lr": 5e-4, 
    "betas": [0.9, 0.95], 
    "eps": 1e-8, 
    "weight_decay": 1e-2 
  },
  "lr_sched": {
    "type": "constant", 
    "warmup": 0.0 
  },
  "ema_sched": {
    "type": "inverse", 
    "power": 0.75, 
    "max_value": 0.9999 
  }
}
@stefan-baumann commented Apr 16, 2024

The type for those self-attention blocks should be neighborhood unless you do want to use Swin, and we used a mapping dropout rate of 0. Apart from that, the config matches what we used.

And to answer your other two questions:

  1. Something else iirc
  2. Yes
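
For reference, a minimal sketch of how the two affected fields would look with those changes applied. This assumes the neighborhood blocks take a "kernel_size" parameter as in the repository's example configs, which is not confirmed in this thread; the d_head values are simply carried over from the config above:

    "self_attns": [
      {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
      {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
      {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
      {"type": "global", "d_head": 64},
      {"type": "global", "d_head": 64}
    ],
    "mapping_dropout_rate": 0.0,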

@Luo-Yihong

Could you release the pre-trained models of HDiT on FFHQ-1024?
