Confirm JSON config for FFHQ-1024? #103

Open · tin-sely opened this issue Apr 16, 2024 · 2 comments

@tin-sely

I'm planning on using the config for FFHQ-1024 and just wanted to double-check that it's correct.

  • Is the "Conditioning Dropout Rate", the same as mapping_dropout_rate, or something else?
  • Attention Heads (Width / Head Dim) seems like it's configured automatically based on the "widths", and "depths"?
  • For Levels (Local + Global Attention) 3+2, I assume I add three " {"type": "shifted-window", "d_head": 64, "window_size": 7}, ", and two {"type": "global", "d_head": 64}?
[Screenshot of the configuration table referenced above, 2024-04-16]
{
  "model": {
    "type": "image_transformer_v2",
    "input_channels": 3, 
    "input_size": [1024, 1024],
    "patch_size": [4, 4],
    "depths": [2, 2, 2, 2, 2], 
    "widths": [128, 256, 384, 768, 1024],
    "self_attns": [
      {"type": "shifted-window", "d_head": 64, "window_size": 7}, 
      {"type": "shifted-window", "d_head": 64, "window_size": 7},
      {"type": "shifted-window", "d_head": 64, "window_size": 7},
      {"type": "global", "d_head": 64},
      {"type": "global", "d_head": 64}
    ],
    "loss_config": "karras",
    "loss_weighting": "soft-min-snr", 
    "dropout_rate": [0.0, 0.0, 0.0, 0.0, 0.1], 
    "mapping_dropout_rate": 0.1,
    "augment_prob": 0.12, 
    "sigma_data": 0.5, 
    "sigma_min": 1e-3,
    "sigma_max": 1e3, 
    "sigma_sample_density": {
      "type": "cosine-interpolated" 
    }
  },
  "dataset": {
    "type": "huggingface", 
    "location": "nelorth/oxford-flowers", 
    "image_key": "image" 
  },
  "optimizer": {
    "type": "adamw",
    "lr": 5e-4, 
    "betas": [0.9, 0.95], 
    "eps": 1e-8, 
    "weight_decay": 1e-2 
  },
  "lr_sched": {
    "type": "constant", 
    "warmup": 0.0 
  },
  "ema_sched": {
    "type": "inverse", 
    "power": 0.75, 
    "max_value": 0.9999 
  }
}
@stefan-baumann commented Apr 16, 2024

The type for those self-attention blocks should be neighborhood unless you do want to use Swin, and we used a mapping dropout rate of 0. Apart from that, the config matches what we used.

And to answer your other two questions:

  1. Something else iirc
  2. Yes
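
For reference, a minimal sketch of how the two affected fields would look with those changes applied. This assumes the neighborhood blocks take a "kernel_size" parameter as in the repository's example configs, which is not confirmed in this thread; the d_head values are simply carried over from the config above:

    "self_attns": [
      {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
      {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
      {"type": "neighborhood", "d_head": 64, "kernel_size": 7},
      {"type": "global", "d_head": 64},
      {"type": "global", "d_head": 64}
    ],
    "mapping_dropout_rate": 0.0,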

@Luo-Yihong

Could you release the pre-trained models of HDiT on FFHQ-1024?
