How to train and achieve good results? #33

drakirk8 · 2024-06-27T06:17:48Z

Hello,

First of all, I'm grateful for the training code you provided! Recently, I tried to replicate ootd-dc and ootd-hd on data collected from a shopping website. But for both the dc and the hd model, the loss curve keeps fluctuating around 0.02 and the results get worse as training goes on.

after 4 epochs:

We trained the model from the ootd-dc checkpoint with:
1. mixed_precision float32
2. Resolution 1024*768
3. batch_size 8 on multi-GPU
4. epochs=4 for the experiment
5. An additional 15,000 paired samples

We tried to use fp16 for faster training and a bigger batch_size, but the debugger reports "Attempting to unscale FP16 gradients" at "optimizer.step()".
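For context, PyTorch's gradient scaler raises this error when the parameters being optimized are themselves stored in fp16, which typically happens when the whole model is cast with `.half()` or loaded in half precision before training. A minimal sketch of one common workaround follows, assuming that is the cause here: keep the trainable weights in fp32 and let autocast handle the fp16 compute. The helper name `upcast_trainable_params` and the small `nn.Linear` stand-in are hypothetical and are not taken from the OOTDiffusion training code.

```python
import torch
from torch import nn


def upcast_trainable_params(model: nn.Module) -> int:
    """Cast every trainable fp16 parameter to fp32 in place.

    Returns the number of parameters that were cast. Frozen parameters
    are left untouched so they can stay in fp16 to save memory.
    """
    n_cast = 0
    for param in model.parameters():
        if param.requires_grad and param.dtype == torch.float16:
            param.data = param.data.float()
            n_cast += 1
    return n_cast


# Hypothetical stand-in for a network that was loaded or cast to fp16.
model = nn.Linear(4, 2).half()

# Upcast BEFORE building the optimizer, so the optimizer and the
# gradient scaler only ever see fp32 parameters and fp32 gradients.
num_cast = upcast_trainable_params(model)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```

With the weights in fp32, the forward pass can still run under `torch.autocast` in fp16, so most of the speed and memory benefit should be kept while the scaler no longer receives fp16 gradients.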

Besides, I wonder what a normal loss curve looks like, and how many epochs we should set for replication if the fluctuation around 0.02 is normal.
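Per-step losses in diffusion training tend to be noisy because each batch draws random timesteps, so a raw curve can look flat even when the underlying trend is moving. A minimal sketch of smoothing the logged loss with an exponential moving average is shown below. The function name `ema_smooth`, the smoothing factor, and the loss values are all assumptions for illustration; the values are synthetic, not measurements from this training run.

```python
import random


def ema_smooth(values, alpha=0.98):
    """Exponential moving average of a sequence of loss values.

    alpha close to 1 gives heavier smoothing; the first output equals
    the first input so the curve starts at the observed loss.
    """
    smoothed = []
    running = None
    for v in values:
        running = v if running is None else alpha * running + (1 - alpha) * v
        smoothed.append(running)
    return smoothed


# Synthetic, illustrative losses that jitter around 0.02.
random.seed(0)
raw = [0.02 + random.uniform(-0.01, 0.01) for _ in range(2000)]
smooth = ema_smooth(raw)

raw_spread = max(raw[-500:]) - min(raw[-500:])
smooth_spread = max(smooth[-500:]) - min(smooth[-500:])
print(f"raw spread: {raw_spread:.4f}, smoothed spread: {smooth_spread:.4f}")
```

Comparing the smoothed curve across epochs, together with a fixed set of validation images, is usually a more reliable signal than the raw per-step loss.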

zhangzhiyuan303 · 2024-07-25T09:06:43Z

Marking this to follow the details.
