
Set diff params groups #55

Open · wants to merge 8 commits into base: magma
Conversation

@floatingbigcat (Collaborator) commented Sep 24, 2023

Divides all parameters into 4 param groups, depending on whether they use weight_decay and whether they are finetuned or pretrained.

Adds two extra args:

  • "finetune_keywords": list of strings; a parameter is put into the finetune groups if its name contains one of these keywords. "image_prefix" is the only keyword for now.

  • "finetune_factor": float; controls the learning rate of the finetuned groups, whose real lr = pretrained_lr * finetune_factor.

This leaves enough room for further changes, such as adding more modality encoders, while avoiding too many hyperparameters for adjusting the lr of the finetune groups.
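To make the grouping concrete, here is a minimal sketch of how such a split could be implemented. Only the four-group layout, finetune_keywords, finetune_factor, and the "image_prefix" example come from this PR description; the helper name build_param_groups and the bias/norm no-decay heuristic are assumptions for illustration, not the actual implementation.

```python
def build_param_groups(model, lr, weight_decay, finetune_keywords, finetune_factor):
    """Split parameters into 4 groups: (pretrained | finetuned) x (decay | no-decay)."""
    groups = {
        "pretrained_decay": [], "pretrained_no_decay": [],
        "finetune_decay": [], "finetune_no_decay": [],
    }
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        # A parameter goes into the finetune groups if its name contains
        # any of the keywords, e.g. "image_prefix".
        is_finetune = any(k in name for k in finetune_keywords)
        # Assumed heuristic: biases and norm weights get no weight decay.
        no_decay = name.endswith(".bias") or "norm" in name.lower()
        prefix = "finetune" if is_finetune else "pretrained"
        groups[prefix + ("_no_decay" if no_decay else "_decay")].append(p)

    # Finetuned groups run at real_lr = pretrained_lr * finetune_factor.
    return [
        {"params": groups["pretrained_decay"], "lr": lr, "weight_decay": weight_decay},
        {"params": groups["pretrained_no_decay"], "lr": lr, "weight_decay": 0.0},
        {"params": groups["finetune_decay"], "lr": lr * finetune_factor, "weight_decay": weight_decay},
        {"params": groups["finetune_no_decay"], "lr": lr * finetune_factor, "weight_decay": 0.0},
    ]
```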

@kshitijkg (Member)
Looks good overall; I was wondering if we can make it more general?

  • Right now the finetune_factor is used for all parameters that are in the finetune_groups_key_words. Can we instead pass a list of dictionaries called finetune_group_lr_info = {key_word, finetune_factor}?

  • And right now the only option supported is real_lr = lr * finetune_factor. Can we instead have a new class called GroupedAnnealingLR that supports passing in different annealing lr parameters for each group?

So the idea is to instead do this:

finetune_group_lr_info = {key_word, annealing_lr_params}
where annealing_lr_params: {start_lr, warmup_iter, total_iters, decay_style, last_iter, min_lr=0.0}
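A possible shape for that config, as a purely illustrative Python sketch: the field names mirror the annealing parameters listed above, while the keyword and numeric values are placeholders, not values proposed in this PR.

```python
# Hypothetical config shape for the GroupedAnnealingLR proposal above.
# "image_prefix" is just an example keyword; values are placeholders.
finetune_group_lr_info = {
    "image_prefix": {
        "start_lr": 1.0e-5,
        "warmup_iter": 1000,
        "total_iters": 100_000,
        "decay_style": "cosine",
        "last_iter": 0,
        "min_lr": 0.0,
    },
    # Additional keywords (e.g. for new modality encoders) would get
    # their own annealing parameters here.
}
```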

@kshitijkg self-requested a review September 24, 2023 07:55
@kshitijkg (Member)
Could you also add a small dummy test case?

@floatingbigcat (Collaborator, Author)
> Looks good overall; I was wondering if we can make it more general?
>
>   • Right now the finetune_factor is used for all parameters that are in the finetune_groups_key_words. Can we instead pass a list of dictionaries called finetune_group_lr_info = {key_word, finetune_factor}?
>   • And right now the only option supported is real_lr = lr * finetune_factor. Can we instead have a new class called GroupedAnnealingLR that supports passing in different annealing lr parameters for each group?
>
> So the idea is to instead do this:
>
> finetune_group_lr_info = {key_word, annealing_lr_params} where annealing_lr_params: {start_lr, warmup_iter, total_iters, decay_style, last_iter, min_lr=0.0}

As we discussed, we can make another PR for further enhancements.

@floatingbigcat (Collaborator, Author) commented Sep 24, 2023

> Could you also add a small dummy test case?

Use the deep.py wrapper to run this test file. (7682ef4)
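For reference, a small dummy test along these lines might look like the sketch below. It reuses the hypothetical build_param_groups helper from the earlier sketch and is not the actual test added in 7682ef4; it only checks that every parameter lands in exactly one group and that only the pretrained and scaled finetune learning rates appear.

```python
import torch

def test_param_groups():
    # Tiny model with biases so both the decay and no-decay groups are
    # exercised; no parameter name contains "image_prefix", so the
    # finetune groups stay empty here.
    model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.LayerNorm(4))
    lr, factor = 1e-4, 0.1
    groups = build_param_groups(model, lr, weight_decay=0.01,
                                finetune_keywords=["image_prefix"],
                                finetune_factor=factor)
    # Every trainable parameter is assigned to exactly one group.
    assert sum(len(g["params"]) for g in groups) == len(list(model.parameters()))
    # Only the pretrained lr and the scaled finetune lr appear.
    assert all(g["lr"] in (lr, lr * factor) for g in groups)
```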
