Replies: 1 comment
-
@tanz63 these configurations are more or less the defaults. Regarding fine-tuning, it's a dataset-agnostic proof of concept; we did not deliberate hard about the setting. One could potentially obtain significantly better fine-tuning performance by carefully validating the hyperparameters.
-
In the Chronos paper, training is run for a fixed number of steps: _"The models were optimized for 200K steps using the AdamW optimizer with a weight decay of 0.01. The learning rate was annealed linearly from its initial value of 0.001 to 0 over the training steps."_ What is the logic behind this configuration? Since there is no downstream-task fine-tuning as in the LLM setting, how is overfitting avoided? Is there a heuristic behind the step count, e.g. that it corresponds to roughly 1 or 2 epochs over the training corpus?
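For concreteness, here is a minimal PyTorch sketch of the recipe quoted above (this is not the authors' code; the model, batch, loss, and the `make_optimizer_and_scheduler` helper are placeholders): AdamW with weight decay 0.01 and a learning rate annealed linearly from 0.001 to 0 over a fixed budget of 200K steps, with no notion of epochs.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR


def make_optimizer_and_scheduler(model, lr=1e-3, weight_decay=0.01, total_steps=200_000):
    """Hypothetical helper: AdamW plus a linear learning-rate decay from `lr` to 0."""
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    # LambdaLR multiplies the base lr by the factor below at every scheduler step:
    # lr(t) = lr * (1 - t / total_steps), reaching 0 at the final step.
    scheduler = LambdaLR(optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / total_steps))
    return optimizer, scheduler


# Pretraining-style setting from the quoted passage: a fixed budget of 200K steps.
model = torch.nn.Linear(512, 512)      # stand-in for the actual Chronos model
optimizer, scheduler = make_optimizer_and_scheduler(model, total_steps=200_000)

for step in range(200_000):
    batch = torch.randn(32, 512)       # stand-in for a sampled batch of token sequences
    loss = model(batch).pow(2).mean()  # stand-in for the training loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()                   # anneal the learning rate once per optimizer step
```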
Also, the fine-tuning is implemented in a dataset-agnostic fashion with an initial learning rate of 0.001, annealed linearly to 0 over 1000 steps. What are the insights behind this choice?
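For comparison, the fine-tuning setting described here has the same schedule shape with a much smaller step budget; in terms of the hypothetical helper sketched above, it amounts to:

```python
# Fine-tuning setting from the question: same linear-decay recipe, but with an
# initial lr of 0.001 annealed to 0 over only 1000 steps.
ft_optimizer, ft_scheduler = make_optimizer_and_scheduler(
    model, lr=1e-3, weight_decay=0.01, total_steps=1_000
)
```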