After reading this NeurIPS 2024 paper, I found an interesting suggestion: focus validation loss on strategic subsets of the full token set.
The paper recommends separating the loss between non-padding tokens and padding tokens.
While not stated in the paper, other interesting subsets could be explored as well, e.g. high-probability / high-frequency tokens (see the comment and sketch below).
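A minimal sketch of what the padding split could look like, assuming a PyTorch-style eval loop; `PAD_TOKEN_ID` is hypothetical and depends on the tokenizer:

```python
import torch
import torch.nn.functional as F

PAD_TOKEN_ID = 0  # hypothetical; use the tokenizer's actual pad id


def _safe_mean(x: torch.Tensor) -> float:
    # Avoid NaN when a batch contains no tokens of a given subset.
    return x.mean().item() if x.numel() > 0 else float("nan")


def val_loss_by_padding(logits: torch.Tensor, targets: torch.Tensor) -> dict:
    """Per-token validation loss, overall and split by padding vs. non-padding."""
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),  # (B*T, vocab)
        targets.view(-1),                  # (B*T,)
        reduction="none",                  # keep the per-token losses
    )
    pad_mask = targets.view(-1) == PAD_TOKEN_ID
    return {
        "val_loss/all": _safe_mean(per_token),
        "val_loss/padding": _safe_mean(per_token[pad_mask]),
        "val_loss/non_padding": _safe_mean(per_token[~pad_mask]),
    }
```

Other subsets would just be additional boolean masks over the flattened targets.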
References:
https://arxiv.org/pdf/2407.18158
This intersects with the following paper on normalizing token frequencies, which suggests another subset (high-probability tokens), or perhaps a combined metric of `token_frequency * val_loss` per token, reported as a histogram, an average, and per subset: https://arxiv.org/abs/2411.00680
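Not something from either paper, just a rough sketch of how that combined metric might be computed; `per_token_loss` is the unreduced cross-entropy from the sketch above and `token_freqs` holds empirical corpus frequencies (both names hypothetical):

```python
import torch


def frequency_weighted_loss(per_token_loss: torch.Tensor,
                            token_ids: torch.Tensor,
                            token_freqs: torch.Tensor):
    """Combine empirical token frequency with mean val loss per token id,
    then summarize as an average and a histogram."""
    vocab_size = token_freqs.numel()
    loss_sum = torch.zeros(vocab_size)
    counts = torch.zeros(vocab_size)
    # Accumulate total loss and occurrence count for each token id.
    loss_sum.index_add_(0, token_ids, per_token_loss)
    counts.index_add_(0, token_ids, torch.ones_like(per_token_loss))
    seen = counts > 0
    mean_loss_per_id = loss_sum[seen] / counts[seen]
    # The combined metric: token_frequency * val_loss, per token id.
    combined = token_freqs[seen] * mean_loss_per_id
    hist, bin_edges = torch.histogram(combined, bins=20)  # CPU tensors
    return combined.mean().item(), (hist, bin_edges)
```

The per-subset version would apply the same subset masks (e.g. padding vs. non-padding, or a high-frequency cutoff) before the aggregation.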