
Enhanced gemma prediction with new flawless logit #51

Open
wants to merge 18 commits into base: main

Conversation

carlofisicaro

Integration of a flawless_logit

The flawless_logit is derived by computing logits for each token in isolation, normalizing these logits, and then subtracting the normalized sum from the baseline_logits. Concretely, the steps (sketched in code below) are:

  • Encode each token individually.
  • Apply a normalization step (final_norm).
  • Decode the normalized tokens to get logits.
  • Normalize these logits using a softmax function.
  • Sum the normalized logits.
  • Subtract this sum from the original Gemma 2 logits.
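
A minimal sketch of the procedure, assuming hypothetical `encode_token`, `final_norm`, and `decode` callables standing in for the model's embedding, final normalization, and unembedding steps (these names are illustrative, not the exact API in this PR):

```python
import jax
import jax.numpy as jnp


def flawless_logit(baseline_logits, token_ids, encode_token, final_norm, decode):
  """Subtracts the softmax-normalized sum of per-token logits from the baseline.

  encode_token, final_norm, and decode are placeholders for the model's
  embedding, final norm, and unembedding projections (assumed names, not
  the PR's actual functions).
  """
  per_token_logits = []
  for token_id in token_ids:
    hidden = encode_token(token_id)          # encode each token individually
    hidden = final_norm(hidden)              # apply the final normalization
    per_token_logits.append(decode(hidden))  # decode back to vocabulary logits
  normalized = jax.nn.softmax(jnp.stack(per_token_logits), axis=-1)
  correction = normalized.sum(axis=0)        # sum the normalized logits
  return baseline_logits - correction        # subtract from the baseline
```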

Normalizing the logits for each token keeps the predictions more balanced and less likely to be dominated by any single token. Subtracting the normalized sum can help reduce bias and make the logits more representative of the underlying data distribution.
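
As a toy illustration of the balancing effect (made-up numbers, not measurements from this PR):

```python
import jax
import jax.numpy as jnp

baseline = jnp.array([4.0, 1.0, 0.5])     # one vocabulary entry dominates
per_token = jnp.array([[3.0, 0.0, 0.0],
                       [2.0, 1.0, 0.0]])  # per-token logits, one row per token
correction = jax.nn.softmax(per_token, axis=-1).sum(axis=0)
print(baseline - correction)  # the dominant entry is pulled down the most
```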

Initial tests on Gemma2 7B suggest that performance at inference time is improved.

carlofisicaro and others added 18 commits September 18, 2024 14:18
PiperOrigin-RevId: 663277444
Change-Id: I8d7030ce586577a433c48f32df7efa7c141b171a
…ormer_lib.make_causal_attn_mask(input_mask)`

PiperOrigin-RevId: 663692225
Change-Id: Ie2cb6229302087ea1ce5b5c7f442a088207ead07
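
The truncated message above references `transformer_lib.make_causal_attn_mask(input_mask)`; a hedged usage sketch, assuming the function builds a causal attention mask from a boolean padding mask:

```python
import jax.numpy as jnp
from gemma import transformer as transformer_lib

# True marks real tokens, False marks padding.
input_mask = jnp.array([[True, True, True, False]])
attn_mask = transformer_lib.make_causal_attn_mask(input_mask)
# attn_mask[b, q, k] allows attention only to non-padding positions k <= q.
```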
PiperOrigin-RevId: 665414923
Change-Id: I42bc41074518e3065f85c7f1a3014fdd09cffe4c
Currently all weights in FeedForward layers are initialized to zero. This doesn't cause any issues when loading the module with pretrained weights, but if training from scratch it will result in all gradients being zero throughout training, so no learning can occur. Changing w_gating to be initialized from a normal distribution fixes this (see the sketch below).

PiperOrigin-RevId: 674306730
Change-Id: I90800dbe605cdf88f341d103f102357ff278a393
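
A minimal Flax sketch of the kind of change described, assuming a gated FeedForward module with a w_gating parameter (the shapes and stddev value are illustrative, not the exact diff):

```python
import jax.numpy as jnp
from flax import linen as nn


class FeedForward(nn.Module):
  features: int
  hidden_dim: int

  @nn.compact
  def __call__(self, x):
    # Zero init made every gradient zero when training from scratch;
    # a normal init lets gradients flow through the gating weights.
    w_gating = self.param(
        'w_gating',
        nn.initializers.normal(stddev=0.02),  # was nn.initializers.zeros
        (2, x.shape[-1], self.hidden_dim),
    )
    gate = nn.gelu(jnp.dot(x, w_gating[0]))
    up = jnp.dot(x, w_gating[1])
    w_linear = self.param(
        'w_linear', nn.initializers.zeros_init(),
        (self.hidden_dim, self.features),
    )
    return jnp.dot(gate * up, w_linear)
```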
PiperOrigin-RevId: 674394389
Change-Id: I25ba5ad4769c3101c2bf572e33723d4a241e3895
…se errors for implicit rank promotion.

PiperOrigin-RevId: 675179053
Change-Id: I55459c1aa99c7d33ae3f03712eaed01ccc5fc9f2
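
The truncated message above appears to concern JAX's rank-promotion checking; enabling such errors typically looks like this (a standard JAX config flag, not necessarily this commit's exact change):

```python
import jax

# Make implicit rank promotion (e.g. adding a (3,) array to a (2, 3) array
# via broadcasting) raise an error instead of silently succeeding.
jax.config.update('jax_numpy_rank_promotion', 'raise')
```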

google-cla bot commented Sep 21, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@carlofisicaro
Author

The GitHub CLA check doesn't recognize the noreply user @a-googler <no****ly@google.com>.

How shall I proceed? Should I use an interactive rebase to edit the author of the related commits?
