Tie OPT weights #23

philippguevorguian · 2024-09-14T00:34:25Z

This PR ties the weights of the input and output embeddings for the OPT model, using a minimal implementation.
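
For readers unfamiliar with weight tying, here is a minimal sketch of the general technique in PyTorch. It is not the code from this PR: the class `TinyTiedLM`, its sizes, and the attribute name `tok_embeddings` are assumptions for illustration (only `self.output` appears in the diff). The idea is that the output projection reuses the embedding matrix, so both modules share one `Parameter`.

```python
import torch
import torch.nn as nn


class TinyTiedLM(nn.Module):
    """Hypothetical toy model; not the torchtitan OPT implementation."""

    def __init__(self, vocab_size: int = 32, dim: int = 8):
        super().__init__()
        # nn.Embedding stores its weight as (vocab_size, dim).
        self.tok_embeddings = nn.Embedding(vocab_size, dim)
        # nn.Linear(dim, vocab_size) also stores its weight as (vocab_size, dim),
        # so the two matrices are shape-compatible without a transpose.
        self.output = nn.Linear(dim, vocab_size, bias=False)
        # Tie the weights: both modules now point at the same Parameter object.
        self.output.weight = self.tok_embeddings.weight

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h = self.tok_embeddings(tokens)
        return self.output(h).float()


model = TinyTiedLM()
tied = model.output.weight is model.tok_embeddings.weight
# parameters() de-duplicates shared tensors, so the tied matrix is counted once.
n_params = sum(p.numel() for p in model.parameters())
logits = model(torch.tensor([[1, 2, 3]]))
```

Because the parameter is shared, gradients from both the embedding lookup and the output projection accumulate into the same tensor.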

tigranfah · 2024-09-14T19:02:22Z

torchtitan/models/opt/model.py

@@ -364,8 +362,7 @@ def forward(self, tokens: torch.Tensor):
            h = layer(h)

        h = self.norm(h) if self.norm else h
-        output = self.output(h).float() if self.output else h
-        return output
+        return self.output(h)


why not cast the outputs to float?
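
As context for this question, the sketch below shows what the `.float()` call changes, assuming the model runs in a reduced-precision dtype such as bfloat16. That assumption, the standalone `proj` layer, and the tensor sizes are all illustrative and do not come from this PR. Without the cast, the logits keep the module's dtype. With it, they are upcast to float32, which is commonly preferred before computing a softmax or cross-entropy loss.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the model's output projection, in bfloat16.
proj = nn.Linear(8, 32, bias=False).to(torch.bfloat16)
h = torch.randn(1, 3, 8, dtype=torch.bfloat16)

raw = proj(h)           # logits stay in the module's dtype
cast = proj(h).float()  # logits upcast to float32

print(raw.dtype, cast.dtype)
```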

tie weights of input and output embeddings of OPT for consistency

2bc80b7

philippguevorguian self-assigned this Sep 14, 2024

philippguevorguian requested review from MenuaB and tigranfah September 16, 2024 10:00

cast output to float

4e9a303

tigranfah approved these changes Sep 18, 2024

philippguevorguian merged commit 379be76 into main Sep 18, 2024
0 of 4 checks passed