
Commit 7a11c1f

Merge branch 'main' into tb-profiler-tutorial-docs-update
2 parents 7abc7a0 + 57bad60

1 file changed: +10 -3 lines changed


beginner_source/transformer_tutorial.py (+10 -3)
@@ -29,7 +29,7 @@
 
 ######################################################################
 # In this tutorial, we train a ``nn.TransformerEncoder`` model on a
-# language modeling task. Please note that this tutorial does not cover
+# causal language modeling task. Please note that this tutorial does not cover
 # the training of `nn.TransformerDecoder <https://pytorch.org/docs/stable/generated/torch.nn.TransformerDecoder.html#torch.nn.TransformerDecoder>`__, as depicted in
 # the right half of the diagram above. The language modeling task is to assign a
 # probability for the likelihood of a given word (or a sequence of words)
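
For context on the wording change above: in causal language modeling the model predicts each token from the tokens before it, so inputs and targets are the same sequence offset by one position. A minimal sketch (not part of the commit; the token ids are made up):

import torch

# Toy token ids; in the tutorial these would come from the batched corpus data.
tokens = torch.tensor([11, 42, 7, 99, 3, 15])

# Causal LM objective: predict the token at position i from positions < i,
# so targets are simply the inputs shifted one position to the left.
inputs = tokens[:-1]   # tensor([11, 42,  7, 99,  3])
targets = tokens[1:]   # tensor([42,  7, 99,  3, 15])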
@@ -41,8 +41,10 @@
 # Along with the input sequence, a square attention mask is required because the
 # self-attention layers in ``nn.TransformerDecoder`` are only allowed to attend
 # the earlier positions in the sequence. For the language modeling task, any
-# tokens on the future positions should be masked. To produce a probability
-# distribution over output words, the output of the ``nn.TransformerEncoder``
+# tokens on the future positions should be masked. This masking, combined with the fact
+# that the output embeddings are offset with later positions, ensures that the
+# predictions for position i can depend only on the known outputs at positions less than i.
+# To produce a probability distribution over output words, the output of the ``nn.TransformerEncoder``
 # model is passed through a linear layer to output unnormalized logits.
 # The log-softmax function isn't applied here due to the later use of
 # `CrossEntropyLoss <https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html>`__,
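
For reference on the two points above (not part of the commit; the sizes are made up): ``nn.Transformer.generate_square_subsequent_mask`` fills future positions with ``-inf`` and the rest with ``0.0``, and ``CrossEntropyLoss`` takes the unnormalized logits directly because it applies log-softmax internally.

import torch
import torch.nn as nn

# Square causal mask: -inf above the diagonal blocks attention to future
# positions; 0.0 on and below the diagonal leaves earlier positions visible.
mask = nn.Transformer.generate_square_subsequent_mask(4)
print(mask)
# tensor([[0., -inf, -inf, -inf],
#         [0.,   0., -inf, -inf],
#         [0.,   0.,   0., -inf],
#         [0.,   0.,   0.,   0.]])

# CrossEntropyLoss applies log-softmax internally, so the linear layer's
# unnormalized logits can be passed to it directly.
vocab_size, seq_len = 1000, 4
logits = torch.randn(seq_len, vocab_size)           # (positions, vocab), illustrative
targets = torch.randint(0, vocab_size, (seq_len,))  # ground-truth next tokens
loss = nn.CrossEntropyLoss()(logits, targets)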
@@ -91,6 +93,11 @@ def forward(self, src: Tensor, src_mask: Tensor = None) -> Tensor:
         """
         src = self.embedding(src) * math.sqrt(self.d_model)
         src = self.pos_encoder(src)
+        if src_mask is None:
+            """Generate a square causal mask for the sequence. The masked positions are filled with float('-inf').
+            Unmasked positions are filled with float(0.0).
+            """
+            src_mask = nn.Transformer.generate_square_subsequent_mask(len(src)).to(device)
         output = self.transformer_encoder(src, src_mask)
         output = self.linear(output)
         return output
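
The new branch above only fires when the caller omits ``src_mask``. As a rough, runnable sketch of that behaviour (a stand-in, not the tutorial's ``TransformerModel``: it skips positional encoding and uses made-up sizes), the same causal mask can be built from the sequence length and handed to a plain ``nn.TransformerEncoder``:

import math
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Minimal stand-in for the tutorial's forward path, just to exercise the
# mask-defaulting behaviour added in the diff above.
d_model, ntokens = 64, 1000
embedding = nn.Embedding(ntokens, d_model).to(device)
encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2).to(device)

src = torch.randint(0, ntokens, (35, 20)).to(device)   # (seq_len, batch_size)
x = embedding(src) * math.sqrt(d_model)

# Equivalent of the new `if src_mask is None:` branch: build the causal mask
# from the sequence length when the caller does not supply one.
src_mask = nn.Transformer.generate_square_subsequent_mask(len(src)).to(device)
output = encoder(x, src_mask)                           # (seq_len, batch_size, d_model)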
