[Distributed] Fix new token's shape #1254

Merged
merged 1 commit into from
Oct 2, 2024
Conversation

Contributor

@kwen2501 kwen2501 commented Oct 2, 2024

Issue

TP-only case is broken due to the following error:

[rank1]:   File "/home/kw2501/local/torchchat/torchchat/model.py", line 815, in forward
[rank1]:     bsz, seqlen, _ = x.shape
[rank1]: ValueError: not enough values to unpack (expected 3, got 2)
[rank3]:   File "/home/kw2501/local/torchchat/dist_run.py", line 477, in main
[rank3]:     output = decorder.step(new_token, **kwargs)
[rank3]:   File "/home/kw2501/local/pytorch/torch/distributed/pipelining/schedules.py", line 610, in step
[rank3]:     self._step_microbatches(args_split, kwargs_split, targets_split, losses)
[rank3]:   File "/home/kw2501/local/pytorch/torch/distributed/pipelining/schedules.py", line 710, in _step_microbatches
[rank3]:     output = self._stage.forward_one_chunk(i, arg_mbs[i], kwarg_mbs[i])  # type: ignore[index]
[rank3]:   File "/home/kw2501/local/pytorch/torch/distributed/pipelining/stage.py", line 595, in forward_one_chunk
[rank3]:     raise RuntimeError(exc_msg) from e
[rank3]: RuntimeError: 
[rank3]:             [Stage 0] failed to run forward:
[rank3]:             args: ('Tensor(torch.Size([4]), grad=False, dtype=torch.int64)',)
[rank3]:             kwargs: {'input_pos': 'Tensor(torch.Size([1]), grad=False, dtype=torch.int64)', 'cache_lane': '0'}

It suggests that in the decoding phase, our input_ids (i.e. new_tokens) are flattened to 1D rather than being 2D (batch_size, 1).

The flattening happens here:

    # Argmax (deterministic) TODO: add temperature
    next_token = torch.argmax(next_token_logits, dim=-1)

Fix

The fix is simple: add keepdim=True to the torch.argmax call.
With that, the unsqueeze op in decode_in_flight can also be removed.
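
For reference, a minimal runnable sketch of the shape change (assuming next_token_logits has shape (batch_size, vocab_size); the sizes below are made up for illustration):

    import torch

    batch_size, vocab_size = 4, 32000  # illustrative sizes, not from the PR
    next_token_logits = torch.randn(batch_size, vocab_size)

    # Before: argmax collapses the last dim, producing a 1D tensor of shape (batch_size,)
    flat_token = torch.argmax(next_token_logits, dim=-1)
    assert flat_token.shape == (batch_size,)

    # After: keepdim=True preserves the token dim, producing (batch_size, 1),
    # which matches the (bsz, seqlen) layout that model.forward unpacks.
    next_token = torch.argmax(next_token_logits, dim=-1, keepdim=True)
    assert next_token.shape == (batch_size, 1)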


pytorch-bot bot commented Oct 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1254

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1f8ff93 with merge base 8fcb3ba:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 2, 2024
dist_run.py Outdated
    - # Make a 2D tensor with ids on row dimension
    - unsqueezed = torch.unsqueeze(token, 1)
    - token_str = tokenizer.decode(unsqueezed.tolist())
    + token_str = tokenizer.decode(token.tolist())
Contributor

@lessw2020 lessw2020 Oct 2, 2024

I find that this does not work as-is.
However, adding a one-liner before the tokenizer.decode line (421):

token = token.squeeze(1)

And now it works on both llama2 and llama3.

Contributor Author

Ah, makes sense. Tokenizer difference :)

Contributor Author

It seems I cannot unconditionally squeeze the tensor.
If I do, some tokenizers will output:
responses ====>>>> is Christmasiving
instead of
responses ====>>>> ['is', 'Christmas', 'iving']

Contributor Author

I am using tokenizer = sentencepiece.SentencePieceProcessor

Contributor Author

Okay, I am adding an if there:

    # `token` is a tensor of shape (batch_size, 1).
    # For TiktokenTokenizer, we need to squeeze it to 1D.
    # For SentencePieceProcessor, we don't.
    if isinstance(tokenizer, TiktokenTokenizer):
        token = torch.squeeze(token, dim=1)
    token_str = tokenizer.decode(token.tolist())
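
To see why the branch is needed, here is a toy sketch (the token ids are made up; only the shapes matter): tolist() on the (batch_size, 1) tensor yields a nested list, which SentencePieceProcessor batch-decodes into one string per row, whereas the squeezed 1D form is a single flat id list, which is what the Tiktoken path expects.

    import torch

    token = torch.tensor([[306], [4123], [292]])  # hypothetical ids, shape (batch_size, 1)

    token.tolist()             # [[306], [4123], [292]] -- one id list per row
    token.squeeze(1).tolist()  # [306, 4123, 292]       -- a single flat id list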

Contributor

@lessw2020 lessw2020 left a comment

Thanks for adding!
I find that there is a missing line for this PR to work, but with that line it is verified working on llama2 and llama3.
Stamping to land; please add in the squeeze line.

@kwen2501 kwen2501 changed the base branch from main to meta_init October 2, 2024 20:24
@kwen2501 kwen2501 changed the base branch from meta_init to main October 2, 2024 20:25
@kwen2501 kwen2501 merged commit 5952bd1 into main Oct 2, 2024
52 checks passed