Fix tokenizer mismatch bug between model and tokenizer for THUDM/glm-… #2672

darkSuperman · 2024-12-17T01:33:53Z

Fix tokenizer mismatch bug between model and tokenizer for THUDM/glm-4-9b example

LaurentMazare · 2024-12-17T09:59:16Z

Did you try out the change? There doesn't seem to be a tokenizer.json file in the repo that you've switched to, as far as I can tell: https://huggingface.co/THUDM/glm-4-9b/tree/main

darkSuperman · 2024-12-19T07:10:09Z

Sorry, I misread that. I also found this branch, which is used natively within Transformers and provides a tokenizer.json, but loading the model from it requires some changes. Are you interested in using this branch to modify the glm4 example? https://huggingface.co/THUDM/glm-4-9b/tree/refs%2Fpr%2F15

LaurentMazare · 2024-12-19T07:15:11Z

The main reason this model uses the tokenizer from codegeex4 is that the two tokenizers should be identical. When you look at the two sentencepiece models, they have the same hash (here and here), so I think the current version is actually fine, or maybe I'm missing something here?

darkSuperman · 2024-12-20T07:03:27Z

Yes, they are the same. Also, when I was running the inference, I encountered some problems and it seemed that I could not finish the inference. I am trying it and I will give you feedback if I have more information.
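
For anyone who wants to repeat the hash check discussed above on local copies of the two sentencepiece files, a minimal sketch follows. It assumes both files have already been downloaded; the helper names `file_sha256` and `same_contents` are illustrative and not part of this PR or of candle.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks so large model files are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """Return True when both files have the same SHA-256 digest."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Calling `same_contents(path_to_glm4_file, path_to_codegeex4_file)` with the two local paths (hypothetical names here) should return `True` if the files are byte-identical, which is what matching hashes on the Hub would suggest.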

Commit e64b0a7: Fix tokenizer mismatch bug between model and tokenizer for THUDM/glm-4-9b example
