Finally managed to run Qwen models successfully with Outlines #514

aalyousfi · 2024-01-09T11:14:27Z


I've used many models with Outlines without problems. However, I've been unable to use Qwen successfully. Running this simple example:

import outlines
from outlines import models

model = models.transformers("Qwen/Qwen-7B-Chat",
                            device="auto",
                            model_kwargs={"trust_remote_code": True},
                            tokenizer_kwargs={"trust_remote_code": True}
                           )

prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?

Review: This restaurant is great.
"""

answer = outlines.generate.choice(model, ["Positive", "Negative"])(prompt)

gives this error:

ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'}).

It was frustrating because I could use Qwen models with the transformers library directly (i.e. without Outlines) with no issues. After some research, I managed to get past this error by running:

model.tokenizer.tokenizer.pad_token = '<|endoftext|>'
model.tokenizer.tokenizer.pad_token_id = 151643
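
The two assignments above can be wrapped in a small helper that only sets the pad token when none is configured. This is a minimal sketch, not something Outlines or transformers provides: `ensure_pad_token` is a hypothetical name, and its defaults are just the token and ID quoted above for Qwen-7B-Chat. A stand-in object is used so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def ensure_pad_token(tokenizer, pad_token="<|endoftext|>", pad_token_id=151643):
    """Set a pad token on `tokenizer` only if it does not already have one.

    Hypothetical helper; the defaults are the values from this post.
    """
    if getattr(tokenizer, "pad_token", None) is None:
        tokenizer.pad_token = pad_token
        tokenizer.pad_token_id = pad_token_id
    return tokenizer


# Stand-in for the wrapped Hugging Face tokenizer (assumption, for illustration).
dummy = SimpleNamespace(pad_token=None, pad_token_id=None)
ensure_pad_token(dummy)
print(dummy.pad_token, dummy.pad_token_id)
```

With the model loaded as in the example at the top, the call would be `ensure_pad_token(model.tokenizer.tokenizer)`.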

But then I get another error:

RuntimeError: Index put requires the source and destination dtypes match, got Float for the destination and BFloat16 for the source.

I managed to solve this error by adding "fp32": True to model_kwargs in the code above. After that, the model runs without issues. I wanted to share this here for anyone facing the same issue; it may also be useful for improving Qwen support in Outlines.

Note: This works for Transformers. For AWQ, you also need to add "bf16": False to model_kwargs, since it is True by default.
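
To keep the flags described above in one place, here is a minimal sketch that builds the model_kwargs dictionary for both cases. `qwen_model_kwargs` and its `awq` parameter are hypothetical names used only for illustration; the "fp32" and "bf16" keys are the ones mentioned in this post and are forwarded to the Qwen model loader.

```python
def qwen_model_kwargs(awq=False):
    """Build the model_kwargs that worked in this post (hypothetical helper)."""
    kwargs = {
        "trust_remote_code": True,  # needed for Qwen's custom model code
        "fp32": True,               # avoids the Float/BFloat16 dtype mismatch
    }
    if awq:
        # For AWQ, bf16 reportedly defaults to True, so turn it off explicitly.
        kwargs["bf16"] = False
    return kwargs


print(qwen_model_kwargs())
print(qwen_model_kwargs(awq=True))
```

The result would be passed as `model_kwargs=qwen_model_kwargs()` in the `models.transformers(...)` call.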

However, inference becomes slower because the model now runs in 32-bit instead of 16-bit precision, and I get this warning:

Your device support faster inference by passing bf16=True in "AutoModelForCausalLM.from_pretrained"

That made me wonder: is there a way to use models with bf16=True in Outlines? I think the performance boost would be significant.
