With the recent updates to CFG support, I wanted to test the integration with vLLM.
I pulled the latest Outlines git repo as of today (9/5/2024) and ran `pip install .` in a Docker image with vllm==0.6.0.
I'm not sure of the current status of the vLLM integration, but long story short: a single guided-grammar request crashes the whole server.
Steps/code to reproduce the bug:
```python
from openai import OpenAI
import lark

model = 'http://10.72.5.190:15001/v1'
client = OpenAI(
    api_key="EMPTY",
    base_url=model,
)
model_name = client.models.list().data[0].id  # Grab the model name from the API

grammar_string = r"""
start: sentence

%import common.WS

sentence: noun WS verb WS noun -> simple

noun: /[A-Za-z]+/  # match one or more letters (a general noun)
verb: /[A-Za-z]+/  # match one or more letters (a general verb)

# %ignore WS
"""
parser = lark.Lark(grammar_string)  # sanity-check that the grammar is valid Lark

test_sentences = [
    "The dog ran quickly across the field.",
    "The duck goes to the park.",
    "The chicken crosses the road.",
    "The cat eats the food.",
    "The dog runs around the corner.",
    "The baby laughs at the clown.",
    "The teacher writes on the board.",
    "The student reads the book.",
    "The car drives down the street.",
    "The flowers bloom in the garden.",
    "The musician plays the guitar.",
    "The athlete wins the game.",
    "The tourist visits the museum.",
    "The chef cooks the meal.",
    "The doctor examines the patient.",
    "The engineer builds the bridge.",
]

for sentence in test_sentences:
    prompt = f"""Convert the following sentence into a sentence that follows the following grammar: "noun verb noun"
Sentence: "{sentence}"
Only return the transformed sentence with no explanations."""
    messages = [{"role": "user", "content": prompt}]
    output = client.chat.completions.create(
        model=model_name,   # Model name to use
        messages=messages,  # Chat history
        max_tokens=50,
        extra_body={
            'guided_grammar': grammar_string,
        },
    )
    print('With Guided Decoding')
    print(output.choices[0].message.content)
```
Expected result:
Dog ran field.
Duck goes park.
Chicken crosses road.
....
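The expected outputs above all collapse to the bare "noun verb noun" shape the grammar describes. As a quick stdlib-only sanity check (a sketch, independent of the server; the regex and the optional trailing period are my assumptions, since the Lark grammar itself has no punctuation terminal), the shape can be verified with a regular expression mirroring the grammar's three whitespace-separated word terminals:

```python
import re

# Mirrors the Lark grammar: noun WS verb WS noun, each word /[A-Za-z]+/.
# The trailing period is allowed here as an assumption about model output;
# the grammar itself does not include one.
NOUN_VERB_NOUN = re.compile(r"^[A-Za-z]+ [A-Za-z]+ [A-Za-z]+\.?$")

expected = ["Dog ran field.", "Duck goes park.", "Chicken crosses road."]
for sentence in expected:
    assert NOUN_VERB_NOUN.match(sentence), sentence
print("all expected outputs match the noun-verb-noun shape")
```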
Error message:
```
INFO 09-05 18:03:53 async_llm_engine.py:206] Added request chat-e6a37c28c21f43f6904a809b60aadcf5.
ERROR 09-05 18:03:53 async_llm_engine.py:63] Engine background task failed
ERROR 09-05 18:03:53 async_llm_engine.py:63] Traceback (most recent call last):
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 53, in _log_task_completion
ERROR 09-05 18:03:53 async_llm_engine.py:63]     return_value = task.result()
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 939, in run_engine_loop
ERROR 09-05 18:03:53 async_llm_engine.py:63]     result = task.result()
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 868, in engine_step
ERROR 09-05 18:03:53 async_llm_engine.py:63]     request_outputs = await self.engine.step_async(virtual_engine)
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 345, in step_async
ERROR 09-05 18:03:53 async_llm_engine.py:63]     output = await self.model_executor.execute_model_async(
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/distributed_gpu_executor.py", line 177, in execute_model_async
ERROR 09-05 18:03:53 async_llm_engine.py:63]     return await self._driver_execute_model_async(execute_model_req)
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 231, in _driver_execute_model_async
ERROR 09-05 18:03:53 async_llm_engine.py:63]     return await self.driver_exec_model(execute_model_req)
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR 09-05 18:03:53 async_llm_engine.py:63]     result = self.fn(*self.args, **self.kwargs)
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 327, in execute_model
ERROR 09-05 18:03:53 async_llm_engine.py:63]     output = self.model_runner.execute_model(
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 09-05 18:03:53 async_llm_engine.py:63]     return func(*args, **kwargs)
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1483, in execute_model
ERROR 09-05 18:03:53 async_llm_engine.py:63]     logits = self.model.compute_logits(hidden_or_intermediate_states,
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/llama.py", line 438, in compute_logits
ERROR 09-05 18:03:53 async_llm_engine.py:63]     logits = self.logits_processor(self.lm_head, hidden_states,
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
ERROR 09-05 18:03:53 async_llm_engine.py:63]     return self._call_impl(*args, **kwargs)
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
ERROR 09-05 18:03:53 async_llm_engine.py:63]     return forward_call(*args, **kwargs)
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/logits_processor.py", line 72, in forward
ERROR 09-05 18:03:53 async_llm_engine.py:63]     logits = _apply_logits_processors(logits, sampling_metadata)
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/logits_processor.py", line 142, in _apply_logits_processors
ERROR 09-05 18:03:53 async_llm_engine.py:63]     logits_row = logits_processor(past_tokens_ids,
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 67, in __call__
ERROR 09-05 18:03:53 async_llm_engine.py:63]     instruction = self._guide.get_next_instruction(
ERROR 09-05 18:03:53 async_llm_engine.py:63]   File "/usr/local/lib/python3.10/dist-packages/outlines/fsm/guide.py", line 362, in get_next_instruction
ERROR 09-05 18:03:53 async_llm_engine.py:63]     if state.parser_state is None:
ERROR 09-05 18:03:53 async_llm_engine.py:63] AttributeError: 'int' object has no attribute 'parser_state'
```
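Reading the traceback, the crash appears to reduce to a type mismatch: vLLM's Outlines logits processor hands the guide an integer FSM state (the convention for regex guides), while `CFGGuide.get_next_instruction` in current Outlines expects a state object carrying a `parser_state` attribute. A minimal stdlib-only sketch of that mismatch (the class, field, and return values here are illustrative stand-ins modeled on the traceback, not copied from either codebase):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CFGState:
    # Stand-in for the CFG guide's state: an incremental parser state
    # plus the text generated so far.
    parser_state: Optional[Any]
    prefix: str


def get_next_instruction(state):
    # Mirrors the failing check at outlines/fsm/guide.py:362.
    if state.parser_state is None:
        return "generate-anything"
    return "constrained-generation"


# A CFGState works as the guide intends...
print(get_next_instruction(CFGState(parser_state=None, prefix="")))

# ...but the caller passes the integer initial state 0, which has no
# .parser_state attribute, reproducing the AttributeError above.
try:
    get_next_instruction(0)
except AttributeError as e:
    print(f"AttributeError: {e}")
```

So the fix presumably has to land on whichever side owns the state convention: either vLLM seeding a CFG-style state for grammar requests, or Outlines accepting the integer initial state.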
We're trying to get language models to output sentences in a particular format while still using our main production API with vLLM. It's cost prohibitive to host multiple APIs using different LLMs so it's better if we can do everything through the same API.