
Give the possibility to obtain the full response when calling the vLLM generate function #1199

Open
alonsosilvaallende opened this issue Oct 8, 2024 · 4 comments · May be fixed by #1212

@alonsosilvaallende (Contributor) commented Oct 8, 2024

I'm using InspectAI to evaluate language models. In particular, I'm evaluating the benefits of structured text generation with Outlines. I would like to obtain the full response when calling the vLLM generate function, since InspectAI expects the full response. Would it be possible to give the user the option to get the full response? The default should remain what it is now, i.e. a filtered response.

@rlouf (Member) commented Oct 8, 2024

What do you mean by full response?

@alonsosilvaallende (Contributor, Author) commented Oct 8, 2024

> What do you mean by full response?

InspectAI needs the full `results` variable returned by the `model.generate` call of the vLLM API; see line 131 here:

```python
results = self.model.generate(
```

Currently, only the texts are returned in a list, starting at line 137:

```python
results = [[sample.text for sample in batch.outputs] for batch in results]
batch_size = len(results)
sample_size = len(results[0])
if batch_size == 1 and sample_size == 1:
    return results[0][0]
elif batch_size == 1:
    return results[0]
elif sample_size == 1:
    return [batch[0] for batch in results]
return results
```
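For context, here is a minimal sketch of what the unfiltered `results` carry before this filtering step (the model name and sampling parameters below are just illustrative):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/Phi-3-mini-4k-instruct")  # illustrative model choice
results = llm.generate(["What is 1+1?"], SamplingParams(max_tokens=16, logprobs=1))

# `results` is a list of RequestOutput objects; each one holds a list of
# CompletionOutput samples that carry more than just the generated text.
for batch in results:
    for sample in batch.outputs:
        print(sample.text, sample.finish_reason, sample.logprobs, sample.token_ids)
```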

@rlouf (Member) commented Oct 8, 2024

Couldn't you just implement a [custom solver](https://inspect.ai-safety-institute.org.uk/solvers.html) for InspectAI?
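A rough sketch of what such a solver could look like, assuming InspectAI's documented `@solver` / `TaskState` / `Generate` interface; the `outlines_solver` name, the `generator` argument, and the metadata key are all hypothetical:

```python
from inspect_ai.solver import Generate, TaskState, solver


@solver
def outlines_solver(generator):
    """Hypothetical solver that calls an Outlines generator directly."""

    async def solve(state: TaskState, generate: Generate):
        # `generator` is assumed to be an Outlines callable built elsewhere;
        # stash its completion where the rest of the eval can read it.
        state.metadata["outlines_completion"] = generator(state.user_prompt.text)
        return state

    return solve
```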

@LouSalaun commented Oct 9, 2024

Hi @rlouf, I'm also interested in this. Custom solvers and custom models are indeed the way to go. However, there is still the issue that we lose information when using Outlines' generate function. For example, with Outlines' wrapper of vLLM, we don't have the `stop_reason`, `logprobs`, and `output_tokens` fields of the LLM's output.

Would it make sense to you to add an optional argument that returns the raw vLLM results directly? The default behavior would remain the same. I can send a pull request.
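To make the idea concrete, here is a minimal sketch written as a standalone helper around `vllm.LLM` rather than a patch to Outlines itself; the `return_raw_outputs` argument name is purely illustrative:

```python
from vllm import LLM, SamplingParams


def generate(llm: LLM, prompts: list[str], sampling_params: SamplingParams,
             return_raw_outputs: bool = False):
    """Generate with vLLM, optionally returning the raw RequestOutput objects."""
    results = llm.generate(prompts, sampling_params)
    if return_raw_outputs:
        # Full vLLM objects: text, token_ids, logprobs, finish_reason, stop_reason, ...
        return results
    # Default: the same text-only filtering Outlines applies today.
    texts = [[sample.text for sample in batch.outputs] for batch in results]
    batch_size, sample_size = len(texts), len(texts[0])
    if batch_size == 1 and sample_size == 1:
        return texts[0][0]
    if batch_size == 1:
        return texts[0]
    if sample_size == 1:
        return [batch[0] for batch in texts]
    return texts
```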
