Hi @AkariAsai
I see that the current run_short_form.py sets the logprobs parameter to logprobs=32016 and logprobs=5000 in two separate places. As far as I can tell, in the current vLLM version the default value of "max_logprobs" in LLM is 5.
We can pass "max_logprobs" as an LLM argument to make the code run, but is a value of 32k really needed for logprobs? It makes the code much slower. Curious to know which setting you used to get the results in the paper.
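For concreteness, here is a minimal sketch of the two options I mean, assuming a recent vLLM where max_logprobs can be passed through LLM; the model name and the values are placeholders rather than the exact settings used in the repo or the paper:

```python
# Minimal sketch of the two options (model name and values are placeholders).
from vllm import LLM, SamplingParams

# Option A: lift vLLM's engine-level cap so logprobs=32016 is accepted
# (runs, but returning near-full-vocabulary logprobs is slow).
llm = LLM(model="selfrag/selfrag_llama2_7b", max_logprobs=32016)
full_vocab_params = SamplingParams(temperature=0.0, max_tokens=100, logprobs=32016)

# Option B: request only a small top-k of logprobs, which stays under the
# default cap of 5 and is much faster.
top_k_params = SamplingParams(temperature=0.0, max_tokens=100, logprobs=5)

outputs = llm.generate(["### Instruction: ...\n\n### Response:\n"], top_k_params)
# logprobs is a per-generated-token list of {token_id: logprob} entries.
print(outputs[0].outputs[0].logprobs[0])
```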
Thanks!
I was also troubled by this issue, but the solution is actually quite straightforward. Whether you are predicting [Retrieval] tokens or generating responses, there is no need to involve the entire vocabulary. With greedy and nucleus sampling we only ever sample the highest-probability tokens, while the remaining tokens, with much lower probabilities, are almost never selected. I have rewritten the Self-RAG code in a more concise form, and you may find it helpful to refer to this revised version: https://github.com/fate-ubw/RAGLAB/blob/main/raglab/rag/infer_alg/self_rag_reproduction/selfrag_reproduction.py#L627
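To illustrate the idea, here is a rough sketch (not the actual RAGLAB code; the reflection-token strings, the tokenizer call, and the logprob layout are assumptions on my part): score [Retrieval] versus [No Retrieval] using only the top-k logprobs returned for the first generated token, instead of the full vocabulary.

```python
# Rough sketch, not the actual RAGLAB implementation: the reflection-token
# strings and the vLLM logprob layout below are assumptions.
import math

def retrieval_probability(first_token_logprobs, tokenizer):
    """Estimate P([Retrieval]) from the top-k logprobs of the first generated
    token. first_token_logprobs maps token_id -> logprob (a float in older
    vLLM versions, an object with a .logprob attribute in newer ones)."""
    scores = {}
    for tok in ["[Retrieval]", "[No Retrieval]"]:
        tok_id = tokenizer.convert_tokens_to_ids(tok)
        lp = first_token_logprobs.get(tok_id)
        if lp is not None and hasattr(lp, "logprob"):
            lp = lp.logprob  # unwrap the Logprob object in newer vLLM
        scores[tok] = math.exp(lp) if lp is not None else 0.0
    total = sum(scores.values())
    return scores["[Retrieval]"] / total if total > 0.0 else 0.0
```

If a reflection token does not appear in the top-k at all, its probability is effectively zero anyway, which is why requesting the whole vocabulary buys you nothing in practice.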