My prompt is "Which instrument does Henry Halstead mainly play? Please answer an instrument name. Answer: ", which is a question posed to the LLM.
I want to get the cached hidden states of the tokens the LLM produces while it is generating the response. How can I do this?
On one hand, the code logits, cache = model.run_with_cache(prompt, return_cache_object=True) only caches the hidden states of the prompt, because it doesn't run the generate function.
On the other hand, the code output = model.generate(prompt, do_sample=False, max_new_tokens=20) only returns the generated tokens or text; I can't get the ActivationCache of the generated answer.
So how can I obtain both the model's response and the ActivationCache of the response tokens in a single generation pass?
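For reference, here is roughly what I am running. This is only a minimal sketch: the "gpt2" model name is a placeholder for the model I actually load.

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # placeholder for my actual model
prompt = "Which instrument does Henry Halstead mainly play? Please answer an instrument name. Answer: "

# Attempt 1: returns an ActivationCache, but only for the prompt positions,
# because no new tokens are generated in this call.
logits, cache = model.run_with_cache(prompt, return_cache_object=True)

# Attempt 2: produces the answer text, but returns no ActivationCache at all.
output = model.generate(prompt, do_sample=False, max_new_tokens=20)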
Meehaohao changed the title from "How to get the hook while generating new tokens?" to "How to get the Activation cache while the LLM is generating new tokens?" on Aug 7, 2024.
Unfortunately, there is currently no activation-cache integration in the generate function. I don't see any reason why we couldn't add that as an option, but it would be a pretty low priority given some other projects that are currently being worked on, unless someone volunteers to do it.
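In the meantime, a minimal sketch of two possible workarounds with the existing API is shown below. Assumptions: a TransformerLens HookedTransformer is being used, "gpt2" stands in for the actual model, and the hook point / layer in the second option are just examples.

from collections import defaultdict
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")  # placeholder model
prompt = "Which instrument does Henry Halstead mainly play? Please answer an instrument name. Answer: "

# Option 1: generate first, then re-run the full text (prompt + generated answer)
# through run_with_cache. The second forward pass recomputes the same activations,
# this time covering the answer tokens as well, at the cost of one extra pass.
answer_text = model.generate(prompt, do_sample=False, max_new_tokens=20)
logits, cache = model.run_with_cache(answer_text, return_cache_object=True)

# Option 2: capture activations while generate is running, via temporary hooks.
# With the default KV cache, the first forward pass covers the whole prompt and
# each subsequent pass covers only the newly generated token, so the saved
# tensors have to be concatenated along the sequence dimension afterwards.
captured = defaultdict(list)

def save_hook(tensor, hook):
    captured[hook.name].append(tensor.detach().cpu())

hook_name = utils.get_act_name("resid_post", 5)  # example hook point and layer
with model.hooks(fwd_hooks=[(hook_name, save_hook)]):
    answer_text = model.generate(prompt, do_sample=False, max_new_tokens=20)

Option 1 is the simplest way to end up with a full ActivationCache covering the response tokens; Option 2 avoids the extra forward pass but only captures the hook points you register.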