Hey there, I'm using this for an API endpoint and have come up against some issues I can't solve
With Vicuna and Vicuna 1.1 the stop token changed from ### to </s>, but there appears to be no way to tell pyllamacpp what the stop token is. With the v0 model, it continues generating non-stop, outputting prompts for the human. (Probably a separate issue: with 1.1 it appears broken altogether and throws tensor errors, outputting gibberish to the console.)
I thought of implementing stop token detection inside the new_text_callback fn and just calling exit(), which actually does what I want, but before I got that far I was trying to get the model to stop parroting the prompt back as part of the output, which, oddly, seems to take computational power token by token. Any idea why this is happening? Can I get it to stop parroting the prompt?
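For what it's worth, the stop token hack I was describing looks roughly like this (a minimal sketch only; the Model constructor arguments and model path are from memory / placeholders and may differ between pyllamacpp versions):

```python
from pyllamacpp.model import Model

STOP_WORD = "### Human:"  # v0 stop sequence; Vicuna 1.1 uses </s>
generated = []

def new_text_callback(text: str):
    generated.append(text)
    # crude stop-word detection on the accumulated output
    if STOP_WORD in "".join(generated):
        # there is no clean way to abort the C++ loop from the callback,
        # so bail out of the whole process (ugly, but it stops generation)
        raise SystemExit(0)

# constructor keyword names are an assumption and may vary by version
model = Model(ggml_model="./models/ggml-vicuna-7b-q4_0.bin", n_ctx=512)
prompt = "### Human: Hello\n### Assistant:"
model.generate(prompt, n_predict=256, new_text_callback=new_text_callback)
```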
That leads to my third question - should I supply conversation history / system prompts as part of prompt or is there a different way to pass that to the generate() call?
I realise I could partially solve / hack around these limitations by not attempting to stream back to the front end (and stripping the prompt from the output with some string manipulation) but that is kind of a bummer, I'm hoping the functionality of this could be improved to allow for that?
I tried interactive mode but that appears to be a blocking process which talks directly to the console, meaning it won't be usable for a web endpoint implementation
If any of this is user error please let me know; I've just done the best I could with the documentation available.
The old generate function ran on the C++ side, which is why it was blocking the thread. You could have just used a separate thread for the model generation to solve the issue.
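For anyone stuck on the old API, a minimal sketch of that workaround (assuming model and prompt are set up as in the snippet above; send_to_client is a placeholder for whatever your endpoint uses to stream):

```python
import threading
import queue

token_queue: queue.Queue = queue.Queue()

def new_text_callback(text: str):
    # called from the worker thread for every generated chunk
    token_queue.put(text)

def run_generation(model, prompt):
    # the blocking C++ call lives entirely in this thread
    model.generate(prompt, n_predict=256, new_text_callback=new_text_callback)
    token_queue.put(None)  # sentinel: generation is finished

worker = threading.Thread(target=run_generation, args=(model, prompt), daemon=True)
worker.start()

# the main thread (e.g. the web handler) streams tokens as they arrive
while (chunk := token_queue.get()) is not None:
    send_to_client(chunk)  # placeholder for your streaming response
```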
I tried to implement a generator function to overcome those limitations anyway.
I also added a prompt_context, prefix and suffix to condition the generation.
You can now get the tokens one by one and do whatever you want with them, or stop whenever you want. The stop word (aka antiprompt) is supported as well.
You can take a look at how I did it here.
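Roughly, the new usage looks like this (a minimal sketch; the parameter names follow the description above, so double-check them against the linked code, and the model path is a placeholder):

```python
from pyllamacpp.model import Model

model = Model(
    model_path="./models/ggml-vicuna-7b-q4_0.bin",  # placeholder path
    prompt_context="A chat between a curious user and a helpful assistant.",
    prompt_prefix="\nUSER:",
    prompt_suffix="\nASSISTANT:",
)

# generate() now yields text piece by piece instead of blocking in C++
for token in model.generate(
    "What is the capital of France?",
    antiprompt="USER:",  # stop word: generation halts when this appears
    n_predict=256,
):
    print(token, end="", flush=True)
    # since this is a plain Python generator, you can also break out
    # at any point to stop generation early
```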