
stop token and prompt input issues #63

Open
Energiz3r opened this issue Apr 17, 2023 · 3 comments

Comments

@Energiz3r

Energiz3r commented Apr 17, 2023

Hey there, I'm using this for an API endpoint and have come up against some issues I can't solve.

Between Vicuna v0 and Vicuna 1.1 the stop token changed from ### to </s>, but there appears to be no way to tell pyllamacpp what the stop token is. With the v0 model it continues generating non-stop, outputting prompts for the human. (Probably a separate issue: with 1.1 it appears broken altogether, throwing tensor errors and outputting gibberish to the console.)

I thought of implementing stop-token detection inside the new_text_callback fn and just calling exit(), which actually does what I want. But before I got that far I was trying to get the model to stop parroting the prompt back as part of the output; oddly, it seems to spend compute re-generating the prompt token by token. Any idea why this is happening? Can I get it to stop parroting the prompt?
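For reference, this is roughly the kind of detection I had in mind (untested sketch; the stop string, prompt and model path are just placeholders, and the exact generate() signature may differ from what I have installed):

```python
# Untested sketch: spot the stop string inside new_text_callback and stop
# forwarding tokens to the client, instead of calling exit().
from pyllamacpp.model import Model

STOP = "</s>"      # "###" for the v0 model
buffer = ""
stopped = False

def new_text_callback(text: str) -> None:
    global buffer, stopped
    if stopped:
        return                 # ignore anything generated after the stop string
    buffer += text
    if STOP in buffer:
        stopped = True         # from here on, nothing is streamed to the client
        return
    # otherwise stream `text` back to the front end here

model = Model(model_path="./ggml-vicuna-7b.bin")   # placeholder path
model.generate("### Human: Hi\n### Assistant:", new_text_callback=new_text_callback)
```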

That leads to my third question: should I supply conversation history / system prompts as part of the prompt, or is there a different way to pass that to the generate() call?

I realise I could partially solve / hack around these limitations by not attempting to stream back to the front end (and stripping the prompt from the output with some string manipulation), but that is kind of a bummer. I'm hoping the functionality could be improved to allow streaming properly?

I tried interactive mode, but that appears to be a blocking process which talks directly to the console, meaning it won't be usable for a web endpoint implementation.

If any of this is user error please let me know; I've just done the best I could with the documentation available.

@flamby

flamby commented Apr 26, 2023

Hi,

Am I right in assuming Vicuna is not yet supported?

Thanks

@absadiki
Collaborator

absadiki commented May 2, 2023

Hi @Energiz3r,

The old generate function ran on the C++ side, which is why it was blocking the thread. You could have just used a separate thread for the model generation to work around that.
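Something along these lines, for example (untested sketch; the model path and prompt are placeholders, and the old callback-based generate() signature may differ slightly between versions):

```python
# Untested sketch: run the blocking generate() call in its own thread and
# hand tokens to the web handler through a queue.
import threading
import queue
from pyllamacpp.model import Model

tokens: "queue.Queue" = queue.Queue()

def worker(prompt: str) -> None:
    model = Model(model_path="./ggml-vicuna-7b.bin")        # placeholder path
    model.generate(prompt, new_text_callback=tokens.put)    # blocks in this thread only
    tokens.put(None)                                         # sentinel: generation finished

threading.Thread(target=worker,
                 args=("### Human: Hello\n### Assistant:",),
                 daemon=True).start()

# the web endpoint can consume the queue without being blocked by the C++ side
while (tok := tokens.get()) is not None:
    print(tok, end="", flush=True)   # stream `tok` to the client here
```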

I tried to implement a generator function to overcome those limitations anyway, and I also added a prompt_context, prefix and suffix to condition the generation.
You can now get the tokens one by one and do whatever you want with them, or just stop whenever you want. The stop word (aka antiprompt) is supported as well;
you can take a look at how I did it here.
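Roughly how it can be used (untested sketch; the parameter names here, like prompt_context, prompt_prefix, prompt_suffix, antiprompt and n_predict, follow the description above, so please double-check them against the linked code):

```python
# Untested sketch of the generator-based API described above.
from pyllamacpp.model import Model

model = Model(
    model_path="./ggml-vicuna-7b.bin",                       # placeholder path
    prompt_context="A chat between a curious user and an AI assistant.",
    prompt_prefix="### Human:",
    prompt_suffix="### Assistant:",
)

output = ""
for token in model.generate("Hello, who are you?",
                            antiprompt="### Human:",          # stop word
                            n_predict=256):
    output += token
    # stream `token` to the web client here, or break whenever you want
    if "</s>" in output:
        break
```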

Let me know if you still have an issue.

@absadiki
Collaborator

absadiki commented May 2, 2023

> Hi,
>
> Am I right in assuming Vicuna is not yet supported?
>
> Thanks

Hi @flamby,

Vicuna should basically be supported as well.
Have you tried it and run into any issues?
