Add llama logits processor #556
Conversation
I think we should use a hybrid approach where we subclass …
Sounds good. I'm going to move the logit processor into the integration file.
(force-pushed from 1cf20db to a8d24fe)
IMHO, we should have an abstract implementation of `RegexLogitsProcessor`, `CFGLogitsProcessor`, and `JSONLogitsProcessor` which handles the state management logic, then subclass for specific interfacing with the inference library.
Yep, that's a good idea…
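For concreteness, a minimal sketch of the kind of split being proposed, assuming an FSM interface with `next_state(state, token_id)` and `allowed_token_ids(state)`; the class and method names here are illustrative, not the PR's actual code:

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

import numpy as np


class BaseLogitsProcessor(ABC):
    """Shared state management: walk the FSM as tokens are generated."""

    def __init__(self, fsm):
        self.fsm = fsm
        self.fsm_state = 0
        self._num_seen = 0  # generated tokens already consumed by the FSM

    def allowed_tokens(self, generated: Sequence[int]) -> List[int]:
        # Feed only the tokens the FSM has not seen yet, then query it.
        for token_id in generated[self._num_seen:]:
            self.fsm_state = self.fsm.next_state(self.fsm_state, token_id)
            self._num_seen += 1
        return self.fsm.allowed_token_ids(self.fsm_state)

    @abstractmethod
    def __call__(self, input_ids, scores):
        """Library-specific entry point, implemented per integration."""


class LlamaCppLogitsProcessor(BaseLogitsProcessor):
    def __call__(self, input_ids: np.ndarray, scores: np.ndarray) -> np.ndarray:
        # llama-cpp-python passes numpy arrays; ban everything the FSM does
        # not allow by adding -inf to those logits. (Prompt vs. generated
        # token bookkeeping is glossed over in this sketch.)
        mask = np.full_like(scores, -np.inf)
        mask[self.allowed_tokens(input_ids.tolist())] = 0.0
        return scores + mask
```

The base class owns all FSM bookkeeping, so an integration only has to translate between its library's array types and `allowed_tokens`.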
(force-pushed from ff5abb8 to adeced3)
I updated the code in the PR to give you a better idea of the interface I'm thinking about. There are still several things to think about / do.
(force-pushed from adeced3 to b20d502)
(force-pushed from b20d502 to 57d6b03)
We can forget about multi-prompt streaming in the first iteration and raise an informative error instead.
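A sketch of that guard with a hypothetical helper, assuming a `NotImplementedError` (the exact exception type is not specified in the thread):

```python
from typing import List, Union


def check_streaming_args(prompts: Union[str, List[str]]) -> None:
    # Multi-prompt streaming is deferred to a later iteration, so fail
    # loudly instead of silently mishandling the extra prompts.
    if isinstance(prompts, list) and len(prompts) > 1:
        raise NotImplementedError(
            "Streaming more than one prompt at a time is not supported "
            "by the llama.cpp integration yet; pass a single prompt."
        )
```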
Hi @dtiarks, thanks for implementing this! Could you let me know what work remains and if there's any way I could help?
This is essentially ready for review. I talked with @rlouf in private about it, assuming he would review it.
I am going to, as soon as I have time!
(force-pushed from 1cd7fde to b31d731)
I am currently simplifying a few things in the code. Please wait until I'm done to make changes.
examples/llamacpp_example.py
Outdated
```diff
@@ -30,8 +30,7 @@ class Character(BaseModel):
 if __name__ == "__main__":
-    # Download model from https://huggingface.co/TheBloke/phi-2-GGUF
-    model = outlines.models.llamacpp("./phi-2.Q3_K_M.gguf", device="cpu")
+    model = outlines.models.llamacpp("./my_model.gguf")
```
Even better in the documentation
(force-pushed from 48abe80 to befd327)
I simplified the code in this PR, among other things removing all the tokenizer code that is no longer needed. Since the code in …
(force-pushed from befd327 to 5cadc1e)
All the processors I have looked at so far (transformers, vLLM, TensorRT-LLM, etc.) have similar but not identical interfaces, so it makes sense to have a model-specific implementation for each.
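As an illustration of why a shared core plus thin per-library adapters works, a sketch where only the array plumbing differs (the signatures in the comments are paraphrased from each library's docs and may not be exact):

```python
import numpy as np
import torch


def core_mask(allowed_token_ids, vocab_size: int) -> np.ndarray:
    """Shared logic: build an additive mask that bans every disallowed token."""
    mask = np.full(vocab_size, -np.inf, dtype=np.float32)
    mask[list(allowed_token_ids)] = 0.0
    return mask


def llamacpp_adapter(allowed, input_ids: np.ndarray, scores: np.ndarray) -> np.ndarray:
    # llama-cpp-python hands the processor numpy arrays.
    return scores + core_mask(allowed, scores.shape[-1])


def transformers_adapter(allowed, input_ids: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    # transformers hands the processor batched torch tensors.
    mask = torch.from_numpy(core_mask(allowed, scores.shape[-1])).to(scores.device)
    return scores + mask
```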
```diff
@@ -3,6 +3,7 @@
 from outlines.fsm.fsm import RegexFSM
 from outlines.generate.api import SequenceGenerator
 from outlines.models import OpenAI
+from outlines.models.llamacpp import LlamaCpp, RegexLogitsProcessor
```
We should import within `def regex_llamacpp`, otherwise using `outlines.generate.regex` and `outlines.generate.json` will require installing `llama_cpp_python`.
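A sketch of the deferred-import pattern being suggested (the function body is simplified, mirroring the attribute binding shown later in this thread, not the PR's exact code):

```python
def regex_llamacpp(model, regex_str: str):
    # Importing here, not at module level, means outlines.generate.regex
    # can be imported without llama_cpp_python installed; the dependency
    # is only pulled in once a llama.cpp model is actually used.
    from outlines.models.llamacpp import RegexLogitsProcessor

    model.logits_processor = RegexLogitsProcessor(regex_str, model.tokenizer)
    return model
```

How to register this handler for the `LlamaCpp` type without importing it at module scope is exactly the design issue raised next.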
We register on the `LlamaCpp` type, which is a subclass of `Llama`, so there's clearly a design issue here.
(force-pushed from a7e14e0 to 78acab4)
I rebased the branch. We still need to move a few things around; as @lapp0 noted, the current code makes it necessary to install `llama_cpp_python`.
(force-pushed from 11ab047 to 726ec24)
I found a workaround; can someone review before we merge?
```python
logits_processor = RegexLogitsProcessor(regex_str, model.tokenizer)
model.logits_processor = logits_processor
```
Does it make sense to bind the logits processor to the model? Won't this have side effects for other generations?
But a new generation would bind a different logit processor?
Except for `outlines.generate.text`. Perhaps we can just delete the attribute if it exists in `outlines.generate.text` to get it out the door, and I can address the design in a separate PR, refactoring with the use of abstract logits processors that have cleaner logic?
I think it's fine for now given how `__call__` works; we may revisit this in the future though.
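A sketch of the stop-gap described above for `outlines.generate.text` (the attribute name follows the diff above; the function shape and the call on the model are assumptions):

```python
def text_llamacpp(model, prompt: str):
    # A previous structured-generation call may have left a logits processor
    # bound to the model; clear it so plain text generation is unconstrained.
    if getattr(model, "logits_processor", None) is not None:
        model.logits_processor = None
    return model(prompt)
```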
outlines/models/llamacpp.py
Outdated
```python
for prompt in prompts:
    processors = []
    if self.logits_processor is not None:
        processors = [copy.copy(self.logits_processor)]
```
Won't the logits processors share a `CFGFSM` if we shallow copy?
True, logits processors should have a `copy` method.
We should probably open an issue for this.
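A sketch of the `copy` method being suggested, so concurrent prompts never share FSM state (`copy.deepcopy` stands in for whatever cloning the `CFGFSM` actually supports):

```python
import copy


class CFGLogitsProcessor:
    def __init__(self, fsm):
        self.fsm = fsm  # stateful CFGFSM instance
        self.fsm_state = 0

    def copy(self) -> "CFGLogitsProcessor":
        # A shallow copy.copy(self) would share self.fsm across prompts;
        # duplicate the FSM explicitly so per-prompt states cannot interleave.
        return CFGLogitsProcessor(copy.deepcopy(self.fsm))
```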
```python
formatted = [self.format_sequence(sequence) for sequence in results]

return formatted if len(formatted) > 1 else formatted[0]
```
nit: IMO, `List[prompt]` should always correspond to `List[generation]`. I think we should condition on the passed `prompts` type rather than the length of `formatted`.
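A sketch of the suggested behavior with a hypothetical helper: the output shape mirrors the input type, so even a singleton list of prompts yields a list of generations:

```python
from typing import List, Union


def format_output(prompts: Union[str, List[str]], formatted: List[str]):
    # Mirror the input shape: str in -> str out, list in -> list out,
    # even when the list contains a single prompt.
    if isinstance(prompts, str):
        return formatted[0]
    return formatted
```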
(force-pushed from 726ec24 to 9df92df)
@dtiarks You can take a last look.
(force-pushed from 1e220e1 to 743b054)
I added a …
(force-pushed from 743b054 to 1079903)
Merging, great job everyone!
This was introduced in version 0.32 by pull request dottxt-ai/outlines#556. It changes the function name from `build_regex_from_object` to `build_regex_from_schema`, which leads to an error in newer Docker containers when starting TGI.
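For downstream projects hit by the rename, a hedged compatibility shim (assuming both names live in `outlines.fsm.json_schema`, which may differ across versions):

```python
# Support outlines releases on either side of the rename.
try:
    from outlines.fsm.json_schema import build_regex_from_schema
except ImportError:  # older outlines still exposes the previous name
    from outlines.fsm.json_schema import (
        build_regex_from_object as build_regex_from_schema,
    )
```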
Note that this is super early.
We should probably provide a core implementation that all the other processors can inherit from. Also, the tokenizer implementation seems redundant.
Looking forward to some input.