
feat: add streaming tool use to llama-cpp-python #71

Merged · 6 commits merged into main from ls-streaming-tools on Jan 5, 2025
Conversation

@lsorber (Member) commented on Dec 25, 2024

Changes:

  1. ✨ Enable streaming tool use for llama-cpp-python models (see below for details). The result is a simpler rag and async_rag implementation that opens the door to Agentic RAG and user-defined tools (a consumption sketch follows this list).
  2. ✨ Update the query parameter's description to require a single-faceted (i.e., non-compound) question, which encourages parallel function calling for compound questions.
  3. ✅ Add fairly thorough tests for both rag and the improved chatml-function-calling chat handler.
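
A minimal sketch of consuming the streamed tool calls this PR enables, assuming a llama_cpp.Llama model loaded with the chatml-function-calling chat format. The model path and the get_weather tool are illustrative placeholders, not part of this PR:

```python
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", chat_format="chatml-function-calling")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the weather in Ghent and in Leuven?"}],
    tools=tools,
    tool_choice="auto",
    stream=True,  # With this PR, tool calls arrive as OpenAI-style streamed deltas.
):
    delta = chunk["choices"][0]["delta"]
    for tool_call in delta.get("tool_calls") or []:
        # Each delta carries the tool call's index plus incremental
        # name/argument fragments to be accumulated by the caller.
        function = tool_call["function"]
        print(tool_call["index"], function.get("name"), function.get("arguments"))
```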

Changes to llama-cpp-python's chatml-function-calling chat handler:

  1. General:
    a. ✨ If no system message is supplied, add an empty system message to hold the tool metadata.
    b. ✨ Add function descriptions to the system message so that tool use is better informed (fixes "chatml-function-calling not adding tool description to the prompt", abetlen/llama-cpp-python#1869); a sketch of the augmented system message follows this list.
    c. ✨ Replace print statements relating to JSON grammars with RuntimeWarning warnings.
    d. ✅ Add tests with fairly broad coverage of the different scenarios.
  2. Case "Tool choice by user":
    a. ✨ Add support for more than one function call by making this a special case of "Automatic tool choice" with a single tool (subsumes "Support parallel function calls with tool_choice", abetlen/llama-cpp-python#1503).
  3. Case "Automatic tool choice -> respond with a message":
    a. ✨ Honor the user-supplied stop and max_tokens parameters.
    b. 🐛 Replace the incorrect use of the follow-up grammar with the user-defined grammar.
  4. Case "Automatic tool choice -> one or more function calls":
    a. ✨ Add support for streaming the function calls (fixes "Feature request: add support for streaming tool use", abetlen/llama-cpp-python#1883).
    b. ✨ Make tool calling more robust by wrapping the calls in a <function_calls></function_calls> block, giving the LLM an explicit way to terminate its tool calls (see the loop sketch after this list).
    c. 🐛 Add the missing ":" stop token used to decide whether to continue with another tool call; its absence prevented parallel function calling (fixes "chatml-function-calling chat format fails to generate multi calls to the same tool", abetlen/llama-cpp-python#1756).
    d. ✨ Sample at temperature=0 when deciding whether to continue with another tool call, mirroring the initial decision on whether to call a tool at all.
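
A hedged sketch of how the system message is augmented with tool metadata (changes 1.a and 1.b). The build_system_message helper and the exact prompt wording are illustrative assumptions, not the handler's literal strings:

```python
import json

# Illustrative tool definition (same shape as in the sketch above).
tools = [{"type": "function", "function": {
    "name": "get_weather",
    "description": "Get the current weather in a given city.",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
}}]

def build_system_message(system_content: str, tools: list[dict]) -> str:
    """Append each tool's schema and, per change 1.b, its description."""
    lines = [system_content, "", "You have access to the following functions:"]
    for tool in tools:
        function = tool["function"]
        lines.append(f"\nfunctions.{function['name']}:")
        if function.get("description"):  # Previously omitted from the prompt (abetlen/llama-cpp-python#1869).
            lines.append(function["description"])
        lines.append(json.dumps(function.get("parameters", {})))
    return "\n".join(lines)

# Change 1.a: if no system message was supplied, insert an empty one so there
# is always somewhere to put the tool metadata.
messages = [{"role": "user", "content": "What is the weather in Ghent?"}]
if not any(message["role"] == "system" for message in messages):
    messages.insert(0, {"role": "system", "content": ""})
messages[0]["content"] = build_system_message(messages[0]["content"], tools)
```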

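And a minimal sketch of the resulting parallel tool call loop (changes 4.b through 4.d), assuming a llama_cpp.Llama instance and a prompt that already ends with the opening <function_calls> tag. The generate_tool_calls function and its stop strings and token budgets are illustrative, not the PR's literal code; notably, the real handler constrains the arguments to each tool's JSON schema with a grammar, for which plain greedy sampling stands in here:

```python
from llama_cpp import Llama

def generate_tool_calls(llm: Llama, prompt: str) -> list[dict]:
    """Generate tool calls until the model closes the <function_calls> block."""
    tool_calls = []
    while True:
        # Greedy decision (change 4.d, temperature=0): the model either writes
        # the next header "functions.<name>" or closes the block. Stopping on
        # ":" (change 4.c) is what permits repeat calls to the same function.
        header = llm.create_completion(
            prompt=prompt, stop=[":", "</function_calls>"], temperature=0.0, max_tokens=32
        )["choices"][0]["text"]
        if "functions." not in header:
            break  # The model chose to emit </function_calls> instead (change 4.b).
        name = header.rsplit("functions.", 1)[-1].strip()
        # Grammar-constrained in the real handler; greedy sampling stands in here.
        arguments = llm.create_completion(
            prompt=prompt + header + ":", stop=["\n"], temperature=0.0, max_tokens=512
        )["choices"][0]["text"]
        tool_calls.append({"name": name, "arguments": arguments})
        prompt += f"{header}:{arguments}\n"
    return tool_calls
```
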
@lsorber requested a review from undo76 on December 25, 2024
@lsorber self-assigned this on Dec 25, 2024
@lsorber (Member, Author) commented on Dec 26, 2024

Upstream PR to bring these improvements to llama-cpp-python: abetlen/llama-cpp-python#1884

@lsorber merged commit c57aac1 into main on Jan 5, 2025
2 checks passed
@lsorber deleted the ls-streaming-tools branch on January 5, 2025