Running without OpenAPI / vLLM #59
Hi sashokbg, …
Hello @khai-meetkai, thank you very much for the detailed explanation. I will try it as soon as I can at home, then get back to you and close this ticket :).
Hello @khai-meetkai, I have just tested the model with the tutorial you provided and it works very well! Especially since we can run it on a local machine and play around as much as we want for free :)
Woo, I was able to get this working on Apple's Metal Performance Shaders and with Chatlab's function registry. I'm using …
@rgbkrk we are training a new functionary model with the ability to call multiple functions in parallel; it is similar to OpenAI's parallel function calling. We hope Chatlab will support this soon!
I'll make sure to support it soon for your coming launch. Same format as OpenAI's, I assume?
@rgbkrk, yes, we managed to use the same format for both streaming and non-streaming.
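For reference, a parallel call in the OpenAI format is a single assistant message carrying a list of `tool_calls`. A minimal sketch of that shape, with made-up function names, arguments, and IDs:

```python
# Sketch of an OpenAI-style assistant message containing parallel tool calls.
# The function names, arguments, and IDs below are illustrative only.
assistant_message = {
    "role": "assistant",
    "content": None,  # no text content when the model decides to call tools
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                # arguments arrive as a JSON-encoded string, not a dict
                "arguments": '{"location": "Istanbul"}',
            },
        },
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": '{"location": "Paris"}',
            },
        },
    ],
}
```

In the streaming case the same structure is emitted incrementally as `tool_calls` deltas, which is why matching both modes matters for clients like Chatlab.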
Tracking the work in rgbkrk/chatlab#118. Just to wrap up and showcase the incredible power here, I made a little video. I've posted the same to Twitter as well: https://twitter.com/KyleRayKelley/status/1730296106695979273
What exactly did you do to get it running with MPS on your ARM M1/M2 processor?
I ran it using …
Would be great to have some instructions here. Thanks! 🙂
@ChristianWeyer …
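For anyone else looking for MPS instructions: a minimal sketch of one way to run the model on Apple silicon via PyTorch's MPS backend and Hugging Face transformers. The checkpoint name and generation settings below are assumptions for illustration, not the exact setup used above:

```python
# Minimal sketch: running a Hugging Face causal LM on Apple's MPS backend.
# The model id below is an assumption -- substitute the functionary checkpoint you use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meetkai/functionary-7b-v1.1"  # hypothetical choice of checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Fall back to CPU when MPS is unavailable (e.g. on Intel Macs or in CI).
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = model.to(device)

inputs = tokenizer("Hello!", return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```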
@rgbkrk @khai-meetkai, may I ask whether it is possible to use …
@sandangel I'm sure there's a way. Functionary requires additional steps for inference because of the function & tool calling, so you'd have to port some of what's in this repo over to mlx usage.
@rgbkrk thanks a lot for your comment. Could you point me to where I should start? I really appreciate it. 😀
Start by looking at how …
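For the mlx route, a minimal sketch assuming the `mlx-lm` package and its `load`/`generate` helpers; the checkpoint path is hypothetical, and functionary's prompt templating and function-call parsing from this repo would still need to be ported on top:

```python
# Minimal sketch: plain text generation with mlx-lm on Apple silicon.
# This only covers raw generation; functionary's prompt construction and
# function-call parsing would have to be layered on top of it.
from mlx_lm import load, generate

# Hypothetical path to an MLX-converted functionary checkpoint.
model, tokenizer = load("mlx-community/functionary-7b-v1.1-mlx")

prompt = "Hello!"
text = generate(model, tokenizer, prompt=prompt, max_tokens=64)
print(text)
```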
Hello, due to GPU constraints I am trying to run the model using a C++ implementation: https://github.com/ggerganov/llama.cpp
This involves converting the model, quantizing it to 4 bits, and running it with the llama.cpp runner.
What would a properly constructed prompt look like for a "classic" run like this?
I tried just passing the functions and messages in the prompt like so, but it did not work (inference was not accurate):
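A minimal sketch of the direction that seems needed here, using the `llama-cpp-python` bindings on a quantized GGUF file: functionary is trained on prompts where functions are rendered as TypeScript-style definitions, not raw JSON specs, so the prompt has to be built with the same template this repo's prompt-generation code uses. The file name, template wording, role markers, and stop string below are illustrative assumptions; the authoritative template lives in this repo and depends on the model version:

```python
# Minimal sketch: running a 4-bit GGUF quantization of functionary with the
# llama-cpp-python bindings instead of vLLM.
# The prompt below is a hand-rolled approximation (assumption): functionary
# renders functions as TypeScript-style definitions in the system prompt, but
# the exact role markers and stop tokens must be taken from this repo's
# prompt-generation code for your model version.
from llama_cpp import Llama

llm = Llama(model_path="./functionary-7b.q4_0.gguf", n_ctx=4096)  # hypothetical file name

prompt = """system:
// Supported function definitions that should be called when necessary.
namespace functions {
// Get the current weather for a location
type get_current_weather = (_: {
// The city and state, e.g. San Francisco, CA
location: string,
}) => any;
} // namespace functions
user:
What is the weather in Istanbul?
assistant:"""

result = llm(prompt, max_tokens=128, stop=["user:"])  # stop string is an assumption
print(result["choices"][0]["text"])
```

Passing plain JSON function specs in the prompt, as tried above, will not match the format the model was trained on, which would explain the inaccurate inference.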