
Running without Openapi / vLLM #59

Closed
sashokbg opened this issue Nov 22, 2023 · 17 comments

@sashokbg

Hello, due to GPU constraints I am trying to run the model using a C++ implementation: https://github.com/ggerganov/llama.cpp

This involves converting the model and quantizing it to 4 bits, then using the runner from llama.cpp.

What would a properly constructed prompt look like when running the model "classically" like this, i.e., directly rather than through the OpenAI-compatible server?

I tried just passing the functions and messages in the prompt like so, but it did not work (the inference was not accurate):

```python
messages = [{"role": "user", "content": "What is the weather for Istanbul?"}]

functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
            },
        },
        "required": ["location"],
    },
}]
```

@khai-meetkai
Collaborator

Hi sashokbg,
We have just updated the instructions for using llama.cpp with GGUF files; you can find them here: https://github.com/MeetKai/functionary#llama_cpp-inference-gguf-files
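
In rough code, that flow looks something like the sketch below. This is a minimal sketch only: the prompt-template method name (get_prompt_from_messages) and the tokenizer/model IDs are assumptions based on the repo's example script, not the exact README code.

```python
# Minimal sketch of GGUF inference with llama-cpp-python plus functionary's
# prompt template. The template method name (get_prompt_from_messages) and the
# model/tokenizer IDs are assumptions based on the repo's example script.
from llama_cpp import Llama
from transformers import AutoTokenizer
from functionary.prompt_template import get_prompt_template_from_tokenizer

llm = Llama(model_path="functionary-7b-v1.4.f16.gguf", n_ctx=4096)
tokenizer = AutoTokenizer.from_pretrained("meetkai/functionary-7b-v1.4")
prompt_template = get_prompt_template_from_tokenizer(tokenizer)

messages = [{"role": "user", "content": "What is the weather for Istanbul?"}]
functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
        },
        "required": ["location"],
    },
}]

# Render the chat history and function definitions into the prompt format the
# model was trained on, run a plain completion, and read the raw text back.
prompt = prompt_template.get_prompt_from_messages(messages, functions)
output = llm(prompt, max_tokens=256, temperature=0.0)
print(output["choices"][0]["text"])
```

The key point is that the function schemas are not pasted into the prompt as raw JSON; the template renders them into the format the model was fine-tuned on, which is likely why the hand-written prompt above gave inaccurate results.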

@sashokbg
Author

Hello @khai-meetkai, thank you very much for the detailed explanation. I will try it as soon as I can at home, and will get back to you and close this ticket :).

@sashokbg
Author

Hello @khai-meetkai, I have just tested the model with the tutorial you provided and it works very well!
I am very new to LLMs so I cannot tell just how well it performs compared to other models, but it feels very promising!

Especially given that we can run this on a local machine and play around as much as we want for free :)

@rgbkrk
Contributor

rgbkrk commented Nov 29, 2023

Woo, I was able to get this working on Apple's Metal Performance Shaders and with Chatlab's function registry. I'm using functionary-7b-v1.4.f16.gguf and it's working pretty well.

@khai-meetkai
Collaborator

khai-meetkai commented Nov 30, 2023

@rgbkrk we are training a new functionary model with the ability to call multiple functions in parallel, similar to OpenAI's parallel function calling. Could Chatlab support this soon?

@rgbkrk
Contributor

rgbkrk commented Nov 30, 2023

I'll make sure to support it soon for your upcoming launch. Same format as OpenAI's, I assume?

@khai-meetkai
Collaborator

@rgbkrk, yes, we managed to keep the same format for both streaming and non-streaming responses.
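
For anyone wiring this up on the client side, this is a sketch of what a parallel call looks like in the OpenAI-compatible response format (the IDs and arguments below are made up for illustration, not actual model output):

```python
# Illustrative assistant message containing two parallel tool calls, following
# the OpenAI chat-completions schema. IDs and arguments are invented examples.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": '{"location": "Istanbul"}',
            },
        },
        {
            "id": "call_2",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": '{"location": "Ankara"}',
            },
        },
    ],
}
```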

@rgbkrk
Contributor

rgbkrk commented Nov 30, 2023

Tracking the work in rgbkrk/chatlab#118

Just to wrap up and showcase the incredible power here, I made a little video:

(video: local-model-smaller-for-real)

I've posted the same to Twitter as well: https://twitter.com/KyleRayKelley/status/1730296106695979273

@ChristianWeyer

> Woo, I was able to get this working on Apple's Metal Performance Shaders and with Chatlab's function registry. I'm using functionary-7b-v1.4.f16.gguf and it's working pretty well.

What exactly did you do to get it running with MPS on your ARM M1/M2 processor?
Thanks!

@rgbkrk
Contributor

rgbkrk commented Jan 16, 2024

> What exactly did you do to get it running with MPS on your ARM M1/M2 processor?

I ran it using llama.cpp. Sadly I did not bring that code over to my new machine so I'll have to make a new version. I need to get this working with the v2 model anyhow.

@ChristianWeyer

> What exactly did you do to get it running with MPS on your ARM M1/M2 processor?
>
> I ran it using llama.cpp. Sadly I did not bring that code over to my new machine so I'll have to make a new version. I need to get this working with the v2 model anyhow.

Would be great to have some instructions here. Thanks! 🙂

@khai-meetkai
Collaborator

@ChristianWeyer
To use llama.cpp on an Apple MacBook M1 or M2, first install llama-cpp-python following these instructions: https://llama-cpp-python.readthedocs.io/en/latest/install/macos/
Once the installation is done, please follow the instructions here:
https://github.com/MeetKai/functionary?tab=readme-ov-file#llama_cpp-inference
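
As a small illustration of the Metal part (a sketch only; the model path is a placeholder and this assumes llama-cpp-python was built with Metal support):

```python
# Sketch: with a Metal-enabled llama-cpp-python build, n_gpu_layers=-1 offloads
# all layers to the Apple GPU. The model path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="functionary-7b-v1.4.f16.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # offload every layer to Metal
)
```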

@rgbkrk
Contributor

rgbkrk commented Jan 19, 2024

> Would be great to have some instructions here. Thanks! 🙂

If you grab the branch for #93, all you have to run is `python example_llama_cpp.py`. That's not portable outside of the repo until #95 lands and a functionary package is released.

@sandangel

@rgbkrk @khai-meetkai, may I ask whether it is possible to use MLX (https://github.com/ml-explore/mlx-examples) on a Mac with functionary, similar to llama.cpp? I'm looking to build an OpenAI-compatible server for MLX, or to use MLX inference directly. Any thoughts on this?

@rgbkrk
Contributor

rgbkrk commented Feb 1, 2024

@sandangel I'm sure there's a way. Functionary requires additional steps for inference because of the function and tool calling, so you'd have to port some of what's in this repo over to MLX usage.

@sandangel

@rgbkrk thanks a lot for your comment. Could you point me to where I should start? I really appreciate it. 😀

@rgbkrk
Contributor

rgbkrk commented Feb 2, 2024

Start by looking at how get_prompt_template_from_tokenizer works; you can see how it is used in https://github.com/MeetKai/functionary/blob/main/example_llama_cpp.py#L17
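
To make the porting idea concrete, here is a rough sketch of how that prompt template could be paired with the mlx-lm package from mlx-examples. The template method name, the model IDs, and the mlx-lm call are assumptions rather than tested code, and functionary's output parsing is left out.

```python
# Rough sketch: render the prompt with functionary's template, then generate
# with mlx-lm. Model IDs and method names are assumptions; parsing the raw
# output back into structured tool calls (which functionary normally handles)
# is omitted here.
from mlx_lm import load, generate
from transformers import AutoTokenizer
from functionary.prompt_template import get_prompt_template_from_tokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("meetkai/functionary-7b-v1.4")
prompt_template = get_prompt_template_from_tokenizer(hf_tokenizer)

messages = [{"role": "user", "content": "What is the weather for Istanbul?"}]
functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]

prompt = prompt_template.get_prompt_from_messages(messages, functions)

# The path below is a placeholder; the weights would first need to be
# converted to MLX format with mlx-lm's conversion tooling.
model, mlx_tokenizer = load("path-to-an-mlx-converted-functionary-model")
raw_text = generate(model, mlx_tokenizer, prompt=prompt, max_tokens=256)
print(raw_text)
```

From there, the repo's prompt-template parsing utilities would be the remaining piece to port in order to turn the raw completion back into structured tool calls.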
