Replies: 5 comments
-
For anyone wanting to do this, see an initial attempt in #77, and in particular this comment on ggerganov's preferred approach. Should be pretty straightforward, I think.
-
It is already ongoing; check PR #77.
-
Yes, see the comment in #77 (review), as @bakkot suggested. This is the way 🦙
-
Ah, @bakkot beat me to it while I was writing. @Ronsor, please close this; the project has Discussions now: https://github.com/ggerganov/llama.cpp/discussions
-
I propose refactoring `main.cpp` into a library (`llama.cpp`, compiled to `llama.so`/`llama.a`/whatever) and making `main.cpp` a simple driver program. A simple C API should be exposed to access the model, and then bindings can more easily be written for Python, Node.js, or whatever other language. This would partially solve #82 and #162.
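To make the idea concrete, here is a minimal sketch of the kind of C API such a library could expose. Every name in it (`llama_model`, `llama_ctx`, `llama_model_load`, etc.) is hypothetical, not the project's actual interface; the point is just the shape: an opaque model handle for the weights, an opaque context handle for per-conversation state, and plain C functions that bindings can wrap.

```c
/* llama_api.h -- hypothetical sketch of a minimal C API for a llama library.
 * None of these names come from the actual codebase. */
#ifndef LLAMA_API_H
#define LLAMA_API_H

#ifdef __cplusplus
extern "C" {
#endif

typedef struct llama_model llama_model; /* weights: loaded once, read-only   */
typedef struct llama_ctx   llama_ctx;   /* per-conversation state (KV cache) */

/* Load the weights once; the handle can back many contexts. */
llama_model *llama_model_load(const char *path, int n_ctx);
void         llama_model_free(llama_model *model);

/* Create an inference context bound to a loaded model. */
llama_ctx *llama_ctx_new(llama_model *model);
void       llama_ctx_free(llama_ctx *ctx);

/* Tokenize text into the caller-supplied buffer; returns the token count. */
int llama_tokenize(const llama_model *model, const char *text,
                   int *tokens, int max_tokens);

/* Feed tokens through the model and return the sampled next token id. */
int llama_eval(llama_ctx *ctx, const int *tokens, int n_tokens,
               float temperature);

/* Convert a token id back to its string (storage owned by the model). */
const char *llama_token_to_str(const llama_model *model, int token);

#ifdef __cplusplus
}
#endif

#endif /* LLAMA_API_H */
```

With the C boundary kept this plain, Python (ctypes/cffi) or Node.js (N-API) bindings become thin wrappers rather than reimplementations.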
Edit: on that note, is it possible to do inference from two or more prompts on different threads? If so, serving multiple people would be possible without multiple copies of model weights in RAM.
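On the multi-prompt question, the same split suggests an answer: if the weights live in a shared, read-only `llama_model` and all mutable state (KV cache, sampling state) lives in a per-thread `llama_ctx`, then several prompts can be served concurrently from one copy of the weights. The sketch below assumes the hypothetical API above and is only an illustration of that ownership split, not working code against the real project.

```c
/* Serving two prompts concurrently with one copy of the weights:
 * the model is shared read-only, each thread owns its own context. */
#include <pthread.h>
#include <stdio.h>
#include "llama_api.h"   /* hypothetical header from the sketch above */

static llama_model *g_model;  /* shared, read-only after load */

static void *serve_prompt(void *arg)
{
    const char *prompt = arg;
    llama_ctx *ctx = llama_ctx_new(g_model);   /* per-thread state */

    int tokens[512];
    int n = llama_tokenize(g_model, prompt, tokens, 512);

    /* Generate a handful of tokens; each context advances independently. */
    for (int i = 0; i < 16; i++) {
        int next = llama_eval(ctx, tokens, n, 0.8f);
        printf("%s", llama_token_to_str(g_model, next));
        tokens[0] = next;   /* feed the sampled token back in */
        n = 1;
    }

    llama_ctx_free(ctx);
    return NULL;
}

int main(void)
{
    g_model = llama_model_load("models/7B/ggml-model.bin", 512);

    pthread_t a, b;
    pthread_create(&a, NULL, serve_prompt, "First user prompt");
    pthread_create(&b, NULL, serve_prompt, "Second user prompt");
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    llama_model_free(g_model);
    return 0;
}
```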