This is similar to the CUDA-based rLLM but built on top of llama.cpp.
If you're not using the supplied docker container, follow the build setup instructions.
To compile and start aicirt, followed by the rllm server, run:
```bash
./server.sh phi2
```
Run `./server.sh --help` for more options.
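Once the server is up, you can send it a quick test request. The port and endpoint below are assumptions (an OpenAI-style completions API on a local port); adjust them to whatever `server.sh` prints at startup:

```bash
# Hypothetical smoke test: assumes an OpenAI-compatible completions endpoint
# on 127.0.0.1:4242 -- replace host, port, and path with what the server reports.
curl -s http://127.0.0.1:4242/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi2", "prompt": "Hello", "max_tokens": 16}'
```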
You can also try passing `--cuda` before `phi2`, which will enable cuBLAS in llama.cpp. Note that this is not the same as rllm-cuda, which may still give you better performance for batched inference.
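For example, to build llama.cpp with cuBLAS support and then serve the phi2 model:

```bash
# The --cuda flag goes before the model name, as noted above.
./server.sh --cuda phi2
```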