fix: revert llama cpp python server to 0.2.79 to enable gpu #44
What does this PR do?
It reverts the llama cpp python server to 0.2.79, because that is the last version that actually works fine with Vulkan.
Screenshot / video of UI
N/A
What issues does this PR fix or reference?
It resolves #40.
How to test this PR?
Run the image ghcr.io/containers/podman-desktop-extension-ai-lab-playground-images/ai-lab-playground-chat-vulkan:62b6f628ed77cf3f1518c32746e2e89d27072f0e (built with llama_cpp 0.2.85) and verify that it actually uses the CPU; the GPU detection is completely skipped. You can use this command (update the model path):
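A minimal sketch of such a command is shown below; the port, the MODEL_PATH variable, the mount point, and the /dev/dri passthrough for Vulkan are assumptions, so adapt them to your image and setup.

```bash
# Sketch of a test run (assumed: server on port 8000, model path taken from
# MODEL_PATH, /dev/dri needed for Vulkan access) -- adjust to your setup.
podman run --rm -it \
  --device /dev/dri \
  -p 8000:8000 \
  -v /path/to/your/model.gguf:/models/model.gguf \
  -e MODEL_PATH=/models/model.gguf \
  ghcr.io/containers/podman-desktop-extension-ai-lab-playground-images/ai-lab-playground-chat-vulkan:62b6f628ed77cf3f1518c32746e2e89d27072f0e
```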
In the logs you should just see that the GPU detection is skipped and the model runs on the CPU.
2-b. If you do not want to build your own images, you can use the ones below to test different versions of llama_cpp:
quay.io/lstocchi/vulkan:v4_279 -> llama_cpp 0.2.79
quay.io/lstocchi/vulkan:v4_280 -> llama_cpp 0.2.80
quay.io/lstocchi/vulkan:v4_284 -> llama_cpp 0.2.84
ghcr.io/containers/podman-desktop-extension-ai-lab-playground-images/ai-lab-playground-chat-vulkan:62b6f628ed77cf3f1518c32746e2e89d27072f0e -> llama_cpp 0.2.85
quay.io/lstocchi/vulkan:v4_287 -> llama_cpp 0.2.87
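To compare versions, you can point the same sketch at one of the tags above; with the 0.2.79 image the GPU should be picked up again (same assumptions as the command above):

```bash
# Same sketch, pointed at the llama_cpp 0.2.79 image to confirm the GPU is used again.
podman run --rm -it \
  --device /dev/dri \
  -p 8000:8000 \
  -v /path/to/your/model.gguf:/models/model.gguf \
  -e MODEL_PATH=/models/model.gguf \
  quay.io/lstocchi/vulkan:v4_279
```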