$ ./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 6121 (e54d41b)
built with cc (GCC) 14.3.1 20250523 (Red Hat 14.3.1-1) for x86_64-redhat-linux
Sending the simple prompt "What is bugonia?" to the 20b model on gpt-oss.com gives a perfect response.
With llama-cli it tries to reason out an answer but never comes close to the correct answer from gpt-oss.com.
Neither of these invocations gives an acceptable answer:
$ ./llama.cpp/llama-cli -hf unsloth/gpt-oss-20b-GGUF:F16 --jinja -ngl 99 --threads -1 --ctx-size 16384 --temp 1.0 --top-p 1.0 --top-k 0