out of RAM #209
Show me your ellama configuration.
Also, you can try to enable flash attention and cache quantization: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-enable-flash-attention
In your configuration I want to see …
Also, you can mark a region before calling …
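(A minimal sketch of the region-based workflow this comment points at. The comment is truncated, so the exact command it refers to is an assumption; ellama-code-review is used as an example, and the helper name my/ellama-review-current-defun is made up for this sketch. The idea is that with an active region, ellama sends only the region as context instead of the whole buffer.)

;; Sketch: mark the function at point, then run a code command on just
;; that region.  Sending only the region keeps the request (and the
;; context ollama has to hold in memory) small.
(defun my/ellama-review-current-defun ()
  "Mark the current defun and ask the LLM to review only that region."
  (interactive)
  (mark-defun)           ; set the region around the function at point
  (ellama-code-review))  ; example command; any region-aware ellama
                         ; command could be used here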
(use-package ellama
  :demand t
  :bind ("C-c m" . ellama-transient-main-menu)
  :init
  ;; setup key bindings
  (setq ellama-keymap-prefix "C-c e")
  ;; language you want ellama to translate to
  (setq ellama-language "German")
  ;; could be llm-openai for example
  (require 'llm-ollama)
  (setq ellama-provider
        (make-llm-ollama
         ;; this model should be pulled to use it
         ;; value should be the same as you print in terminal during pull
         :chat-model "qwen2.5-coder"
         :embedding-model "nomic-embed-text"
         :default-chat-non-standard-params '(("num_ctx" . 8192))))
  (setq ellama-summarization-provider
        (make-llm-ollama
         :chat-model "qwen2.5-coder"
         :embedding-model "nomic-embed-text"
         :default-chat-non-standard-params '(("num_ctx" . 32768))))
  (setq ellama-coding-provider
        (make-llm-ollama
         :chat-model "qwen2.5-coder"
         :embedding-model "nomic-embed-text"
         :default-chat-non-standard-params '(("num_ctx" . 32768))))
  ;; Predefined llm providers for interactive switching.
  ;; You shouldn't add ollama providers here - they can be selected
  ;; interactively without it.  This is just an example.
  (setq ellama-providers
        '(("deepseek-r1:14b" . (make-llm-ollama
                                :chat-model "qwen2.5-coder"
                                :embedding-model "qwen2.5-coder"))
          ("deepseek-coder-v2" . (make-llm-ollama
                                  :chat-model "qwen2.5-coder"
                                  :embedding-model "nomic-embed-text"))
          ("qwen2.5-coder" . (make-llm-ollama
                              :chat-model "qwen2.5-coder"
                              :embedding-model "nomic-embed-text"))))
  ;; Naming new sessions with llm
  (setq ellama-naming-provider
        (make-llm-ollama
         :chat-model "qwen2.5-coder"
         :embedding-model "nomic-embed-text"
         :default-chat-non-standard-params '(("stop" . ("\n")))))
  (setq ellama-naming-scheme 'ellama-generate-name-by-llm)
  ;; Translation llm provider
  (setq ellama-translation-provider
        (make-llm-ollama
         :chat-model "qwen2.5-coder"
         :embedding-model "nomic-embed-text"
         :default-chat-non-standard-params
         '(("num_ctx" . 32768))))
  (setq ellama-extraction-provider
        (make-llm-ollama
         :chat-model "qwen2.5-coder"
         :embedding-model "nomic-embed-text"
         :default-chat-non-standard-params
         '(("num_ctx" . 32768))))
  ;; customize display buffer behaviour
  ;; see ~(info "(elisp) Buffer Display Action Functions")~
  (setq ellama-chat-display-action-function #'display-buffer-full-frame)
  (setq ellama-instant-display-action-function #'display-buffer-at-bottom)
  :config
  ;; send last message in chat buffer with C-c C-c
  (add-hook 'org-ctrl-c-ctrl-c-hook #'ellama-chat-send-last-message))

I set the flag but nothing changes: I get the same result whether the whole buffer is sent or code is selected. As you can see, the num_ctx values are the defaults (the ones provided here). deepseek-coder is not that big, and I use it with no problem in the terminal via ollama run. I should mention that after setting the flash attention and cache quantization flags, code suggestion with qwen2.5-coder does work.
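(As a side note on the memory error: ollama's memory use grows with the context window, since the KV cache scales with num_ctx. So one possible workaround, sketched below and not necessarily the maintainer's recommendation, is to lower num_ctx for the providers that fail; 8192 is just an example value.)

;; Sketch: the same coding provider as above, but with a smaller
;; context window.  A smaller num_ctx shrinks the KV cache ollama has
;; to allocate, at the cost of how much code fits into one request.
(setq ellama-coding-provider
      (make-llm-ollama
       :chat-model "qwen2.5-coder"
       :embedding-model "nomic-embed-text"
       ;; 32768 in the configuration above; 8192 is an example value
       :default-chat-non-standard-params '(("num_ctx" . 8192))))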
Your configuration contains only "qwen2.5-coder" (see …
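(For comparison, a sketch of ellama-providers entries that actually point at the models named in their keys. The model names are taken from this issue; whether each one fits in the machine's RAM is a separate question.)

;; Sketch: each entry's :chat-model matches its key, so interactively
;; selecting "deepseek-r1:14b" really switches to that model instead of
;; silently staying on qwen2.5-coder.
(setq ellama-providers
      '(("deepseek-r1:14b" . (make-llm-ollama
                              :chat-model "deepseek-r1:14b"
                              :embedding-model "nomic-embed-text"))
        ("deepseek-coder-v2" . (make-llm-ollama
                                :chat-model "deepseek-coder-v2"
                                :embedding-model "nomic-embed-text"))
        ("qwen2.5-coder" . (make-llm-ollama
                            :chat-model "qwen2.5-coder"
                            :embedding-model "nomic-embed-text"))))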
I use ollama in the terminal with deepseek-r1:14b and even 32b (very slow) on this Linux ZenBook Pro Duo UX851G. When I chat with deepseek-coder-v2 or qwen2.5-coder it goes OK. If I try code review or suggestions, I receive this error:
Debugger entered--Lisp error: (error "Error calling the LLM: model requires more system ...")
error("Error calling the LLM: %s" "model requires more system memory (49.0 GiB) than ...")
#("model requires more system memory (49.0 GiB) than ...")
#f(compiled-function (_ msg) #<bytecode 0x1dd94b5c526701a6>)(error "model requires more system memory (49.0 GiB) than ...")
#f(compiled-function (type err) #<bytecode 0x1c89dbf61d0fce3b>)(error "model requires more system memory (49.0 GiB) than ...")
llm-provider-utils-callback-in-buffer(# #f(compiled-function (type err) #<bytecode 0x1c89dbf61d0fce3b>) error "model requires more system memory (49.0 GiB) than ...")
#f(compiled-function (_ data) #<bytecode -0xe7e2c5f10e42101>)(error ((error . "model requires more system memory (49.0 GiB) than ...")))
llm-request-plz--handle-error(#s(plz-error :curl-error nil :response #s(plz-response :version 1.1 :status 500 :headers ((content-type . "application/json; charset=utf-8") (date . "Tue, 11 Feb 2025 12:32:35 GMT") (content-length . "85$
#f(compiled-function (error) #<bytecode -0x8cecea1549f9e66>)(#s(plz-error :curl-error nil :response #s(plz-response :version 1.1 :status 500 :headers ((content-type . "application/json; charset=utf-8") (date . "Tue, 11 Feb 2025 12:32$
#f(compiled-function (error) #<bytecode 0x14bca7baea1431be>)(#s(plz-error :curl-error nil :response #s(plz-response :version 1.1 :status 500 :headers ((content-type . "application/json; charset=utf-8") (date . "Tue, 11 Feb 2025 12:32$
plz--respond(# #<buffer plz-request-curl> "finished\n")
apply(plz--respond (# #<buffer plz-request-curl> "finished\n"))
timer-event-handler([t 26539 17251 109626 nil plz--respond (# #<buffer plz-request-curl> "finished\n") nil 478000 nil])
Is there a configuration I can make so that I can use the models that run in my terminal?