Ability to use local LLM (LM Studio or Ollama) #13
Hi @eliliam,
So I asked an LLM what to put in here and it came back with this (which works beautifully):
Also had to add the following to requirements.txt:
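A minimal sketch of the kind of change being described, not the commenter's exact code: it assumes the project's `call_llm(prompt)` helper in `call_llm.py` and Ollama's OpenAI-compatible endpoint on its default port, and the model name is a placeholder.

```python
# Sketch: point the OpenAI client at a local Ollama server instead of a cloud API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string; Ollama ignores it
)

def call_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="qwen3:8b",  # placeholder: use whichever model you have pulled
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The requirements.txt addition would presumably just be the `openai` package, since both Ollama and LM Studio speak that client's protocol.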
@sitestudio
FYI: the following simple update works for me, tested with both Ollama and Grok.
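A sketch of what such an update could look like (an illustration with assumed environment variable names, not the exact snippet): drive the OpenAI-compatible client from environment variables so the same `call_llm` can target Ollama locally or Grok's hosted API.

```python
# Sketch: choose the provider via environment variables rather than hard-coding it.
import os
from openai import OpenAI

def call_llm(prompt: str) -> str:
    client = OpenAI(
        # e.g. http://localhost:11434/v1 for Ollama, https://api.x.ai/v1 for Grok
        base_url=os.environ.get("LLM_BASE_URL", "http://localhost:11434/v1"),
        api_key=os.environ.get("LLM_API_KEY", "ollama"),
    )
    response = client.chat.completions.create(
        model=os.environ.get("LLM_MODEL", "qwen3:8b"),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Only the base URL, API key, and model name change between the two providers.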
Are there any plans to merge the Ollama support into the codebase?
Check out the code snippet provided by @xiongyw for Ollama support:
Ollama also lets you work with DeepSeek. I don't know whether it supports Gemini, OpenAI, and other closed models, but if so, we could replace
@zachary62 the code provided by @xiongyw looks good. Could we have that made into a PR and merged into master?
Yes! Could you make this code commented out by default, with a note like "uncomment it for Ollama"? Thank you!
I cherry-picked the commit and implemented a simple switch in #50; let me know if you'd prefer any changes to it.
What's the best model to use if using for a
Try the brand new Qwen3:8b (or bigger if your hardware can handle it) - I switched to it yesterday and was getting much better results than with anything else. I was only generating Python and have limited hardware at the moment, but I was very impressed with the step up in its "cognition".
@sitestudio I tried using
Looks like the response is not in the expected format, and it differs across models.
@gethari my experience has been similar now that I have actually tried to compile the Python code that comes from qwen3-8b - I suspect it is a lack of RAM/VRAM locally. I'm waiting to get my M1 MacBook repaired to see if that helps, or looking for some bigger hardware. My other issue is that if I run in Plan mode in Cline I am unable to respond to the first output - I get an error message about tool_name, and the usual suggestion is setting up a Modelfile with settings like PARAMETER stop "</attempt_completion>"; however, my research also suggests that this issue can be related to various things, such as the version of Ollama, the individual model, or the amount of RAM/VRAM I have. So for now I have resorted to running in Plan mode, including answers in subsequent Plan mode prompts, then running an individual Act mode prompt and using that code to move forward.
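For reference, the Modelfile workaround mentioned above looks roughly like this in Ollama (the base model and any custom name are placeholders):

```
# Modelfile: add a stop sequence so the model halts at Cline's completion tag
FROM qwen3:8b
PARAMETER stop "</attempt_completion>"
```

Building it with `ollama create qwen3-cline -f Modelfile` produces a local model that carries the extra stop parameter.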
How do I do this (switch between Plan and Act mode), @sitestudio?
@gethari I am using Cline, and in the bottom-right corner of the extension window (just below where your prompt would go) you can toggle between Plan and Act mode.
@gethari I had this issue too.
This issue happens because the model is not capable enough to output a valid YAML string. Please use a more capable model.
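As a hypothetical illustration (not the project's actual parsing code), the constraint the model's reply has to satisfy is roughly this: the structured part must load cleanly as YAML, which smaller models often fail to produce.

```python
# Illustration only: what "expected format" boils down to for the YAML-based steps.
import yaml

def validate_reply(yaml_text: str) -> dict:
    data = yaml.safe_load(yaml_text)  # raises yaml.YAMLError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a YAML mapping in the model's reply")
    return data
```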
Here is the call_llm.py script I modified with ChatGPT to work with LM Studio. gemma-3-27b-it gives me an error, so another model is most probably needed.
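A rough sketch of that kind of LM Studio setup (not the commenter's exact script): LM Studio's local server exposes an OpenAI-compatible API, by default on port 1234, so the change mirrors the Ollama one.

```python
# Sketch: same OpenAI client, pointed at LM Studio's local server instead.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local server address
    api_key="lm-studio",                  # placeholder; the key is not checked locally
)

def call_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gemma-3-27b-it",  # the model mentioned above; swap in whatever is loaded
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```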
I managed to make it work with
This is such an amazing package, but for some of the larger codebases I work with, the cost of running it against cloud models would just be too high. How difficult would it be to support locally running LLM models through the likes of LM Studio or Ollama? I know both provide OpenAI-compatible APIs as well as a suite of other ways to interact with the locally running model. This feature would be killer and would set this apart as a tool similar to Claude Code for local codebase analysis.