Ability to use local LLM (LM Studio or Ollama) #13

Open · eliliam opened this issue Apr 14, 2025 · 20 comments

Comments

@eliliam commented Apr 14, 2025

This is such an amazing package, but with some of the larger codebases I work with, the costs of running it against cloud models would just be too high. How difficult would it be to support locally running LLMs through the likes of LM Studio or Ollama? I know both provide OpenAI-compatible APIs as well as a suite of other ways to interact with the locally running model. This feature would be killer and would make this a tool similar to Claude Code for local codebase analysis.

@zachary62 (Contributor) commented Apr 14, 2025

Hi @eliliam,
The tool needs a call_llm function that takes a string as input and outputs a string: https://github.com/The-Pocket/Tutorial-Codebase-Knowledge/blob/main/utils/call_llm.py
So you can simply replace the function with an implementation based on LM Studio or Ollama: https://the-pocket.github.io/PocketFlow/utility_function/llm.html
Let me know if this makes sense. Thanks!
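
For reference, a minimal sketch of such a drop-in call_llm, assuming a local server that exposes an OpenAI-compatible API (LM Studio serves one at http://localhost:1234/v1 by default; Ollama at http://localhost:11434/v1). The model name is a placeholder, not something from the repo:

# Minimal sketch (not the project's official implementation): call_llm backed
# by any OpenAI-compatible local server such as LM Studio or Ollama.
from openai import OpenAI  # pip install openai

# Assumption: LM Studio's server on its default port; for Ollama use
# base_url="http://localhost:11434/v1" instead.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def call_llm(prompt: str, use_cache: bool = True) -> str:
    # The placeholder model name must match a model loaded locally.
    response = client.chat.completions.create(
        model="your-local-model",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content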

@sitestudio commented:

So I asked an LLM what to put in here and it came back with this (which works beautifully):

import ollama

def call_llm(prompt, use_cache: bool = True):
    """
    Calls an Ollama model to generate a text response.

    Args:
        prompt (str): The prompt to send to the model.
        use_cache (bool, optional): Whether to use Ollama's caching mechanism. Defaults to True.

    Returns:
        str: The generated text response from the model.
    """
    try:
        response = ollama.chat(
            model='cogito:14b',  # deepcoder:14b  gemma3:12b  phi4:14b Replace with your desired Ollama model name
            messages=[
                {
                    'role': 'user',
                    'content': prompt,
                },
            ],
            stream=False,  # stream must be False to get the full response in one object
            options={
                'use_cache': use_cache,
            },
        )
        return response['message']['content']
    except ollama.ResponseError as e:
        print(f"Ollama Error: {e}")
        return None  # Or handle the error as needed.

Also had to add the following to requirements.txt:

ollama >=0.4.7

@zachary62 (Contributor) commented:

@sitestudio
Amazing! One minor change I would suggest is to remove the except: return None block.
It may otherwise generate, e.g., a tutorial with empty chapters. Just let it fail, as Pocket Flow has native node retry.
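
Applying that suggestion, the snippet above would shrink to something like this (a sketch; the model name is still a placeholder, and errors simply propagate so the node retry can kick in):

import ollama

def call_llm(prompt, use_cache: bool = True):
    """Call an Ollama model and return its text response; let errors propagate."""
    response = ollama.chat(
        model='cogito:14b',  # placeholder: replace with the model you have pulled
        messages=[{'role': 'user', 'content': prompt}],
        stream=False,
        options={'use_cache': use_cache},
    )
    return response['message']['content']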

@xiongyw commented Apr 16, 2025

FYI: the following simple update works for me, tested with both Ollama and Grok: call_llm() for ollama and grok

@FatCache commented:

Are there any plans to merge the Ollama support into the codebase?

@zachary62 (Contributor) commented:

> Are there any plans to merge the Ollama support into the codebase?

Check out the code snippet provided by @xiongyw for ollama support: call_llm() for ollama and grok

zachary62 mentioned this issue Apr 23, 2025
@piranna commented Apr 23, 2025

Ollama also allows working with DeepSeek. I don't know if it supports Gemini, OpenAI, and the other closed models, but if so, we could replace the call_llm() function with a single one that uses Ollama, delegating the choice of model to it :-)
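
A sketch of that idea, assuming the model choice is delegated to an environment variable so call_llm itself never hard-codes one (the variable name and default model are made up for illustration):

import os
import ollama

def call_llm(prompt: str, use_cache: bool = True) -> str:
    # The concrete model is chosen outside the code, e.g.
    # OLLAMA_MODEL=deepseek-r1:14b python main.py ...
    model = os.environ.get("OLLAMA_MODEL", "qwen3:8b")  # assumed default
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=False,
    )
    return response["message"]["content"]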

@eliliam (Author) commented Apr 23, 2025

@zachary62 the code provided by @xiongyw looks good. Could we have that made into a PR to get merged into master?

@zachary62 (Contributor) commented:

> @zachary62 the code provided by @xiongyw looks good. Could we have that made into a PR to get merged into master?

Yes! Could you make this code commented out by default, and say something like "uncomment it for Ollama"? Thank you!
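
For example (a sketch of the suggested layout only, not the actual file), utils/call_llm.py could keep the default cloud implementation active and carry the Ollama variant as a commented-out block:

# utils/call_llm.py (sketch of the suggested layout)

# Default implementation: cloud provider, left as in the repository.
def call_llm(prompt: str, use_cache: bool = True) -> str:
    ...

# --- Uncomment for Ollama -------------------------------------------------
# import ollama
#
# def call_llm(prompt: str, use_cache: bool = True) -> str:
#     response = ollama.chat(
#         model='qwen3:8b',  # placeholder model name
#         messages=[{'role': 'user', 'content': prompt}],
#         stream=False,
#     )
#     return response['message']['content']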

@Le09 commented Apr 24, 2025

I cherry-picked the commit and implemented a simple switch in #50; let me know if you'd prefer any changes to it.

@gethari commented Apr 29, 2025

What's the best model to use for a React + TS codebase? I tried the generation with llama3.2 and the output is just hot garbage; none of the generated content is relevant to the codebase I pointed it at.

@sitestudio commented Apr 29, 2025

Try the brand new Qwen3:8b (or bigger if your hardware can handle it). I switched to it yesterday and was getting much better results than with anything else. I was only generating Python and have limited hardware at the moment, but I was very impressed with the step up in its "cognition".

@gethari commented Apr 30, 2025

@sitestudio I tried using codellama:13b, but the script seems to fail randomly. I tried Copilot to fix this, but I'm not a Python expert, so I couldn't get this model to work.

 yaml_str = response.strip().split("```yaml")[1].split("```")[0].strip()
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

Looks like the response is not in the expected format and differs across models.
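
The failing line assumes the model wraps its YAML in a ```yaml fence. A more tolerant extraction (a sketch, not the project's code; it falls back to treating the whole response as YAML when no fence is present) could look like this:

import re
import yaml  # PyYAML

def extract_yaml(response: str) -> dict:
    """Pull a YAML block out of an LLM response, tolerating missing fences."""
    match = re.search(r"```(?:yaml|yml)?\s*(.*?)```", response, re.DOTALL)
    yaml_str = match.group(1) if match else response
    return yaml.safe_load(yaml_str.strip())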

@sitestudio commented:

@gethari my experience has been similar now that I've actually tried to compile the Python code that comes from qwen3-8b; I suspect it's a lack of RAM/VRAM locally. I'm waiting to get my M1 MacBook repaired to see if that helps, or looking for some bigger hardware.

My other issue is that if I run in Plan mode in Cline, I'm unable to respond to the first output: I get an error message about tool_name and about setting up a Modelfile with settings like PARAMETER stop "</attempt_completion>".

However, my research also suggests that this issue can be related to various things, such as the version of Ollama, the individual model, or the amount of RAM/VRAM I have.

So for now I have resorted to running in Plan mode, including answers in subsequent Plan mode prompts, then running an individual Act mode prompt and using that code to move forward.

@gethari commented Apr 30, 2025

> So for now I have resorted to running in Plan mode

How do I do this, @sitestudio?

@sitestudio commented:

@gethari I am using Cline and at the bottom right corner of the Extension window (just below where your prompt would go) you can toggle between Plan and Act mode.

@TheHawk3r commented:

@gethari I had this issue too.

yaml_str = response.strip().split("```yaml")[1].split("```")[0].strip()
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

@zachary62 (Contributor) commented:

This issue is caused by the model not being capable enough to output a valid YAML string. Please use a more capable model.
Check out #61

@TheHawk3r commented:

import requests
from termcolor import cprint
import json

# Configure the connection to the local LM Studio server
local_api_url = "http://127.0.0.1:1234/v1/completions"  # The correct address for your local server
api_key = "gemma-3-27b-it"  # If you need an API key for access

def call_llm(prompt: str, use_cache: bool = False):
    cprint("[Querying local LLM via LM Studio]", "cyan")
    try:
        # Set the payload for the POST request
        data = {
            "model": "gemma-3-27b-it",  # The model specified in LM Studio
            "prompt": prompt,
            "max_tokens": 100,  # You can adjust this parameter
            "temperature": 0.7,  # You can adjust this parameter
        }

        # Set the headers for the POST request
        headers = {
            "Authorization": f"Bearer {api_key}",  # If an API key is required
            "Content-Type": "application/json"
        }

        # Send the POST request to the local LM Studio server
        response = requests.post(local_api_url, headers=headers, data=json.dumps(data))

        # Check whether the request succeeded
        if response.status_code == 200:
            result = response.json()
            response_text = result.get("choices")[0].get("text").strip()  # Extract the response text from the JSON
            return response_text
        else:
            cprint(f"[Error calling LM Studio] {response.status_code}: {response.text}", "red")
            return "[Error calling LLM]"

    except Exception as e:
        cprint(f"[Error calling LM Studio] {e}", "red")
        return "[Error calling LLM]"

Here is the call_llm.py script I modified with ChatGPT to work using LM Studio. gemma-3-27b-it gives me an error, so another model is most probably needed.

ValueError: Missing keys in abstraction item: {'name': 'Logging System', 'description': 'Imagine it as'}
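
One possible cause (an assumption, not confirmed in the thread): with max_tokens set to 100, longer completions get cut off mid-YAML, which would explain the description ending at "Imagine it as" and the missing keys. A larger token budget in the request payload is a cheap thing to try; a hypothetical helper showing the adjusted payload:

def build_payload(prompt: str) -> dict:
    # Hypothetical helper: same payload as in the script above, but with a
    # larger token budget, since max_tokens=100 truncates longer YAML answers.
    return {
        "model": "gemma-3-27b-it",
        "prompt": prompt,
        "max_tokens": 4096,  # assumption: enough headroom for full chapter YAML
        "temperature": 0.7,
    }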

@TheHawk3r commented:

I managed to make it work with deepseek-r1-distill-qwen-14b.
