Mr Wolf - exploring agentic systems


Overview

The main goal is to explore the space of AI agents and agentic systems and finally build something end-to-end so that, with the confidence that only comes from doing, I can tell what's real and what works instead of feeling lost between the market hype and the research papers. All of this while building a personal assistant, and who better than Mr Wolf, the character from Pulp Fiction whose famous line is "I solve problems."

While the aim is to create something that can be used, and therefore tested, in real life (instead of on benchmarks), the project is ultimately focused on technical exploration with several specific goals:

  • RAG: Going beyond simple vectorization and cosine-similarity tutorials to explore the complexities of ingesting a variety of documents (understanding special formats, dealing with tabular data, parsing and chunking choices) and of retrieval mechanisms (reranking, hybrid search, etc.)
  • Prompting, agent collaboration and ReAct strategies: Investigating how effectively multiple bots can collaborate, how prompting affects final results, and testing different prompts and models (hopefully with an A/B testing tool similar to ChainForge)
  • Explore multimodal (voice/image): Learning more about voice, understanding more deeply the challenges and the delta between something high quality like 11Labs and common solutions such as AWS Polly, plus exploring image generation with new models like Flux
  • Hallucinations: How frequent they actually are and how much they can really be curbed
  • Memory and self-learning: Short and long term; how to get models to remember without getting confused, and to learn and improve as they work and receive feedback
  • CaLM and conversational AI: What's the best way to implement deterministic workflows? Does everything need an LLM? (It does not, but then what exactly?) CaLM looks like a very reasonable approach, but needs exploring.
  • Costs and failures: Measuring real-world agent failure rates and operational costs under normal daily usage, especially with ASR/TTS

Part of the fun and exploration will also be to use GH Copilot and other AI coding tools like Claude Code to see how they work on a full project, as opposed to the little scripts which are all I've used these tools for so far.

Tech Stack

To focus on AI rather than plumbing and ensure immediate mobile access, the project makes several opinionated technology choices:

  • Docker for local development
  • n8n as a low-code solution to model agents and connect to third-party services. It also aligns with my basic mental framework of workflows as agents.
  • PostgreSQL as the primary database
    • Compatible with n8n, supports embeddings via the pgvector extension and enables hybrid search (see the sketch after this list)
    • pgAdmin included for database exploration
  • Telegram as the communication channel: it saves us having to build a UI to start with and allows for all sorts of media interaction on any device
  • Metabase for easy observability from day one; it makes it a breeze to create any dashboard
  • Ollama for local development and fast iterations (llama.cpp seemed like it would have been better, but was too troublesome to get working). For more power, possibly Bedrock for easy LLM access without needing 20 accounts: Llama 3 for most tasks, Cohere for embeddings and reranking, Stability for images, Polly/Transcribe for voice. Alternatively OpenAI, which would provide all of the above except, I think, reranking. Either should be a solid baseline that can be taken to production with ease and then used for comparison with other models.
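To make the hybrid-search point concrete, here's a minimal sketch of the kind of query pgvector enables, blending a full-text score with cosine similarity. The documents table, the 3-dimensional vectors and the 50/50 weights are all just illustrative assumptions, not part of this repo:

```bash
# hypothetical schema: documents(id, body text, embedding vector(3))
psql postgres <<'SQL'
SELECT id,
       -- keyword relevance from Postgres full-text search
       0.5 * ts_rank(to_tsvector('english', body),
                     plainto_tsquery('english', 'pulp fiction'))
       -- plus cosine similarity from pgvector (<=> is cosine distance)
     + 0.5 * (1 - (embedding <=> '[0.1,0.2,0.3]'))
       AS hybrid_score
FROM documents
ORDER BY hybrid_score DESC
LIMIT 5;
SQL
```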

Non-Focus Areas

The project currently does not focus on:

  • Latency and scalability, although these ultimately make a significant difference
  • Realtime/streaming, particularly for voice (the current implementation uses audio messages rather than calls, and n8n doesn't support streaming)

System Design

[System design diagram: Mr Wolf]

This project is focused on the director -> subagents -> tasks model. In this design a main agent receives the incoming queries and decides which subagent to delegate to; the subagent eventually performs the task and returns the result to the director agent. One thing to clarify is who does the evaluation: whether every subagent gets a companion agent for evaluations, perhaps using a different model. Otherwise the design should be pretty straightforward. All agents will be implemented as their own flow, with the director flow calling the other flows.

One more thing that's also discussed in the resources below is the human in the loop. The project will explore how to get feedback, but also permissions, from humans before proceeding with certain sets of actions.

These are a couple of resources that clearly explain the model, one implemented on n8n (his flows will be included in the default repo) and another on a different platform, Relevance AI:

Nate Herk - I Built the Ultimate Team of AI Agents in n8n With No Code (Free Template)

Ben AI - This 20+ AI Agent Team Automates ALL Your Work (GPT-01) (Relevance AI)

(...yes, everything and anything... welcome to the hype and clickbait)

Something like that will represent the backbone and the baseline, but then the real work begins: going in depth on the topics above, implementing more complex RAG solutions, trying different prompts, and so on.

Getting Started

I began with Docker, but found it faster to hack on things if I installed locally with brew, especially when using ngrok or whisper; there was too much overhead setting everything up in Docker. There's still some leftover stuff that I may pick up later once I clean things up and try to make it easier for distribution, if I ever get there.

Docker

You need Docker installed, then just check out this repo and run docker-compose -f docker/docker-compose.yml up -d and you should see 4 containers showing up (a quick way to verify follows the list):

  • postgres
  • n8n
  • pgadmin
  • metabase
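A quick verification, using the compose file from the repo (the legacy docker-compose binary and the newer docker compose plugin behave the same here):

```bash
# bring everything up in the background
docker-compose -f docker/docker-compose.yml up -d

# list the services; expect postgres, n8n, pgadmin and metabase running
docker-compose -f docker/docker-compose.yml ps

# tail the n8n logs if something looks off
docker-compose -f docker/docker-compose.yml logs -f n8n
```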

Source

Assuming dev on a Mac with brew available; setting up brew is beyond the scope of this. n8n uses the LTS version of Node, so you will most likely need nvm.

For n8n (the commands are collected in a sketch after this list):

  • brew uninstall --force node if you'd like to avoid falling back to a default
  • brew install nvm
  • add the lines to your shell profile as indicated in the output of the installation
  • nvm install v18.17.0
  • cd n8n
  • npm install n8n nodemon (local install, not global)
  • npm run dev
  • note that by default n8n will use a local SQLite db, which is kind of nice for testing
  • when connecting to Ollama you need to use 127.0.0.1 instead of localhost; it seems that otherwise IPv6 messes things up
  • you will also need to set up ngrok (below) and configure the webhook URL if you want to be able to connect Telegram or other services from outside
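The same steps as one copy-pasteable block (the Node version comes from the list above; nvm's shell-profile snippet varies by shell, so it's only hinted at here):

```bash
# replace the default node with an nvm-managed version
brew uninstall --force node
brew install nvm
# ...add nvm's init lines to your shell profile, as the install output says...

nvm install v18.17.0

# local (not global) n8n install, run under nodemon for fast restarts
cd n8n
npm install n8n nodemon
npm run dev
```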

For ngrok, which you need if you want to connect Telegram to n8n (sketch after this list):

  • you need to register on ngrok.com. It's free for plain HTTP and an ephemeral domain, but it does require registration
  • brew install ngrok
  • follow the instructions to connect your ngrok client to their platform (it needs token auth)
  • claim a free domain, easier than random IP numbers
  • run ngrok with your custom domain as a parameter and watch the magic happen
  • this is a good video with all the steps: https://www.youtube.com/watch?v=3sshIaiotmc
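As a sketch, assuming n8n on its default port 5678; the domain and token below are placeholders for the ones from your ngrok dashboard:

```bash
brew install ngrok

# paste the authtoken shown in your ngrok dashboard
ngrok config add-authtoken <YOUR_AUTHTOKEN>

# forward your claimed domain to the local n8n instance
ngrok http --domain=example.ngrok-free.app 5678
```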

For Ollama (sketch after this list):

  • brew install ollama
  • ollama serve (you can also use brew services start ollama and set it up to autostart if you'd like)
  • ollama pull each of llama3.2:latest, nomic-embed-text:latest and deepseek-r1:14b
  • ollama run llama3.2 if you wanna test it
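Or as one block (the model tags are the ones from the list above):

```bash
brew install ollama
ollama serve &   # or: brew services start ollama

# the three models used in this project
ollama pull llama3.2:latest
ollama pull nomic-embed-text:latest
ollama pull deepseek-r1:14b

# quick interactive smoke test
ollama run llama3.2
```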

For PostgreSQL (sketch after this list):

  • brew install postgresql
  • brew services start postgresql
  • psql postgres to get in and test it
  • brew install pgvector
  • psql postgres again, then run CREATE EXTENSION vector;
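And as one block, with a tiny pgvector sanity check at the end (the items table and its 3-dimensional vectors are just a toy example):

```bash
brew install postgresql pgvector
brew services start postgresql

psql postgres -c "CREATE EXTENSION IF NOT EXISTS vector;"

# store two toy embeddings and fetch the nearest by cosine distance
psql postgres <<'SQL'
CREATE TABLE IF NOT EXISTS items (id serial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
SELECT id, embedding <=> '[1,2,3]' AS cosine_distance
FROM items ORDER BY cosine_distance LIMIT 1;
SQL
```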

For stt:

Besides the tools above, you will also need Telegram and ngrok set up as described earlier so that audio messages from outside can reach your local n8n instance.
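For local transcription, a minimal sketch assuming OpenAI's Whisper installed via Homebrew; the openai-whisper formula and the exact flags are my assumption, not something this repo pins down:

```bash
# assumption: OpenAI's Whisper CLI for local speech-to-text
brew install openai-whisper

# transcribe a Telegram voice note (OGG/Opus); whisper uses ffmpeg under the hood
whisper voice_note.ogg --model base --language en --output_format txt
```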

If you are using llama.cpp for local inferencing you most likely want to shut down pgAdmin and Metabase to get some resources back. It may also make sense to support a mode with n8n running locally with SQLite as the default, without Docker and without PostgreSQL, but then you'd need a different vector store, and I'm not sure what that would buy you memory-profile-wise over using PostgreSQL.

Resources and links for exploration

Just keeping track of good articles to explore the above topics:

LLMs

After failing to get llama.cpp working I switched to Ollama and life has been great (it's ironic, since Ollama uses llama.cpp under the hood... but whatever). I've downloaded 3 key models from the Ollama repo:

  • llama3.2 for chat
  • deepseek-r1 for reasoning
  • nomic-embed-text for embeddings

All models are quantized to 4-bit (Q4_K_M) for a balance of performance and quality. On my Mac Mini M4 with 16GB of RAM I can actually run both deepseek-r1 at 14B parameters and gemma3 at 12B parameters without breaking a sweat:

[Screenshots: token generation speed for deepseek-r1 14B and gemma3 12B on a Mac Mini M4 with 16GB RAM]

N8N

Voice

RAG Techniques

Document ingestion/ETL/Cleanup

PG Vector

Contributing

It's all about exploration, because doing is 100 times better than just watching videos and reading stuff. Feel free to join: the more the merrier.

Todo

See open issues and The Stuff project for things to pick up.

Hackathons

Because stuff is more fun together, as much as possible I'd like to work in bursts along with other people, possibly in real time. As such I will try to organize some periodic hackathons to move the project forward, since on a daily basis I simply don't have enough time between work and family.
