Skip to content
@makllama

MaKLlama

MaK(Mac+Kubernetes)llama: running and orchestrating large language models (LLMs) on Kubernetes with Mac nodes.

MaKllama Organization

The following video demonstrates the below steps:

  1. Add a Mac node with Apple-Silicon chip to a Kubernetes cluster (in seconds!).
  2. Manually start Bronze Willow (BW) on the Mac node (top-right terminal).
  3. Deploy tinyllama with 2 replicas.
  4. Access the OpenAI API-compatible endpoint through mods.

Demo

Popular repositories Loading

  1. makllama makllama Public

    MaK(Mac+Kubernetes)llama - Running and orchestrating large language models (LLMs) on Kubernetes with macOS nodes.

    Go 35 3

  2. llama.cpp llama.cpp Public

    Forked from ggml-org/llama.cpp

    LLM inference in C/C++

    C++ 3

  3. containerd containerd Public

    Forked from containerd/containerd

    An open and reliable container runtime

    Go 1

  4. cri cri Public

    Forked from virtual-kubelet/cri

    Go 1 1

  5. .github .github Public

  6. ollama ollama Public

    Forked from ollama/ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

    Go

Repositories

Showing 10 of 19 repositories
  • ktransformers Public Forked from kvcache-ai/ktransformers

    A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

    makllama/ktransformers’s past year of commit activity
    Python 0 Apache-2.0 669 0 0 Updated Feb 20, 2025
  • llama-box Public Forked from gpustack/llama-box

    LLM inference server implementation based on llama.cpp.

    makllama/llama-box’s past year of commit activity
    C++ 0 MIT 13 0 0 Updated Feb 16, 2025
  • ollama Public Forked from ollama/ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

    makllama/ollama’s past year of commit activity
    Go 0 MIT 10,733 0 0 Updated Feb 14, 2025
  • stable-diffusion.cpp Public Forked from leejet/stable-diffusion.cpp

    Stable Diffusion and Flux in pure C/C++

    makllama/stable-diffusion.cpp’s past year of commit activity
    C++ 0 MIT 359 0 0 Updated Feb 14, 2025
  • gpustack Public Forked from gpustack/gpustack

    Manage GPU clusters for running LLMs

    makllama/gpustack’s past year of commit activity
    Python 0 Apache-2.0 160 0 0 Updated Feb 13, 2025
  • llama.cpp Public Forked from ggml-org/llama.cpp

    LLM inference in C/C++

    makllama/llama.cpp’s past year of commit activity
    C++ 3 MIT 11,090 0 0 Updated Feb 13, 2025
  • exo Public Forked from exo-explore/exo

    Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

    makllama/exo’s past year of commit activity
    Python 0 GPL-3.0 1,459 0 0 Updated Nov 28, 2024
  • llama-cpp-python Public Forked from abetlen/llama-cpp-python

    Python bindings for llama.cpp

    makllama/llama-cpp-python’s past year of commit activity
    Python 0 MIT 1,114 0 0 Updated Nov 26, 2024
  • fastfetch Public Forked from gpustack/fastfetch

    Like neofetch, but much faster because written mostly in C.

    makllama/fastfetch’s past year of commit activity
    C 0 MIT 478 0 0 Updated Nov 19, 2024
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    makllama/vllm’s past year of commit activity
    Python 0 Apache-2.0 5,880 0 0 Updated Oct 16, 2024

Top languages

Loading…

Most used topics

Loading…