koboldcpp-1.78
- NEW: Added support for Flux and Stable Diffusion 3.5 models: Image generation has been updated with new arch support (thanks to stable-diffusion.cpp) with additional enhancements. You can use either fp16 or fp8 safetensor models, or the GGUF models. Supports all-in-one models (bundled T5XXL, Clip-L/G, VAE) or loading them individually.
- Grab an all-in-one flux model here: https://huggingface.co/Comfy-Org/flux1-dev/blob/main/flux1-dev-fp8.safetensors
- Alternatively, we have a ready-to-use `.kcppt` template that will set up and download everything you need here: https://huggingface.co/koboldcpp/kcppt/resolve/main/Flux1-Dev.kcppt
- Large image handling is also more consistent with VAE tiling; 1024x1024 should work nicely for SDXL and Flux.
- You can specify the new image gen components by loading them with `--sdt5xxl`, `--sdclipl` and `--sdclipg` (for SD3.5); they work with URL resources as well.
- Note: FP16 Flux needs over 20GB of VRAM to work. If you have less VRAM, you should use the quantized GGUFs, or select Compress Weights when loading the Flux model. SD3.5 medium is more forgiving.
- As before, it can be used with the bundled StableUI at http://localhost:5001/sdui/
- Debug mode prints penalties for XTC
- Added a new flag `--nofastforward`, which forces full prompt reprocessing on every request. It can potentially give more repeatable/reliable/consistent results in some cases.
- CLBlast support is still retained, but has been further downgraded to "compatibility mode" and is no longer recommended (use Vulkan instead). CLBlast GPU offload must now maintain a duplicate copy of the layers in RAM as well, as it now piggybacks off the CPU backend.
- Added common identity provider `/.well-known/serviceinfo` (Haidra-Org/AI-Horde#466, PygmalionAI/aphrodite-engine#807, theroyallab/tabbyAPI#232)
- Reverted some changes that reduced speed in HIPBLAS.
- Fixed a bug where bad logprobs JSON was output when logits were `-Infinity`
- Updated Kobold Lite, multiple fixes and improvements
- Added support for custom CSS styles
- Added support for generating larger images (select BigSquare in image gen settings)
- Fixed some streaming issues when connecting to Tabby backend
- Better world info length limiting (capped at 50% of max context before appending to memory)
- Added support for Clip Skip for local image generation.
- Merged fixes and improvements from upstream
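
As a rough illustration of how the image generation flags above fit together, here is a minimal Python sketch that assembles a launch command. The flags `--sdt5xxl`, `--sdclipl`, `--sdclipg` and `--nofastforward` come from these notes; `--sdmodel`, the `./koboldcpp` binary path, and all file names are assumptions made for the example, so check `--help` for the exact options in your build.

```python
import shlex


def build_launch_args(sd_model, t5xxl=None, clip_l=None, clip_g=None,
                      no_fast_forward=False):
    """Assemble koboldcpp command-line arguments for image generation."""
    # ASSUMPTION: --sdmodel is not named in these notes; confirm it with --help.
    args = ["--sdmodel", sd_model]
    # Component flags named in the release notes. Per the notes, each value
    # may be a local path or a URL resource.
    for flag, value in (("--sdt5xxl", t5xxl),
                        ("--sdclipl", clip_l),
                        ("--sdclipg", clip_g)):
        if value:
            args += [flag, value]
    # Optional: force full prompt reprocessing on every request.
    if no_fast_forward:
        args.append("--nofastforward")
    return args


if __name__ == "__main__":
    # Hypothetical file names, used only to show the shape of the command.
    args = build_launch_args(
        "flux1-dev-quantized.gguf",
        t5xxl="t5xxl.safetensors",
        clip_l="clip_l.safetensors",
    )
    print(shlex.join(["./koboldcpp"] + args))
```

An all-in-one model would only need the model argument; the separate component flags matter when T5XXL and the Clip encoders are loaded individually.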
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern MacOS (M1, M2, M3) you can try the koboldcpp-mac-arm64 MacOS binary.
If you're using AMD, we recommend trying the Vulkan option (available in all releases) first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
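
If you would rather check the server from a script than a browser, the following Python sketch probes the `/.well-known/serviceinfo` endpoint mentioned in these notes. It assumes the default address above and that the endpoint returns JSON; the helper names `endpoint` and `fetch_service_info` are made up for this example, and the response fields are not documented here, so the result is simply printed as-is.

```python
import json
import urllib.request
from urllib.parse import urljoin

BASE_URL = "http://localhost:5001"  # default address from these notes


def endpoint(path, base=BASE_URL):
    """Join a relative path onto the server's base URL."""
    return urljoin(base + "/", path.lstrip("/"))


def fetch_service_info(base=BASE_URL, timeout=5):
    """GET /.well-known/serviceinfo; return parsed JSON, or None on failure."""
    url = endpoint(".well-known/serviceinfo", base)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # ASSUMPTION: the body is UTF-8 encoded JSON.
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors; ValueError covers bad JSON.
        return None


if __name__ == "__main__":
    info = fetch_service_info()
    print(info if info is not None else "server not reachable")
```

The same `endpoint` helper can build the StableUI address, for example `endpoint("sdui/")`.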
For more information, be sure to run the program from the command line with the `--help` flag.