koboldcpp-1.42.1

@LostRuins LostRuins released this 30 Aug 15:23

  • Added support for LLAMA GGUFv2 models, handled automatically. All older models will still continue to work normally.
  • Fixed a problem with certain logit values that were causing segfaults when using the Typical sampler. Please let me know if it happens again.
  • Merged ROCm support from @YellowRoseCx, so you should now be able to build AMD-compatible GPU builds with HIPBLAS, which should be faster than using CLBlast.
  • Merged upstream support for GGUF Falcon models. Note that GPU layer offload for Falcon is unavailable with --useclblast but works with CUDA. Older pre-GGUF Falcon models are not supported.
  • Added support for unbanning EOS tokens directly from the API, and by extension it can now be triggered from the Lite UI settings. Note: Your command line --unbantokens flag will force override this.
  • Added support for automatic rope scale calculations based on a model's training context (n_ctx_train). This triggers if you do not explicitly specify a --ropeconfig. For example, this means llama2 models will (by default) use a smaller rope scale compared to llama1 models, for the same specified --contextsize. Setting --ropeconfig will override this. (Reverted in 1.42.1 for now, as it was not set up correctly.)
  • Updated Kobold Lite, now with tavern style portraits in Aesthetic Instruct mode.
  • Pulled other fixes and improvements from upstream.
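
To illustrate why a model with a larger training context needs a smaller rope scale at the same --contextsize, here is a minimal sketch. It assumes plain linear scaling, which is not necessarily the calculation koboldcpp used (that calculation is not described in these notes), and the function name linear_rope_scale is hypothetical. The training contexts of 2048 for llama1 and 4096 for llama2 are the published values for those model families.

```python
# Illustrative sketch only: simple linear scaling, NOT koboldcpp's formula.
# linear_rope_scale is a hypothetical helper, not part of koboldcpp.

LLAMA1_CTX_TRAIN = 2048  # training context of llama1 models
LLAMA2_CTX_TRAIN = 4096  # training context of llama2 models


def linear_rope_scale(context_size: int, n_ctx_train: int) -> float:
    """Return how far the requested context stretches past the training context.

    A result of 1.0 means no scaling is needed.
    """
    if context_size <= 0 or n_ctx_train <= 0:
        raise ValueError("context sizes must be positive")
    return max(1.0, context_size / n_ctx_train)


# Same requested context, different training contexts.
llama1_scale = linear_rope_scale(8192, LLAMA1_CTX_TRAIN)
llama2_scale = linear_rope_scale(8192, LLAMA2_CTX_TRAIN)
```

Under this assumption, the llama2 model needs half the stretch of the llama1 model for the same requested context, which matches the direction described in the list above.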

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model has loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
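
Besides opening that address in a browser, a script can talk to the same server. The sketch below is a rough example rather than a reference client. It assumes the server exposes a KoboldAI-style /api/v1/generate endpoint that accepts "prompt" and "max_length" fields and returns {"results": [{"text": ...}]}. None of these are stated in these notes, so check them against your running version. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # address given in the release notes


def build_generate_request(prompt, max_length=80, base_url=BASE_URL):
    """Build (but do not send) a POST request for text generation.

    Assumption: the endpoint path and field names follow the KoboldAI API.
    """
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        base_url + "/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request to a running server and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the response has the shape {"results": [{"text": "..."}]}.
    return body["results"][0]["text"]
```

With a model loaded and the server running, calling generate("Once upon a time", max_length=50) should return the continuation as a string, provided the assumptions above hold.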

For more information, be sure to run the program from the command line with the --help flag.