Welcome to llamacpp-for-kobold Discussions! #2
-
Good morning! I have always dreamed of having my own little AI chatbot, and now I am trying to make my LLaMA model accessible from my computer with a beautiful chat interface. Also, I have a question: how do I make it accessible from outside localhost?
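For anyone with the same question: the embedded server binds to localhost on port 5001 by default (see the startup log later in this thread), so first check whether your launcher exposes an option to bind to 0.0.0.0 or a specific interface. If it does not, a tiny relay on the same machine can forward an externally reachable port to the local server. Below is a minimal sketch in Python; the listen port 5002 and the target 127.0.0.1:5001 are assumptions to adjust for your setup.

```python
# Minimal TCP relay: accepts connections on all interfaces and forwards
# them to the local Kobold server. Addresses below are assumptions.
import socket
import threading

LISTEN_HOST, LISTEN_PORT = "0.0.0.0", 5002    # reachable from the LAN
TARGET_HOST, TARGET_PORT = "127.0.0.1", 5001  # local Kobold HTTP server

def pipe(src, dst):
    # Copy bytes one way until either side closes the connection.
    try:
        while data := src.recv(4096):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        src.close()
        dst.close()

def handle(client):
    upstream = socket.create_connection((TARGET_HOST, TARGET_PORT))
    threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
    threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((LISTEN_HOST, LISTEN_PORT))
    server.listen()
    print(f"Relaying {LISTEN_HOST}:{LISTEN_PORT} -> {TARGET_HOST}:{TARGET_PORT}")
    while True:
        conn, _ = server.accept()
        handle(conn)
```

Other devices can then reach the chat UI at http://<your-machine-ip>:5002. Keep in mind there is no authentication layer in this sketch, so be careful exposing the port beyond a trusted network.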
-
Hey, I just wanted to say thanks! This is a great project!
-
Same as the previous commenter :-) Thank you so much for your project! With everything going on in llama.cpp, your project is really useful for staying backward compatible with all the model versions.
-
Is it normal to be slow?

You: How far away is the moon from the earth
Processing Prompt (16 / 16 tokens)

Most questions take around 1 minute to answer.

EDIT:

Attempting to use OpenBLAS library for faster prompt ingestion. A compatible libopenblas.dll will be required.
Initializing dynamic library: koboldcpp_blas.dll
Loading model: C:\Users\beno\Downloads\gpt4all-lora-unfiltered-quantized.bin
[Parts: 1, Threads: 11]
---
Identified as LLAMA model: (ver 1)
Attempting to Load...
---
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
legacy_llama_model_load: Legacy loading model from 'C:\Users\beno\Downloads\gpt4all-lora-unfiltered-quantized.bin' - please wait ...
legacy_llama_model_load: very old v1 model file 'C:\Users\beno\Downloads\gpt4all-lora-unfiltered-quantized.bin' (please regenerate your model files if you can!)
legacy_llama_model_load: n_vocab = 32001
legacy_llama_model_load: n_ctx = 2048
legacy_llama_model_load: n_embd = 4096
legacy_llama_model_load: n_mult = 256
legacy_llama_model_load: n_head = 32
legacy_llama_model_load: n_layer = 32
legacy_llama_model_load: n_rot = 128
legacy_llama_model_load: f16 = 2
legacy_llama_model_load: n_ff = 11008
legacy_llama_model_load: n_parts = 1
legacy_llama_model_load: type = 1
---
!! WARNING: Model appears to be GPT4ALL v1 model, triggering compatibility fix !!
---
legacy_llama_model_load: ggml ctx size = 5041.35 MB
legacy_llama_model_load: mem required = 6833.35 MB (+ 1026.00 MB per state)
legacy_llama_model_load: loading model part 1/1 from 'C:\Users\beno\Downloads\gpt4all-lora-unfiltered-quantized.bin'
legacy_llama_model_load: .................................... done
legacy_llama_model_load: model size = 4017.27 MB / num tensors = 291
legacy_llama_init_from_file: kv self size = 1024.00 MB
---
Warning: Your model has an INVALID or OUTDATED format (ver 1). Please reconvert it for better results!
---
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001

The comment about 'for better results' makes me wonder about the speed?
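One way to put a number on "slow" is to time a request from a script and compute a rough rate. The sketch below assumes the embedded server exposes a KoboldAI-style /api/v1/generate endpoint on port 5001, as the log suggests; the payload and response fields follow the common KoboldAI API shape and may differ between versions.

```python
# Rough speed check against the local server. Endpoint and JSON fields
# assume the KoboldAI-style API; adjust if your version differs.
import json
import time
import urllib.request

ENDPOINT = "http://localhost:5001/api/v1/generate"
payload = {"prompt": "How far away is the moon from the earth?", "max_length": 80}

start = time.perf_counter()
req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
elapsed = time.perf_counter() - start

text = body["results"][0]["text"]
# Word count is only a crude proxy for tokens, but fine for comparing runs.
print(f"{elapsed:.1f}s elapsed, ~{len(text.split()) / elapsed:.2f} words/s")
print(text)
```

For context, CPU-only generation with a quantized 7B model often runs at only a few tokens per second, so around a minute for a full answer is not unusual. The 'for better results' warning refers to the outdated v1 file format; reconverting is mainly about compatibility and output quality rather than a large speedup.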
-
Guys, where can I find and download the necessary ggml file? I can't find a suitable one; everything I try turns out to be fake, and I never get a localhost address. P.S. I'm a complete newbie, I don't really understand any of this.
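If it helps, ggml conversions of many popular models are hosted on Hugging Face, and a file can be fetched with a couple of lines of Python using the huggingface_hub package. The repo id and filename below are placeholders, not real locations; search huggingface.co for a ggml build of the model you want and substitute its details.

```python
# Sketch: download a ggml model file with huggingface_hub
# (pip install huggingface_hub). Repo id and filename are hypothetical.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="some-user/some-ggml-model",  # placeholder: pick a real repo
    filename="ggml-model-q4_0.bin",       # placeholder: pick a real file
)
print("Model saved to:", path)
```

Once downloaded, load the file the same way as the local path shown in the log above; the localhost address only appears after a model loads successfully.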
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you:
* Ask questions you’re wondering about.
* Share ideas.
* Engage with other community members.
* Welcome others and are open-minded. Remember that this is a community we build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.