Welcome to llamacpp-for-kobold Discussions! #2
-
Good morning! I have always dreamed of having my own little AI chatbot, and now I am trying to make my LLaMA model accessible from my computer with a beautiful chat interface. Also, I have a question: how do I make it accessible from outside localhost?
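For anyone with the same question: the embedded server binds to localhost on port 5001 by default (see the startup log later in this thread), so first check whether your launcher exposes an option to bind to 0.0.0.0 or a specific interface. If it does not, a tiny relay on the same machine can forward an externally reachable port to the local server. Below is a minimal sketch in Python; the listen port 5002 and the target 127.0.0.1:5001 are assumptions to adjust for your setup.

```python
# Minimal TCP relay: accepts connections on all interfaces and forwards
# them to the local Kobold server. Addresses below are assumptions.
import socket
import threading

LISTEN_HOST, LISTEN_PORT = "0.0.0.0", 5002    # reachable from the LAN
TARGET_HOST, TARGET_PORT = "127.0.0.1", 5001  # local Kobold HTTP server

def pipe(src, dst):
    # Copy bytes one way until either side closes the connection.
    try:
        while data := src.recv(4096):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        src.close()
        dst.close()

def handle(client):
    upstream = socket.create_connection((TARGET_HOST, TARGET_PORT))
    threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
    threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((LISTEN_HOST, LISTEN_PORT))
    server.listen()
    print(f"Relaying {LISTEN_HOST}:{LISTEN_PORT} -> {TARGET_HOST}:{TARGET_PORT}")
    while True:
        conn, _ = server.accept()
        handle(conn)
```

Other devices can then reach the chat UI at http://<your-machine-ip>:5002. Keep in mind there is no authentication layer in this sketch, so be careful exposing the port beyond a trusted network.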
-
Hey, I just wanted to say thanks! This is a great project!
-
Same as the previous commenter :-) Thank you so much for your project! With everything going on in llama.cpp, your project is really useful for staying backward compatible with all the model versions.
-
Is it normal to be slow?

You: How far away is the moon from the earth
Processing Prompt (16 / 16 tokens)

Most questions take around 1 minute to answer.

EDIT:

Attempting to use OpenBLAS library for faster prompt ingestion. A compatible libopenblas.dll will be required.
Initializing dynamic library: koboldcpp_blas.dll
Loading model: C:\Users\beno\Downloads\gpt4all-lora-unfiltered-quantized.bin
[Parts: 1, Threads: 11]
---
Identified as LLAMA model: (ver 1)
Attempting to Load...
---
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
legacy_llama_model_load: Legacy loading model from 'C:\Users\beno\Downloads\gpt4all-lora-unfiltered-quantized.bin' - please wait ...
legacy_llama_model_load: very old v1 model file 'C:\Users\beno\Downloads\gpt4all-lora-unfiltered-quantized.bin' (please regenerate your model files if you can!)
legacy_llama_model_load: n_vocab = 32001
legacy_llama_model_load: n_ctx = 2048
legacy_llama_model_load: n_embd = 4096
legacy_llama_model_load: n_mult = 256
legacy_llama_model_load: n_head = 32
legacy_llama_model_load: n_layer = 32
legacy_llama_model_load: n_rot = 128
legacy_llama_model_load: f16 = 2
legacy_llama_model_load: n_ff = 11008
legacy_llama_model_load: n_parts = 1
legacy_llama_model_load: type = 1
---
!! WARNING: Model appears to be GPT4ALL v1 model, triggering compatibility fix !!
---
legacy_llama_model_load: ggml ctx size = 5041.35 MB
legacy_llama_model_load: mem required = 6833.35 MB (+ 1026.00 MB per state)
legacy_llama_model_load: loading model part 1/1 from 'C:\Users\beno\Downloads\gpt4all-lora-unfiltered-quantized.bin'
legacy_llama_model_load: .................................... done
legacy_llama_model_load: model size = 4017.27 MB / num tensors = 291
legacy_llama_init_from_file: kv self size = 1024.00 MB
---
Warning: Your model has an INVALID or OUTDATED format (ver 1). Please reconvert it for better results!
---
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001

The comment about 'for better results' makes me wonder about the speed?
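One way to put a number on "slow" is to time a request from a script and compute a rough rate. The sketch below assumes the embedded server exposes a KoboldAI-style /api/v1/generate endpoint on port 5001, as the log suggests; the payload and response fields follow the common KoboldAI API shape and may differ between versions.

```python
# Rough speed check against the local server. Endpoint and JSON fields
# assume the KoboldAI-style API; adjust if your version differs.
import json
import time
import urllib.request

ENDPOINT = "http://localhost:5001/api/v1/generate"
payload = {"prompt": "How far away is the moon from the earth?", "max_length": 80}

start = time.perf_counter()
req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
elapsed = time.perf_counter() - start

text = body["results"][0]["text"]
# Word count is only a crude proxy for tokens, but fine for comparing runs.
print(f"{elapsed:.1f}s elapsed, ~{len(text.split()) / elapsed:.2f} words/s")
print(text)
```

For context, CPU-only generation with a quantized 7B model often runs at only a few tokens per second, so around a minute for a full answer is not unusual. The 'for better results' warning refers to the outdated v1 file format; reconverting is mainly about compatibility and output quality rather than a large speedup.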
-
Guys, where can I find and download the necessary ggml file? I can't find a suitable one; everything I try turns out to be fake, and I never get a localhost address. P.S. I'm a complete newbie, I don't really understand any of this.
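If it helps, ggml conversions of many popular models are hosted on Hugging Face, and a file can be fetched with a couple of lines of Python using the huggingface_hub package. The repo id and filename below are placeholders, not real locations; search huggingface.co for a ggml build of the model you want and substitute its details.

```python
# Sketch: download a ggml model file with huggingface_hub
# (pip install huggingface_hub). Repo id and filename are hypothetical.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="some-user/some-ggml-model",  # placeholder: pick a real repo
    filename="ggml-model-q4_0.bin",       # placeholder: pick a real file
)
print("Model saved to:", path)
```

Once downloaded, load the file the same way as the local path shown in the log above; the localhost address only appears after a model loads successfully.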
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you:
* Ask questions you’re wondering about.
* Share ideas.
* Engage with other community members.
* Welcome others and are open-minded. Remember that this is a community we build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.