Did we support gpt4all-lora-quantized.bin? #86
They've made some changes to the inferencing code, the main one I can see being the addition of a token. Not sure if there's anything else; I'd appreciate it if you could look into it!
I don't think a difference in vocabulary length would matter, as the number of vocabulary items is handled dynamically.

```rust
let n_dims = read_i32(&mut part_reader)?;
/* ... */
let mut nelements = 1;
let mut ne = [1i32, 1i32];
for i in 0..n_dims {
    ne[i as usize] = read_i32(&mut part_reader)?; // line 773
    nelements *= ne[i as usize];
}
```

Looks like it's hard-coded to support a tensor of 2 (or maybe up to 2) dimensions but got one with more dimensions than that. Fixing this one part probably wouldn't be hard, but I'm pretty sure it'll just break a little later because the tensors aren't the expected shape.
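For reference, here's a minimal sketch of how that loop could reject unsupported shapes instead of indexing out of bounds; `read_tensor_shape` and the error handling are hypothetical, not actual llama-rs code:

```rust
use std::io::{self, Read};

/// Reads a little-endian i32, like the loader's `read_i32` helper.
fn read_i32(reader: &mut impl Read) -> io::Result<i32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(i32::from_le_bytes(buf))
}

/// Reads a tensor shape, rejecting dimension counts that the fixed-size
/// `ne` array can't hold.
fn read_tensor_shape(reader: &mut impl Read) -> io::Result<([i32; 2], i32)> {
    let n_dims = read_i32(reader)?;
    if !(1..=2).contains(&n_dims) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unsupported tensor dimension count: {n_dims}"),
        ));
    }
    let mut ne = [1i32, 1i32];
    let mut nelements = 1;
    for i in 0..n_dims {
        ne[i as usize] = read_i32(reader)?;
        nelements *= ne[i as usize];
    }
    Ok((ne, nelements))
}
```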
Are you sure about the architecture being different? I think it's just a finetuned version of LLaMA; as far as I can tell, the only architectural difference is that they supply the
Nope. :) I assumed it was a GPT architecture because of the name, but you know what they say about assuming: when you assume, you make an ass of U and ME. Sorry about that. It seems like they didn't just change the vocab numbers but also how it's calculated, right? Could possibly try changing this line to `n_vocab: read_i32(&mut reader)? - 1,`. I guess what they're actually doing is setting
Yeah, it's pretty weird (I'd almost say sloppy...). We could special-case this (assuming that's the fix), or we can patch the model itself to work correctly. I hope they end up doing the latter for us.
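If the off-by-one theory pans out, a special case in the loader might look something like this sketch (the `is_gpt4all` flag and function names here are made up for illustration, not existing llama-rs API):

```rust
use std::io::{self, Read};

/// Reads a little-endian i32, like the loader's `read_i32` helper.
fn read_i32(reader: &mut impl Read) -> io::Result<i32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(i32::from_le_bytes(buf))
}

/// Hypothetical special case: gpt4all-lora files seem to store a vocab
/// size one higher than vanilla LLaMA, so subtract one when the caller
/// flags the file as a gpt4all model.
fn read_n_vocab(reader: &mut impl Read, is_gpt4all: bool) -> io::Result<i32> {
    let raw = read_i32(reader)?;
    Ok(if is_gpt4all { raw - 1 } else { raw })
}
```

Patching the model file itself (rewriting the vocab count in the header) would avoid carrying a special case like this in the loader at all.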
I converted the model using https://github.com/ggerganov/llama.cpp#using-gpt4all and can run it using both llama.cpp and llama-rs.
I can see `python3 convert-gpt4all-to-ggml.py models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model`, but I can't find `tokenizer.model`.
@katopz You'll likely need to get your hands on the Llama tokenizer; it's not that hard to find. However, the Llama license doesn't actually allow free distribution, which is why you won't find links to that kind of thing in most project repos, etc.
Hm, I see. But maybe I got it wrong.
Can I get your converted model's hash?
There are two files. The original gpt4all model has the following hash:
And after converting using the `convert-gpt4all-to-ggml.py` script:
I confirm I also get the `TensorWrongSize` error with the converted one (the original model doesn't load either).
This is weird, but it only works if the model name is not `gpt4all-lora-quantized.bin`.
WHAT 🤯
llama-rs has weird behavior where it will try to discover extra parts if there are other files that start with the same prefix. Maybe that's what happened here? It caused me an issue when I converted some stuff with the conversion script. Personally, I'd like to just turn off automatic model part discovery and only have it enabled if it's explicitly requested with a command-line flag or something.
Yup, removing the file `gpt4all-lora-quantized.bin.orig` from the directory works without having to rename the other one.
Hm, didn't realise that could be a problem. I'll change the extra-part behaviour to only check for numerical suffixes.
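A sketch of what numeric-suffix-only discovery could look like (the `discover_parts` name is hypothetical): probe for increasing integer suffixes and stop at the first miss, so a stray `gpt4all-lora-quantized.bin.orig` sharing the prefix is never picked up.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: treat only `model.bin.1`, `model.bin.2`, ... as
/// extra parts, stopping at the first gap, so unrelated files that merely
/// share the prefix (e.g. `model.bin.orig`) are ignored.
fn discover_parts(main_path: &Path) -> Vec<PathBuf> {
    let mut parts = vec![main_path.to_path_buf()];
    for n in 1.. {
        let candidate = PathBuf::from(format!("{}.{n}", main_path.display()));
        if candidate.is_file() {
            parts.push(candidate);
        } else {
            break;
        }
    }
    parts
}
```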
It looks like they've moved on from using LLaMA and are now targeting GPT-J (#144). Not sure what to do about this in the meantime; I guess we can tell people to run the conversion script?
That's fine enough; maybe we should add a "how to" to the conversion section of the README somehow.
@danforbes has mentioned that GPT4All-J works ootb with the GPT-J arch. Should we close this?
I tried to use `gpt4all-lora-quantized.bin` from https://github.com/nomic-ai/gpt4all#try-it-yourself
And got:
Maybe I have to convert it first?