This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Did we support gpt4all-lora-quantized.bin? #86

Closed
katopz opened this issue Mar 29, 2023 · 19 comments
Labels
issue:enhancement New feature or request topic:model-support Support for new models

Comments

@katopz
Contributor

katopz commented Mar 29, 2023

I tried using gpt4all-lora-quantized.bin from https://github.com/nomic-ai/gpt4all#try-it-yourself:

cargo run --release -- -m ./data/gpt4all-lora-quantized.bin -f examples/alpaca_prompt.txt --repl

And got

[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 131
[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 132
[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 133
...
[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 256
[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 257
[2023-03-29T07:21:13Z INFO  llama_cli] Warning: Bad token in vocab at index 258
[2023-03-29T07:21:13Z INFO  llama_cli] ggml ctx size = 4017.35 MB
    
[2023-03-29T07:21:13Z INFO  llama_cli] Loading model part 1/1 from './data/gpt4all-lora-quantized.bin'

thread 'main' panicked at 'index out of bounds: the len is 2 but the index is 2', /Users/katopz/git/katopz/llama-rs/llama-rs/src/lib.rs:773:21

Maybe I have to convert it first?

@philpax
Collaborator

philpax commented Mar 29, 2023

They've made some changes to the inferencing code, with the main one I can see being the addition of a token. Not sure if there's anything else; I'd appreciate it if you could look into it!

@KerfuffleV2
Contributor

KerfuffleV2 commented Mar 29, 2023

I don't think a difference in vocabulary length would matter as the number of vocabulary items is handled dynamically.

let n_dims = read_i32(&mut part_reader)?;
/* ... */
let mut nelements = 1;
let mut ne = [1i32, 1i32];
for i in 0..n_dims {
    ne[i as usize] = read_i32(&mut part_reader)?; // line 773
    nelements *= ne[i as usize];
}

Looks like it's hard-coded to support a tensor with 2 (or maybe up to 2) dimensions, but it got one with more than that. Fixing this one part probably wouldn't be hard, but I'm pretty sure it'll just break a little later because the tensors aren't the expected shape.
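For illustration, here's a minimal, self-contained sketch of how that read could reject an unexpected shape with an error instead of panicking. This is not the actual llama-rs code; read_i32 here is a stand-in for its helper:

use std::io::{self, Read};

// Stand-in for a little-endian i32 reader like llama-rs's read_i32 helper.
fn read_i32(reader: &mut impl Read) -> io::Result<i32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(i32::from_le_bytes(buf))
}

// Defensive variant of the loop above: reject tensors with more than 2
// dimensions instead of indexing past the end of `ne` and panicking.
fn read_tensor_shape(reader: &mut impl Read) -> io::Result<[i32; 2]> {
    let n_dims = read_i32(reader)?;
    if !(1..=2).contains(&n_dims) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unsupported tensor dimension count: {n_dims}"),
        ));
    }
    let mut ne = [1i32, 1i32];
    for i in 0..n_dims as usize {
        ne[i] = read_i32(reader)?;
    }
    Ok(ne)
}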

(Pretty sure gpt4all would just be a completely different model architecture compared to Llama.) edit: This is wrong; the issue is probably just a vocab size mismatch throwing things out of alignment.

@philpax
Collaborator

philpax commented Mar 29, 2023

Are you sure about the architecture being different? I think it's just a finetuned version of LLaMA - as far as I can tell, the only architectural difference is they supply the <pad> token manually: antimatter15/alpaca.cpp@master...zanussbaum:gpt4all.cpp:master

@KerfuffleV2
Contributor

Are you sure about the architecture being different?

Nope. :) I assumed it was a GPT architecture because of the name, but you know what they say about assuming. When you assume, you make an ass of U and ME. Sorry about that.

It seems like they didn't just change the vocab size, but also how it's calculated, right?

Could possibly try changing this line:

https://github.com/rustformers/llama-rs/blob/a067431773ba0194d6bc352faf40a770c1eac81c/llama-rs/src/lib.rs#L532

to

 n_vocab: read_i32(&mut reader)? - 1,

I guess what they're actually doing is setting n_vocab to 32,001 in the file, but only including 32,000 entries and then manually generating an entry after the model is loaded. That's really weird. I think my suggestion would work, unless the model is actually going to try to generate that last token (obviously it will break loading normal Llama models, so it would only be useful for checking whether that allows loading the gpt4all model).
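If that is indeed the file layout, a load-time special case along these lines might keep normal Llama models working while still accepting the gpt4all file. This is just a sketch based on the assumption above, not something tested against llama-rs:

// Hypothetical special case, assuming the gpt4all file reports n_vocab = 32,001
// in its header while only serializing 32,000 vocab entries (the extra <pad>
// entry being added by their loader at runtime).
const LLAMA_N_VOCAB: i32 = 32_000;

// Returns the number of vocab entries actually present in the file, plus a
// flag saying whether a <pad> entry should be appended after loading.
fn effective_n_vocab(n_vocab_from_header: i32) -> (i32, bool) {
    if n_vocab_from_header == LLAMA_N_VOCAB + 1 {
        (LLAMA_N_VOCAB, true) // gpt4all-style file
    } else {
        (n_vocab_from_header, false) // normal Llama file
    }
}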

@philpax
Collaborator

philpax commented Mar 29, 2023

Yeah, it's pretty weird (I'd almost say sloppy...)

We could special-case this (assuming that's the fix), or we could patch the model itself to work correctly. I hope they end up doing the latter for us.

@cksac

cksac commented Mar 30, 2023

I converted the model using https://github.com/ggerganov/llama.cpp#using-gpt4all; it can then be run with both llama.cpp and llama-rs.

@katopz
Contributor Author

katopz commented Mar 31, 2023

I converted the model using https://github.com/ggerganov/llama.cpp#using-gpt4all; it can then be run with both llama.cpp and llama-rs.

I can see

python3 convert-gpt4all-to-ggml.py models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model 

But I can't find ./models/tokenizer.model there?

@KerfuffleV2
Contributor

@katopz You'll likely need to get your hands on the Llama tokenizer; it's not that hard to find. However, the Llama license doesn't actually allow free distribution, which is why you won't find links to that kind of thing in most project repos, etc.

@katopz
Contributor Author

katopz commented Mar 31, 2023

Hm, I see. But maybe I got the wrong ./models/tokenizer.model from some random internet dude.

...
[2023-03-31T13:24:20Z INFO  llama_cli] Warning: Bad token in vocab at index 256
[2023-03-31T13:24:20Z INFO  llama_cli] Warning: Bad token in vocab at index 257
[2023-03-31T13:24:20Z INFO  llama_cli] Warning: Bad token in vocab at index 258
[2023-03-31T13:24:20Z INFO  llama_cli] ggml ctx size = 4017.35 MB
    
[2023-03-31T13:24:20Z INFO  llama_cli] Loading model part 1/2 from './data/gpt4all-lora-quantized.bin'
    
thread 'main' panicked at 'Could not load model: TensorWrongSize { tensor_name: "tok_embeddings.weight", path: "./data/gpt4all-lora-quantized.bin" }', llama-cli/src/main.rs:209:10

Can I get the shasum -a 256 of your converted gpt4all-lora-quantized.bin, please?

@setzer22
Collaborator

setzer22 commented Apr 2, 2023

There are two files. The original gpt4all model has the following hash:

05c9dc0a4904f3b232cffe717091b0b0a8246f49c3f253208fbf342ed79a6122  gpt4all-lora-quantized.bin.orig

And after converting with convert-gpt4all-to-ggml.py, it should have this hash:

d9af98b0350fc8af7211097e816ffbb8bae9a18f8aea8c50ff94a99bd6cb2c7c  gpt4all-lora-quantized.bin

I can confirm I also get the TensorWrongSize error with the converted one (the original model doesn't load either).

@Metalflame12
Contributor

This is weird, but it only works if the model file is not named "gpt4all-lora-quantized.bin".
Try renaming it to "gpt4all-lora-quantized-ggml.bin" or something like that; it will work.

@katopz
Contributor Author

katopz commented Apr 2, 2023

[2023-04-02T16:50:24Z INFO  llama_cli] Loaded tensor 280/291
[2023-04-02T16:50:24Z INFO  llama_cli] Loaded tensor 288/291
[2023-04-02T16:50:24Z INFO  llama_cli] Loading of './data/gpt4all-lora-quantized-ggml.bin' complete
[2023-04-02T16:50:24Z INFO  llama_cli] Model size = 4017.27 MB / num tensors = 291
[2023-04-02T16:50:24Z INFO  llama_cli] Model fully loaded!

WHAT 🤯

@KerfuffleV2
Contributor

llama-rs has weird behavior where it will try to discover extra parts if there are other files that start with the same prefix. Maybe that's what happened here?

This caused me an issue when I converted some stuff with the llama.cpp converter and it created a model.bin.tmp file.

Personally, I'd like to just turn off automatic model part discovery and only have it enabled if it's explicitly requested with a command-line flag or something.
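For anyone wondering how the two files collided, here's a rough sketch of what prefix-based discovery amounts to. This is my reading of the behaviour described above, not the actual llama-rs code:

use std::path::{Path, PathBuf};

// Rough sketch of prefix-based part discovery: any sibling file whose name
// starts with the main model's file name is treated as an additional part.
// With both gpt4all-lora-quantized.bin and gpt4all-lora-quantized.bin.orig in
// the same directory, the .orig file gets picked up as a second part and
// loading then fails with a shape mismatch.
fn discover_parts(main_path: &Path) -> std::io::Result<Vec<PathBuf>> {
    let dir = main_path.parent().unwrap_or_else(|| Path::new("."));
    let prefix = main_path
        .file_name()
        .map(|name| name.to_string_lossy().into_owned())
        .unwrap_or_default();
    let mut parts: Vec<PathBuf> = std::fs::read_dir(dir)?
        .filter_map(Result::ok)
        .map(|entry| entry.path())
        .filter(|path| {
            path.file_name()
                .map(|name| name.to_string_lossy().starts_with(prefix.as_str()))
                .unwrap_or(false)
        })
        .collect();
    parts.sort();
    Ok(parts)
}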

@Metalflame12
Contributor

Yup, removing the file "gpt4all-lora-quantized.bin.orig" from the directory also works, without having to rename the other one.

@philpax
Collaborator

philpax commented Apr 2, 2023

Hm, didn't realise that could be a problem. I'll change the extra part behaviour to only check for numerical suffixes.
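Something like this, perhaps; a sketch of the tightened check, assuming extra parts are named with a purely numeric suffix (e.g. model.bin.1, model.bin.2):

// Hypothetical tightened filter: only treat a sibling file as an extra part if
// its name is the main file name plus a dot and a purely numeric suffix,
// so files like "model.bin.orig" or "model.bin.tmp" are ignored.
fn is_numeric_part(candidate_name: &str, main_name: &str) -> bool {
    candidate_name
        .strip_prefix(main_name)
        .and_then(|rest| rest.strip_prefix('.'))
        .map(|suffix| !suffix.is_empty() && suffix.bytes().all(|b| b.is_ascii_digit()))
        .unwrap_or(false)
}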

@philpax philpax added issue:enhancement New feature or request topic:model-support Support for new models labels Apr 20, 2023
@philpax
Collaborator

philpax commented Apr 20, 2023

It looks like they've moved on from using LLaMA and are now targeting GPT-J (#144). Not sure what to do about this in the meantime - I guess we can tell people to run the conversion script?

@katopz
Contributor Author

katopz commented Apr 20, 2023

That's fine enough; maybe we should add a "how to" to the README's conversion section somehow.

@philpax
Collaborator

philpax commented May 9, 2023

@danforbes has mentioned that GPT4All-J works out of the box with the GPT-J architecture. Should we close this?

@danforbes
Contributor

Tested with https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin

@katopz katopz closed this as completed May 10, 2023