
1.78 - cannot load mixtral 8x7b anymore #1219

Open · IcePanther opened this issue Nov 17, 2024 · 7 comments

@IcePanther commented Nov 17, 2024

Hi,

After upgrading to 1.78 today, I can no longer load Mixtral-based 8x7B models.

Other models, such as 30B/70B Llama-type models, still work.

I get the same error whether I use Vulkan or CLBlast, and with different models that also have different quantizations (one q8_0, the other q6_m).

The error reads:

llama_model_load: error loading model: missing tensor 'blk.0.ffn_down_exps.weight'
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "koboldcpp.py", line 4720, in <module>
    main(parser.parse_args(),start_server=True)
  File "koboldcpp.py", line 4344, in main
    loadok = load_model(modelname)
  File "koboldcpp.py", line 900, in load_model
    ret = handle.load_model(inputs)
OSError: exception: access violation reading 0x00000000000018A4
[17628] Failed to execute script 'koboldcpp' due to unhandled exception!

Previous versions of KoboldCpp worked with these same models without a problem.
After reverting, I can confirm that 1.77 works.
Both are "cu12" builds (I still use CUDA for smaller models).

System: 64 GB RAM, 16 GB VRAM (3080 Ti laptop), Windows 11.

Thanks in advance,

@Conduitry

I'm also seeing this same error.

@LostRuins (Owner)

Yes, unfortunately this is caused by the backend refactor in ggerganov#10026.

See ggerganov#10244.

You can requantize the Mixtral model or use https://huggingface.co/mradermacher/Mixtral-8x7B-Instruct-v0.1-GGUF/

I will see if I can port back support for the old quants, but I cannot guarantee it.
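
If it helps, one way to tell whether a particular GGUF still uses the old per-expert tensor layout is to list its tensor names with the gguf Python package that ships with llama.cpp. This is only a minimal sketch under that assumption; the file path is an example and the old per-expert naming is from memory, so treat it as a rough check rather than an official diagnostic:

import re
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("mixtral-8x7b-instruct.Q6_K.gguf")  # example path
names = [t.name for t in reader.tensors]

# Newer builds expect the experts merged into a single 3D tensor per layer.
has_merged = any(n.endswith("ffn_down_exps.weight") for n in names)
# Older Mixtral GGUFs stored one 2D tensor per expert (blk.N.ffn_down.E.weight).
has_split = any(re.search(r"\.ffn_down\.\d+\.weight$", n) for n in names)

print("merged experts:", has_merged)
print("per-expert tensors:", has_split)

If the merged tensors are absent, the file predates the refactor and would need to be requantized or replaced.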

@IcePanther (Author)

Thanks for the info, I was unaware of this.

It seems that updated models are indeed available on HF. If these work, they will be the simplest solution. I'll report back once I have downloaded some and confirmed they work.

@Conduitry

The new quantizations are working for me with 1.78. Thank you!

@LostRuins (Owner)

I have crafted an ugly hack because I hate losing backwards compatibility.

d5feaa8

Should work again in the next version.

@IcePanther (Author)

Can confirm the new quantized models work for me too with 1.78.

I kept the old ones for now, to test if the backwards compatibility "ugly hack" works in the next version.

@win10ogod

> Can confirm the new quantized models work for me too with 1.78.
>
> I kept the old ones for now, to test if the backwards compatibility "ugly hack" works in the next version.

You can also update the gguf filetype to the current version if the older version is now unsupported:

./llama-quantize ./models/mymodel/ggml-model-Q4_K_M.gguf ./models/mymodel/ggml-model-Q4_K_M-v2.gguf COPY
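
If I understand the quantize tool correctly, the COPY type only rewrites the tensors in the current GGUF layout without re-quantizing them, so the output keeps the same quantization and should be a drop-in replacement for the old file.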
