Segmentation fault on M1 Mac #8

Trying the simple example on an M1 Mac leads to a segmentation fault. The model works fine with the ggml example code.

---
Hi, ggml recently introduced a breaking change, so existing models have to be re-quantized. This error happens when you use an old model with the new ggml library. If you pull the latest changes from the ggml repo or do a fresh clone, you should get the same error with the example code as well. The latest quantized models are available in this repo: https://huggingface.co/NeoDim/starcoderbase-GGML/tree/main If you have already downloaded from it, please check that your files are the latest, as they were updated just a day ago.

Please ensure you are using the latest version of this library:

```sh
pip install --upgrade ctransformers
```

and then run:

```python
llm = AutoModelForCausalLM.from_pretrained(
    'NeoDim/starcoderbase-GGML',
    model_file='starcoderbase-ggml-q4_0.bin',
    model_type='starcoder',
)
print(llm('Hi', max_new_tokens=1))
```

The above example downloads the latest model file directly from the Hugging Face repo. Please let me know if this works.
---

I know, this is my repo :) It still crashes with a segmentation fault.
---

Oh, nice! :) Can you please try building from source and let me know if it works:

```sh
git clone --recurse-submodules https://github.com/marella/ctransformers
cd ctransformers
./scripts/build.sh
```

The compiled library will be located at build/lib/libctransformers.dylib, which you can pass with:

```python
llm = AutoModelForCausalLM.from_pretrained(..., lib='/path/to/ctransformers/build/lib/libctransformers.dylib')
```
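For reference, a fully spelled-out version of the call above, combining the model arguments from the first comment with the locally built library; a sketch, with an illustrative path:

```python
from ctransformers import AutoModelForCausalLM

# Illustrative combination of the arguments used earlier in this thread;
# adjust the lib path to wherever the build step above placed the dylib.
llm = AutoModelForCausalLM.from_pretrained(
    'NeoDim/starcoderbase-GGML',
    model_file='starcoderbase-ggml-q4_0.bin',
    model_type='starcoder',
    lib='/path/to/ctransformers/build/lib/libctransformers.dylib',
)
print(llm('Hi', max_new_tokens=1))
```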
---

Compiled from source also crashes with a segmentation fault.
---

Thanks for checking. Can you please check with a simpler model to verify whether it is a starcoder-specific issue or a library issue:

```python
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
```

Also, were you getting the error while loading the model or while generating text? And can you please share your macOS and Python versions? Since I don't have a Mac, it may take a while to debug this.
---

Unfortunately, it also segfaults.
---

Were you getting the error while loading the model, or while generating?
---

While generating. Loading is fine. The tokenizer also works.
---

Thanks. Can you try running the following and let me know where it is throwing the error:

```python
print('eval', llm.eval([123]))
print('sample', llm.sample())
```
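One way to get more out of a hard crash like this is the standard library's faulthandler module, which dumps a traceback even when the process dies in native code; a minimal sketch, assuming the gpt-2 model from above:

```python
import faulthandler

from ctransformers import AutoModelForCausalLM

# Dump a Python traceback even if the process dies inside native code.
faulthandler.enable()

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
print('sample', llm.sample())   # reported to work in this thread
print('eval', llm.eval([123]))  # reported to segfault here
```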
---

Sample works fine. Eval leads to a segmentation fault.
---

I'm pretty new to the Python world. I hope this can help to debug the issue.
---

With pdb:
---

Thanks for the detailed info. It looks like you are using Anaconda, and in a different issue (tee-ar-ex/trx-python#23 (comment), not related to this library) someone pointed out that Anaconda could be the cause.
---

I will try, thanks.
---

Without Anaconda it doesn't segfault, but it's super slow, and the threads parameter does nothing: always 100% CPU. Eleven minutes is not enough to generate a single token on an M1 Max with the model "marella/gpt-2-ggml".
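A slowdown like this on Apple silicon is sometimes caused by an x86_64 Python build running under Rosetta; a quick check of the interpreter, using only the standard library:

```python
import platform
import sys

# On a native arm64 build this prints 'arm64'; 'x86_64' means the
# interpreter (and every library it loads) is running under Rosetta.
print(platform.machine())
print(sys.executable)  # which Python is actually being used
```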
---

Can you try building from source and see if it improves?
---

Sure. Now trying the library built from source.
---

Why is only a single thread running? It also doesn't segfault with the manually built library, but it is also super slow. I'm not sure how long it will take to generate a single token.
---

16 minutes, starchat-alpha-q4_0, 100% CPU, no output with max_new_tokens=1. The file:

```python
from ctransformers import AutoModelForCausalLM
from ctransformers import AutoConfig

config = AutoConfig.from_pretrained(
    "/Users/sergeykostyaev/nn/text-generation-webui/models/starchat-alpha-ggml-q4_0.bin",
    threads=8,
)
llm = AutoModelForCausalLM.from_pretrained(
    "/Users/sergeykostyaev/nn/text-generation-webui/models/starchat-alpha-ggml-q4_0.bin",
    model_type="starcoder",
    lib="/Users/sergeykostyaev/nn/ctransformers/build/lib/libctransformers.dylib",
    config=config,
)
print("loaded")
print(llm("Hi", max_new_tokens=1, threads=8))
```

It printed only "loaded".
---

I think there might be some issue in the library itself. Another user also reported the same issue (#1 (comment)), but I thought it was just running slow. Before the breaking changes in GGML, an older version of this library was working on M1 Mac, just very slowly. Now it appears to not be working at all. Can you also try running a LLaMA model, which basically uses llama.cpp:

```python
llm = AutoModelForCausalLM.from_pretrained(
    'TheBloke/LLaMa-7B-GGML',
    model_file='llama-7b.ggmlv3.q4_0.bin',
    model_type='llama',
)
```
---

I will try with a llama.cpp model.
---

Also, inference with the ggml example code (https://github.com/ggerganov/ggml/tree/master/examples/starcoder) runs just fine.
---

Looks like for now llama.cpp models have the same issue on Apple silicon.
---

45 minutes, nothing changes.
---

Thanks for checking patiently. I will debug this later. Can you please try one last thing: install an older version of this library and see if it works:

```sh
pip install ctransformers==0.1.2
```

```python
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
print(llm('Hi', max_new_tokens=1))
```
---

Sure.
---

Looks like after the downgrade the issue is still here.
---

Thanks. Tomorrow I will add a main.cc file to the repo which can be run directly without Python. It should make it easier to debug the issue.
---

I suspect the issue is the threads library not being found, because the errors you posted previously also mention threads in the error message. When you build the ggml repo, are you seeing a line which says it found the threads library? Also, can you please try removing the line
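One way to check whether the compiled library and its dependencies load at all from Python is to open the dylib directly with ctypes; a minimal sketch, with an illustrative path:

```python
import ctypes

# Hypothetical path to the locally built library; loading it directly
# surfaces missing-dependency or linker errors before ctransformers uses it.
lib = ctypes.CDLL('/path/to/ctransformers/build/lib/libctransformers.dylib')
print('loaded:', lib)
```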
---

After removing the line:
---

At least the LLaMA model is giving some output, so the C++ code is working. The issue might be in how the library is loaded into Python. I will search more about this and get back to you if I find a solution. Thanks for helping with the debugging.
---

I also saw this, but cmake should fail with an error, yet it is building successfully. Maybe it found threads but simply isn't printing it. When you build the ggml repo, are you seeing a line which says it found the threads library?
---

Thank you. Will wait to see if you find a solution.
---

No.
---

Thanks for checking. I think cmake is just not printing that it found the threads library; otherwise it wouldn't work at all.
---

Hi @marella, I've been mentioned in #1 and #5. I have been able to run quantized models for starcoder, starchat, llama, whisper and mpt so far. Nonetheless, none of them work in ctransformers: I get exactly the same error as @s-kostyaev, meaning the llm object keeps running forever without any change, while using the models natively works just fine. We've been trying to use ctransformers and langchain, but nothing works. Any new information? I have done everything mentioned in this repo as well; building from source doesn't work. The same model works just fine with ggml natively at 79.63 ms/token.
---

Hi @bgonzalezfractal, s-kostyaev was helping me debug the issue but I couldn't find the reason/solution yet. So far we found that the C++ code runs fine natively, but inference from Python is extremely slow and appears stuck on a single thread.

I will keep looking for a solution and will let you know on this thread if I find one or if I need your help in debugging the issue. Can you also please run the following and share the output:

```sh
git clone --recurse-submodules https://github.com/marella/ctransformers
cd ctransformers
git checkout debug
./scripts/build.sh
./build/lib/main <model_type> <model_path> # example: ./build/lib/main gpt2 /path/to/ggml-model.bin
```

Please share the output of these commands.
---

I see the code has been updated, so this is the output of the commands:
---

Thanks @s-kostyaev, I was actually asking bgonzalezfractal to run it so that I can check and compare the output on their system as well :) Since you already built it, can you also run the main binary?
---

Sure.
---

Thanks. So the C++ code works fine natively and doesn't have any issue. I will have to debug why it is failing from Python.
---

@s-kostyaev I found another issue (LibRaw/LibRaw#437 (comment)) which looks similar to the error you posted previously (#8 (comment)):

```sh
git clone --recurse-submodules https://github.com/marella/ctransformers
cd ctransformers
git checkout debug
./scripts/build.sh
```

```python
llm = AutoModelForCausalLM.from_pretrained(..., lib='/path/to/ctransformers/build/lib/libctransformers.dylib')
print(llm('Hi', max_new_tokens=1, threads=1))
```

Also please run with a single thread first. In the above thread, they also suggested increasing the stack size limit, but I'm not sure what an ideal limit would be.
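The thread doesn't settle on an ideal limit, but for experimenting, the soft stack limit can be inspected and raised from Python itself; a sketch using only the standard library:

```python
import resource

# Inspect the current stack size limits in bytes; -1 means unlimited.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print('stack limit (soft, hard):', soft, hard)

# Try raising the soft limit up to the hard limit; macOS may refuse
# some values, hence the guard.
try:
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
except (ValueError, OSError):
    pass
```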
---

Sure. Will test it.
---

With a single thread:

And it got stuck.
---

Are you using the threads parameter?
---

Sure.
---

This is with 4 threads set, and set in two places: the config and the llm eval call.
---

Thanks. I think I found the issue. I will make a new release and let you know in some time.
---

@marella sorry, I've been working like crazy. I see @s-kostyaev executed the necessary commands; if you need anything else from my hardware, just let me know. Glad you guys found it.
---

No worries @bgonzalezfractal @s-kostyaev. I released a fix in the latest version 0.2.1. Please update:

```sh
pip install --upgrade ctransformers
```

and let me know if it works. Please don't set the lib parameter now. Also please try running with different threads values.
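For completeness, a minimal post-upgrade check along the lines used throughout this thread; the model and thread count are just the ones discussed above:

```python
from ctransformers import AutoModelForCausalLM

# After `pip install --upgrade ctransformers` (0.2.1 or later), no lib= needed.
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')
print(llm('Hi', max_new_tokens=1, threads=4))
```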
---

Finally, it works. The threads parameter works. It even works with conda now. Thank you!
---

Thanks a lot @s-kostyaev for helping debug the issue.