i wish for simpler way to run the model #230
@kolinfluence Sorry about this. I have checked your script and it's correct. The cause may be a too-old ITREX version. I can get the correct result using your script; as you can see, my ITREX version is 1.4.1. Please reinstall ITREX and Neural Speed, then re-run the script.
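A reinstall along these lines should pick up a recent ITREX build (a minimal sketch; the PyPI package names `intel-extension-for-transformers` and `neural-speed` are assumed, and `pip show` is used only to confirm the installed version afterwards):

```shell
# Remove any stale installs, then reinstall the latest releases.
pip uninstall -y intel-extension-for-transformers neural-speed
pip install intel-extension-for-transformers neural-speed

# Confirm the ITREX version that actually got installed (should be recent, e.g. 1.4.x).
pip show intel-extension-for-transformers
```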
@Zhenzhong1 I used the same script but I get this. So how do I manually download it and try? P.S.: may I know what direction this Neural Speed project is taking? Are you going to keep improving it, or are you seeking to merge it into llama.cpp or something?
@kolinfluence OK. We can also run inference offline. Make sure you have the local file llama-2-7b-chat.Q4_0.gguf and the model meta-llama/Llama-2-7b-chat-hf. Please try this script: https://github.com/intel/neural-speed/blob/main/scripts/python_api_example_for_gguf.py For example:
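The linked script follows roughly this pattern; a minimal sketch, assuming the ITREX `AutoModelForCausalLM.from_pretrained` with a `model_file` argument as used in that example, with the GGUF file sitting in the directory you run from (not tested here, since it needs the actual model weights):

```python
# Sketch of offline GGUF inference via ITREX / Neural Speed.
# Assumes llama-2-7b-chat.Q4_0.gguf is in the current directory and the
# meta-llama/Llama-2-7b-chat-hf tokenizer is available (locally or via HF).
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"   # tokenizer / model metadata
gguf_file = "llama-2-7b-chat.Q4_0.gguf"        # local quantized weights

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids

# model_file points Neural Speed at the local GGUF file instead of
# downloading and quantizing the full-precision checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_name, model_file=gguf_file)
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs)[0])
```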
This means you don't have the rights to access the llama-2-7b-chat model on the HF Hub. You have to apply for access on HF and use an access token first.
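Once access is granted on the model page, authenticating locally looks like this (a sketch using the standard `huggingface_hub` CLI, which is what `transformers` checks for gated models):

```shell
# Install the Hugging Face Hub CLI, then log in with your access token.
pip install -U huggingface_hub
huggingface-cli login   # paste the token from huggingface.co/settings/tokens when prompted
```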
Neural Speed will not be merged into llama.cpp currently. Neural Speed aims to provide efficient LLM inference on Intel platforms. For example, Neural Speed provides highly optimized low-precision kernels on CPUs, which means it can get better performance than llama.cpp. Please check this: https://medium.com/@NeuralCompressor/llm-performance-of-intel-extension-for-transformers-f7d061556176
I'm not well versed with Python, and where do I put the downloaded llama-2-7b-chat.Q4_0.gguf file?
I can make llama.cpp work real easy on my laptop, but I can't seem to get this to work.
I did git clone neural-speed, I did the pip install ... saved the script as run_model.py...
python run_model.py
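For anyone hitting the same point, one possible layout is to keep the GGUF file in the same directory you run the script from (a sketch; the `TheBloke/Llama-2-7B-Chat-GGUF` repo name is an assumption, not from this thread — any source of llama-2-7b-chat.Q4_0.gguf works):

```shell
# Fetch the quantized file into the current directory, next to run_model.py.
huggingface-cli download TheBloke/Llama-2-7B-Chat-GGUF llama-2-7b-chat.Q4_0.gguf \
    --local-dir .

# Run from the same directory so a relative path to the .gguf resolves.
python run_model.py
```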