bug: openllm run phi3:3.8b-ggml-q4 build fails to find FOUNDATION_LIBRARY #1064

Open
sij1nk opened this issue Aug 16, 2024 · 2 comments

sij1nk commented Aug 16, 2024

Describe the bug

Hi!

I tried to run an LLM locally using openllm. According to openllm, phi3:3.8b-ggml-q4 is the only model I am able to run locally, so I ran `openllm run phi3:3.8b-ggml-q4`, which failed (logs are attached).

The failure happens during a CMake build step, which is unable to find FOUNDATION_LIBRARY (line 116 in the logs). I looked into what this library provides and ended up in ggerganov/llama.cpp.

However, I assume GGML_METAL should not be set, since I'm running on a Linux system (well, Windows 10 + WSL2) with an NVIDIA GPU. My attempts at disabling the Metal build (roughly along the lines sketched below) were not successful; the command failed with the same error as before.
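A sketch of the sort of override I attempted, assuming the build goes through llama-cpp-python and that the `CMAKE_ARGS` environment variable is forwarded to CMake (both are assumptions on my part, not verified):

```bash
# Hypothetical attempt to switch the Metal backend off; assumes openllm builds
# llama-cpp-python under the hood and passes CMAKE_ARGS through to CMake.
CMAKE_ARGS="-DGGML_METAL=OFF" openllm run phi3:3.8b-ggml-q4
```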

On the same machine, but on Windows, `openllm run phi3:3.8b-ggml-q4` fails on the same CMake build with the same error. Curiously, `openllm hello` does not recognize this model as locally runnable there, as it did on WSL, but I did not investigate why.

Please let me know if I should raise this issue in ggerganov/llama.cpp instead.

Thanks!

To reproduce

  1. Happen to have the same system as I do, I guess
  2. Run `openllm run phi3:3.8b-ggml-q4`
  3. Observe the error

Logs

gist

Environment

bentoml env:

#### Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

#### System information

`bentoml`: 1.3.1
`python`: 3.10.12
`platform`: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
`uid_gid`: 1000:1000

transformers-cli env:

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- `transformers` version: 4.44.0
- Platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.24.5
- Safetensors version: 0.4.4
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): not installed (NA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>

openllm -v:

openllm, 0.6.7
Python (CPython) 3.10.12

System information (Optional)

memory: 32GB
platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
architecture: x86-64
cpu: intel core i7-11850H @ 2.50GHz
gpu: NVIDIA RTX A2000 Laptop GPU 4GB

bojiang (Member) commented Sep 5, 2024

Hi. Thanks for contributing. For now, llama.cpp models in openllm can only be deployed on macOS; there are some hard-coded platform-specific parameters. I suggest using vLLM on Linux and even WSL2, since it is faster and more mature for production use.

We are adding a new feature that makes it easier to tweak parameters. Once it is released, maybe you can help us find a configuration that runs vLLM models on a 4GB GPU.
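As a rough starting point (untested on a 4GB card; the model ID and the values below are guesses, not a verified configuration), something along these lines with vLLM's OpenAI-compatible server might be worth trying once that feature lands:

```bash
# Hypothetical vLLM settings for a small GPU; the values are guesses and may still OOM.
vllm serve microsoft/Phi-3-mini-4k-instruct \
  --max-model-len 2048 \
  --gpu-memory-utilization 0.90 \
  --dtype float16
```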

sij1nk (Author) commented Sep 5, 2024

Thank you! I will look into vLLM some time later.
