
Download of BAAI/bge-m3 fails on 1.5 using ONNX #417

Open
2 of 4 tasks
avvertix opened this issue Oct 1, 2024 · 6 comments

Comments

@avvertix

avvertix commented Oct 1, 2024

System Info

  • text-embeddings-inference version: 1.5
  • OS: Windows/Debian 11
  • Deployment: Docker
  • Model: BAAI/bge-m3

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Configuring TEI 1.5-cpu to run BAAI/bge-m3 in Docker (or Docker Compose) results in the model failing to download, even though the model files on Hugging Face are downloadable and the onnx folder is present.

To replicate, run

docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id BAAI/bge-m3

or

services:
    embeddings:
        image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
        command: --model-id BAAI/bge-m3
        ports:
          - "8080:80"

The resulting output is the following:

2024-10-01T12:02:10.892818Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/**e-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "1e402b3ef386", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-10-01T12:02:10.893014Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-10-01T12:02:10.959512Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-10-01T12:02:12.155636Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-10-01T12:02:12.418430Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-10-01T12:02:12.418475Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-10-01T12:02:12.689863Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-10-01T12:02:15.212593Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
2024-10-01T12:02:15.337129Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-m3/resolve/main/model.onnx)
2024-10-01T12:02:15.337216Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
2024-10-01T12:02:15.782935Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 3.364505011s
2024-10-01T12:02:16.281335Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 8192
2024-10-01T12:02:16.286095Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 4 tokenization workers
2024-10-01T12:02:17.421733Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
Error: Could not create backend

Caused by:
    Could not start backend: Failed to create ONNX Runtime session: Deserialize tensor 0.auto_model.encoder.layer.16.attention.output.LayerNorm.weight failed.GetFileLength for /data/models--BAAI--bge-m3/snapshots/5617a9f61b028005a4858fdac845db406aefb181/onnx/model.onnx_data failed:Invalid fd was supplied: -1

Checking the downloaded files, I see the following blobs:

-rw-r--r-- 1 root root   54 Oct  1 12:55 0140ba1eac83a3c9b857d64baba91969d988624b
-rw-r--r-- 1 root root  123 Oct  1 12:55 1fba91c78a6c8e17227058ab6d4d3acb5d8630a9
-rw-r--r-- 1 root root  17M Oct  1 12:55 21106b6d7dab2952c1d496fb21d5dc9db75c28ed361a05f5020bbba27810dd08
-rw-r--r-- 1 root root  191 Oct  1 12:55 9bd85925f325e25246d94c4918dc02ab98f2a1b7
-rw-r--r-- 1 root root  687 Oct  1 12:55 e6eda1c72da8f9dc30fdd9b69c73d35af3b7a7ad
-rw-r--r-- 1 root root 708K Oct  1 12:55 f84251230831afb359ab26d9fd37d5936d4d9bb5d1d5410e66442f630f24435b

The onnx folder in the repository also contains a file named model.onnx_data, which is probably the file missing from the download.
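For context, the later (cpu-latest) logs in this thread show a fallback order when fetching ONNX artifacts: `model.onnx` at the repo root, then `onnx/model.onnx`, and the same pair for `model.onnx_data`. A minimal sketch of that resolution logic (the function name and file lists here are illustrative; TEI implements this in Rust in backends/src/lib.rs):

```python
# Sketch of the ONNX artifact resolution order visible in the TEI logs.
# Illustrative only -- not TEI's actual code.

def resolve_onnx_artifacts(repo_files: set) -> list:
    """Return the artifact paths a TEI-like downloader would fetch,
    trying the repo root first and falling back to the onnx/ folder."""
    fetched = []
    for name in ("model.onnx", "model.onnx_data"):
        for candidate in (name, f"onnx/{name}"):
            if candidate in repo_files:
                fetched.append(candidate)
                break
    return fetched

# BAAI/bge-m3 ships both files under onnx/, so both must land on disk
# before ONNX Runtime can create a session.
bge_m3 = {"config.json", "tokenizer.json", "onnx/model.onnx", "onnx/model.onnx_data"}
print(resolve_onnx_artifacts(bge_m3))  # -> ['onnx/model.onnx', 'onnx/model.onnx_data']
```

The reported crash is consistent with the second file never being fetched by the 1.5 image: ONNX Runtime opens onnx/model.onnx, finds it references external data in model.onnx_data, and fails with an invalid file descriptor because that file is absent from the cache.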

Expected behavior

The model should be fully downloaded. If it isn't, a clear error should state that an HTTP error occurred. Showing the expected download size and how much has been downloaded so far would also help.

@ladi-pomsar

ladi-pomsar commented Oct 3, 2024

This might be related to issue #341. Try using the cpu-latest tag instead of 1.5.
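Concretely, that amounts to re-running the reporter's command with only the image tag changed (a sketch, assuming the same port mapping as above):

```shell
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id BAAI/bge-m3
```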

@avvertix
Author

avvertix commented Oct 3, 2024

Using the cpu-latest tag seems to load all files.

@jafar9

jafar9 commented Oct 16, 2024

I tried the cpu-latest tag to download BAAI/bge-large-en-v1.5, and the logs show the same errors.

With the cpu-1.2 tag it works fine; I'm not seeing any error logs.

2024-10-16T08:46:44.147396Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:368: Downloading `model.onnx`
2024-10-16T08:46:44.164477Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:372: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-large-en-v1.5/resolve/main/model.onnx)
2024-10-16T08:46:44.164493Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:373: Downloading `onnx/model.onnx`
2024-10-16T08:46:52.955942Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:379: Downloading `model.onnx_data`
2024-10-16T08:46:52.981153Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:383: Could not download `model.onnx_data`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-large-en-v1.5/resolve/main/model.onnx_data)
2024-10-16T08:46:52.981170Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:384: Downloading `onnx/model.onnx_data`
2024-10-16T08:46:53.009789Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:388: Could not download `onnx/model.onnx_data`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-large-en-v1.5/resolve/main/onnx/model.onnx_data)

@kozistr
Contributor

kozistr commented Oct 19, 2024

> i tried with cpu-latest tag for downloading the BAAI/bge-large-en-v1.5, in the logs it is showing the same error logs.
>
> but with cpu-1.2 tag it is working fine. not seeing any error logs.

It seems there's no problem loading BAAI/bge-large-en-v1.5 in my case. Failing to download model.onnx_data is okay, because BAAI/bge-large-en-v1.5 only needs the model.onnx file!

AFAIK, cpu-1.2 might use the Candle backend instead of ort, so it can run without the ONNX file(s).

zero@kozistr:~/text-embeddings-inference$ ./target/release/text-embeddings-router --model-id BAAI/bge-large-en-v1.5 --port 8080 --pooling cls --dtype float32
2024-10-19T02:06:11.320348Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-*****-**-v1.5", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: Some(Cls), max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "0.0.0.0", port: 8080, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-10-19T02:06:11.322255Z  INFO hf_hub: /home/zero/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/home/zero/.cache/huggingface/token"
2024-10-19T02:06:11.875196Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-10-19T02:06:12.268437Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-10-19T02:06:12.268484Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-10-19T02:06:12.652669Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-10-19T02:06:13.204868Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:368: Downloading `model.onnx`
2024-10-19T02:06:13.399961Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:372: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-large-en-v1.5/resolve/main/model.onnx)
2024-10-19T02:06:13.399995Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:373: Downloading `onnx/model.onnx`
2024-10-19T02:06:55.927207Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:379: Downloading `model.onnx_data`
2024-10-19T02:06:56.124880Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:383: Could not download `model.onnx_data`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-large-en-v1.5/resolve/main/model.onnx_data)
2024-10-19T02:06:56.124912Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:384: Downloading `onnx/model.onnx_data`
2024-10-19T02:06:56.336581Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:388: Could not download `onnx/model.onnx_data`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-large-en-v1.5/resolve/main/onnx/model.onnx_data)
2024-10-19T02:06:56.336617Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 44.068181831s
2024-10-19T02:06:56.347797Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
2024-10-19T02:06:56.347857Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 8 tokenization workers
2024-10-19T02:06:56.387941Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-10-19T02:06:57.947415Z  WARN text_embeddings_router: router/src/lib.rs:267: Backend does not support a batch size > 8
2024-10-19T02:06:57.947466Z  WARN text_embeddings_router: router/src/lib.rs:268: forcing `max_batch_requests=8`
2024-10-19T02:06:57.949849Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1838: Starting HTTP server: 0.0.0.0:8080
2024-10-19T02:06:57.949861Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1839: Ready
2024-10-19T02:07:32.090000Z  INFO embed{total_time="157.884382ms" tokenization_time="204.2µs" queue_time="313.9µs" inference_time="157.323382ms"}: text_embeddings_router::http::server: router/src/http/server.rs:723: Success

@jazibjamil

jazibjamil commented Nov 19, 2024

Facing a similar error when trying the cpu-latest image with bge-reranker-v2-m3:

2024-11-19T06:44:07.554912Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*2-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "dd631d19f1ee", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-11-19T06:44:07.554978Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
2024-11-19T06:44:07.608246Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-11-19T06:44:08.265491Z  WARN text_embeddings_router: router/src/lib.rs:95: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/1_Pooling/config.json)
2024-11-19T06:44:10.240145Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-11-19T06:44:10.485670Z  WARN text_embeddings_router: router/src/lib.rs:105: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/config_sentence_transformers.json)
2024-11-19T06:44:10.485706Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-11-19T06:44:10.485718Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-11-19T06:44:10.485883Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-11-19T06:44:10.485985Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:368: Downloading `model.onnx`
2024-11-19T06:44:10.791147Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:372: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/model.onnx)
2024-11-19T06:44:10.791193Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:373: Downloading `onnx/model.onnx`
thread 'main' panicked at /usr/src/backends/src/lib.rs:316:17:
failed to download `model.onnx` or `model.onnx_data`. Check the onnx file exists in the repository. request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/onnx/model.onnx)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@ladi-pomsar

> facing similar error when trying cpu-latest image with bge-reranker-v2-m3;
>
> thread 'main' panicked at /usr/src/backends/src/lib.rs:316:17:
> failed to download `model.onnx` or `model.onnx_data`. Check the onnx file exists in the repository.

As the error message suggests, some BGE repositories don't ship ONNX weights at all. You need to download the model and generate the ONNX file yourself.
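For example, one way to generate the missing ONNX file is Hugging Face Optimum's exporter and then point TEI at the resulting local directory (a sketch; the output directory name is arbitrary, and the `--task` value may need adjusting for your model, e.g. `text-classification` for a reranker):

```shell
pip install "optimum[exporters]"
optimum-cli export onnx --model BAAI/bge-reranker-v2-m3 --task text-classification bge-reranker-v2-m3-onnx/
```

The exporter writes model.onnx (plus model.onnx_data for large models) into the output directory alongside the tokenizer files.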
