### Name and Version

```
llama-cli --version
version: 4329 (89d604f)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
```

### Operating systems

Mac

### GGML backends

Metal

### Hardware

MacBook Pro M1 Max
### Models

Qwen2-VL-72B-Instruct (Q4_K_M GGUF with the f32 mmproj)

### Problem description & steps to reproduce

Just run:

```
llama-qwen2vl-cli -m models/qwen2-vl-72b-instruct-q4_k_m.gguf --mmproj models/qwen2-vl-72b-instruct.f32.mmproj.gguf --image demos/images/06.png --temp 0 -p "describe the image in detail."
```

The process aborts with `ggml_metal_encode_node: error: unsupported op 'IM2COL'`; the same crash happens with `-ngl 0` (full log below).

### First Bad Commit

No response

### Relevant log output
```
llama-qwen2vl-cli -m models/qwen2-vl-72b-instruct-q4_k_m.gguf --mmproj models/qwen2-vl-72b-instruct.f32.mmproj.gguf --image demos/images/06.png --temp 0 -p "describe the image in detail." -ngl 0
build: 4329 (89d604f2) with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
llama_load_model_from_file: using device Metal (Apple M1 Max) - 57343 MiB free
llama_model_loader: loaded meta data with 39 key-value pairs and 963 tensors from models/qwen2-vl-72b-instruct-q4_k_m.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture str = qwen2vl
llama_model_loader: - kv   1: general.type str = model
llama_model_loader: - kv   2: general.name str = Qwen2 VL 72B Instruct
llama_model_loader: - kv   3: general.finetune str = Instruct
llama_model_loader: - kv   4: general.basename str = Qwen2-VL
llama_model_loader: - kv   5: general.size_label str = 72B
llama_model_loader: - kv   6: general.license str = other
llama_model_loader: - kv   7: general.license.name str = tongyi-qianwen
llama_model_loader: - kv   8: general.license.link str = https://huggingface.co/Qwen/Qwen2-VL-...
llama_model_loader: - kv   9: general.base_model.count u32 = 1
llama_model_loader: - kv  10: general.base_model.0.name str = Qwen2 VL 72B
llama_model_loader: - kv  11: general.base_model.0.organization str = Qwen
llama_model_loader: - kv  12: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2-VL-72B
llama_model_loader: - kv  13: general.tags arr[str,2] = ["multimodal", "image-text-to-text"]
llama_model_loader: - kv  14: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv  15: qwen2vl.block_count u32 = 80
llama_model_loader: - kv  16: qwen2vl.context_length u32 = 32768
llama_model_loader: - kv  17: qwen2vl.embedding_length u32 = 8192
llama_model_loader: - kv  18: qwen2vl.feed_forward_length u32 = 29568
llama_model_loader: - kv  19: qwen2vl.attention.head_count u32 = 64
llama_model_loader: - kv  20: qwen2vl.attention.head_count_kv u32 = 8
llama_model_loader: - kv  21: qwen2vl.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv  22: qwen2vl.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv  23: general.file_type u32 = 15
llama_model_loader: - kv  24: qwen2vl.rope.dimension_sections arr[i32,4] = [16, 24, 24, 0]
llama_model_loader: - kv  25: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv  26: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv  27: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  28: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  29: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t", ...
llama_model_loader: - kv  30: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv  31: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv  32: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv  33: tokenizer.chat_template str = {% set image_count = namespace(value=...
llama_model_loader: - kv  34: general.quantization_version u32 = 2
llama_model_loader: - kv  35: quantize.imatrix.file str = /models_out/Qwen2-VL-72B-Instruct-GGU...
llama_model_loader: - kv  36: quantize.imatrix.dataset str = /training_dir/calibration_datav3.txt
llama_model_loader: - kv  37: quantize.imatrix.entries_count i32 = 560
llama_model_loader: - kv  38: quantize.imatrix.chunks_count i32 = 128
llama_model_loader: - type  f32: 401 tensors
llama_model_loader: - type q5_0: 40 tensors
llama_model_loader: - type q8_0: 40 tensors
llama_model_loader: - type q4_K: 401 tensors
llama_model_loader: - type q5_K: 40 tensors
llama_model_loader: - type q6_K: 41 tensors
llm_load_vocab: special tokens cache size = 14
llm_load_vocab: token to piece cache size = 0.9309 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = qwen2vl
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 152064
llm_load_print_meta: n_merges = 151387
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 8192
llm_load_print_meta: n_layer = 80
llm_load_print_meta: n_head = 64
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 29568
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 8
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 70B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 72.71 B
llm_load_print_meta: model size = 44.15 GiB (5.22 BPW)
llm_load_print_meta: general.name = Qwen2 VL 72B Instruct
llm_load_print_meta: BOS token = 151643 '<|endoftext|>'
llm_load_print_meta: EOS token = 151645 '<|im_end|>'
llm_load_print_meta: EOT token = 151645 '<|im_end|>'
llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
llm_load_print_meta: LF token = 148848 'ÄĬ'
llm_load_print_meta: EOG token = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token = 151645 '<|im_end|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/81 layers to GPU
llm_load_tensors: CPU_Mapped model buffer size = 45213.44 MiB
...................................................................................................
clip_model_load: model name: Qwen2-VL-72B-Instruct
clip_model_load: description: image encoder for Qwen2VL
clip_model_load: GGUF version: 3
clip_model_load: alignment: 32
clip_model_load: n_tensors: 521
clip_model_load: n_kv: 20
clip_model_load: ftype: f32
clip_model_load: loaded meta data with 20 key-value pairs and 521 tensors from models/qwen2-vl-72b-instruct.f32.mmproj.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv   0: general.architecture str = clip
clip_model_load: - kv   1: general.description str = image encoder for Qwen2VL
clip_model_load: - kv   2: general.file_type u32 = 0
clip_model_load: - kv   3: clip.has_text_encoder bool = false
clip_model_load: - kv   4: clip.has_vision_encoder bool = true
clip_model_load: - kv   5: clip.has_qwen2vl_merger bool = true
clip_model_load: - kv   6: clip.projector_type str = qwen2vl_merger
clip_model_load: - kv   7: clip.use_silu bool = false
clip_model_load: - kv   8: clip.use_gelu bool = false
clip_model_load: - kv   9: clip.vision.patch_size u32 = 14
clip_model_load: - kv  10: clip.vision.image_size u32 = 560
clip_model_load: - kv  11: clip.vision.embedding_length u32 = 1280
clip_model_load: - kv  12: clip.vision.projection_dim u32 = 8192
clip_model_load: - kv  13: clip.vision.attention.head_count u32 = 16
clip_model_load: - kv  14: clip.vision.attention.layer_norm_epsilon f32 = 0.000001
clip_model_load: - kv  15: clip.vision.block_count u32 = 32
clip_model_load: - kv  16: clip.vision.feed_forward_length u32 = 0
clip_model_load: - kv  17: general.name str = Qwen2-VL-72B-Instruct
clip_model_load: - kv  18: clip.vision.image_mean arr[f32,3] = [0.481455, 0.457828, 0.408211]
clip_model_load: - kv  19: clip.vision.image_std arr[f32,3] = [0.268630, 0.261303, 0.275777]
clip_model_load: - type  f32: 521 tensors
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has bfloat = true
ggml_metal_init: use bfloat = false
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 60129.54 MB
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
clip_model_load: CLIP using Metal backend
clip_model_load: text_encoder: 0
clip_model_load: vision_encoder: 1
clip_model_load: llava_projector: 0
clip_model_load: minicpmv_projector: 0
clip_model_load: model size: 2667.83 MB
clip_model_load: metadata size: 0.18 MB
clip_model_load: params backend buffer size = 2667.83 MB (521 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file
clip_model_load: compute allocated memory: 198.93 MB
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_ctx_per_seq = 4096
llama_new_context_with_model: n_batch = 2048
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction = true
ggml_metal_init: simdgroup matrix mul. = true
ggml_metal_init: has bfloat = true
ggml_metal_init: use bfloat = false
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 60129.54 MB
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
llama_kv_cache_init: CPU KV buffer size = 1280.00 MiB
llama_new_context_with_model: KV self size = 1280.00 MiB, K (f16): 640.00 MiB, V (f16): 640.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.58 MiB
llama_new_context_with_model: CPU compute buffer size = 584.01 MiB
llama_new_context_with_model: graph nodes = 2806
llama_new_context_with_model: graph splits = 1282 (with bs=512), 1 (with bs=1)
/Users/zhang/Developer/llama.cpp/ggml/src/ggml-metal/ggml-metal.m:1263: unsupported op
ggml_metal_encode_node: error: unsupported op 'IM2COL'
zsh: abort llama-qwen2vl-cli -m models/qwen2-vl-72b-instruct-q4_k_m.gguf --mmproj 0
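For context on where this comes from: even with `-ngl 0`, the log shows `clip_model_load: CLIP using Metal backend`, so the vision encoder graph is still dispatched to Metal, and the abort fires when that graph reaches an `IM2COL` node (presumably lowered from the conv2d patch embedding) that the backend rejects at `ggml-metal.m:1263`. A possible workaround, untested here and assuming the repo's standard CMake build, is a CPU-only binary so nothing can be scheduled on Metal:

```sh
# Untested workaround sketch: build llama.cpp without the Metal backend so the
# whole graph, including the CLIP IM2COL node, falls back to the CPU implementation.
cmake -B build-cpu -DGGML_METAL=OFF
cmake --build build-cpu --config Release -j

# Re-run the original reproduction command with the CPU-only binary:
./build-cpu/bin/llama-qwen2vl-cli \
  -m models/qwen2-vl-72b-instruct-q4_k_m.gguf \
  --mmproj models/qwen2-vl-72b-instruct.f32.mmproj.gguf \
  --image demos/images/06.png --temp 0 \
  -p "describe the image in detail."
```

This trades GPU speed for not crashing; the real fix would need to land in the Metal backend's `IM2COL` support for whatever shapes/types the Qwen2-VL encoder produces.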
+1, same error on a MacBook Pro M3 Max when trying to run Qwen2-VL-72B.