
darwin arm64 regression trying "Example: Build on mac", llama-cpp 'computeFunction must not be nil' #4274

Open · mintyleaf (Contributor) opened this issue on Nov 27, 2024 · 0 comments
Labels: bug (Something isn't working), unconfirmed

LocalAI version:
e8128a3

Environment, CPU architecture, OS, and Version:
MacBook Air with Apple M3, macOS (darwin/arm64)

Describe the bug
Building from source and loading the phi-2.Q2_K model fails for all backends.
Doing exactly the same on the latest stable tag, v2.23.0, successfully loads the model with the first llama-cpp backend, and it works as intended.

To Reproduce
Build and run the "Example: Build on mac" steps from the README on an arm64 Mac; a sketch of the commands is below.
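
For reference, roughly the steps I followed (paraphrased from the README's "Example: Build on mac"; the download URL and exact flags are from memory and may differ slightly from the current docs):

```bash
# Build dependencies, per the README
brew install abseil cmake go grpc protobuf wget

git clone https://github.com/mudler/LocalAI.git
cd LocalAI
# Failing case: HEAD at e8128a3. Working baseline: `git checkout v2.23.0`.
make build

# Fetch phi-2 into models/ (URL as in the README at the time; may have moved)
wget https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q2_K.gguf \
  -O models/phi-2.Q2_K

# Run with debug logging, then request the model to trigger the load
./local-ai --models-path=./models/ --debug=true
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model": "Phi2", "messages": [{"role": "user", "content": "How are you?"}]}'
```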

Logs

...
4:30AM INF [llama-cpp] Attempting to load
4:30AM INF Loading model 'Phi2' with backend llama-cpp
4:30AM DBG Loading model in memory from file: /Users/mintyleaf/Projects/work/LocalAI/models/phi-2.Q2_K
4:30AM DBG Loading Model Phi2 with gRPC (file: /Users/mintyleaf/Projects/work/LocalAI/models/phi-2.Q2_K) (backend: llama-cpp): {backendString:llama-cpp model:phi-2.Q2_K modelID:Phi2 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0x14000468f08 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
4:30AM DBG [llama-cpp-fallback] llama-cpp variant available
4:30AM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama-cpp-fallback
4:30AM DBG GRPC Service for Phi2 will be running at: '127.0.0.1:58586'
4:30AM DBG GRPC Service state dir: /var/folders/d7/46zkm5yj39nbb6dp9dtrs_d00000gn/T/go-processmanager4284942303
4:30AM DBG GRPC Service Started
4:30AM DBG Wait for the service to start up
4:30AM DBG GRPC(Phi2-127.0.0.1:58586): stdout Server listening on 127.0.0.1:58586
4:31AM DBG GRPC Service Ready
4:31AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:phi-2.Q2_K ContextSize:512 Seed:2019972718 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/Users/mintyleaf/Projects/work/LocalAI/models/phi-2.Q2_K Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/Users/mintyleaf/Projects/work/LocalAI/models LoraAdapters:[] LoraScales:[]}
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_load_model_from_file: using device Metal (Apple M3) - 5461 MiB free
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from /Users/mintyleaf/Projects/work/LocalAI/models/phi-2.Q2_K (version GGUF V3 (latest))
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv   0:                       general.architecture str              = phi2
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv   1:                               general.name str              = Phi2
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv   2:                        phi2.context_length u32              = 2048
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv   3:                      phi2.embedding_length u32              = 2560
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv   4:                   phi2.feed_forward_length u32              = 10240
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv   5:                           phi2.block_count u32              = 32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv   6:                  phi2.attention.head_count u32              = 32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv   7:               phi2.attention.head_count_kv u32              = 32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv   8:          phi2.attention.layer_norm_epsilon f32              = 0.000010
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv   9:                  phi2.rope.dimension_count u32              = 32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv  10:                          general.file_type u32              = 10
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv  11:               tokenizer.ggml.add_bos_token bool             = false
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = gpt2
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,51200]   = ["!", "\"", "#", "$", "%", "&", "'", ...
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,51200]   = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr[str,50000]   = ["Ġ t", "Ġ a", "h e", "i n", "r e",...
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 50256
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 50256
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 50256
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - kv  19:               general.quantization_version u32              = 2
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - type  f32:  195 tensors
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - type q2_K:   33 tensors
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - type q3_K:   96 tensors
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_model_loader: - type q6_K:    1 tensors
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_vocab: missing pre-tokenizer type, using: 'default'
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_vocab:                                             
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_vocab: ************************************        
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!        
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_vocab: CONSIDER REGENERATING THE MODEL             
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_vocab: ************************************        
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_vocab:                                             
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_vocab: special tokens cache size = 944
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_vocab: token to piece cache size = 0.3151 MB
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: format           = GGUF V3 (latest)
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: arch             = phi2
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: vocab type       = BPE
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_vocab          = 51200
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_merges         = 50000
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: vocab_only       = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_ctx_train      = 2048
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_embd           = 2560
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_layer          = 32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_head           = 32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_head_kv        = 32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_rot            = 32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_swa            = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_embd_head_k    = 80
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_embd_head_v    = 80
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_gqa            = 1
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_embd_k_gqa     = 2560
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_embd_v_gqa     = 2560
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: f_norm_eps       = 1.0e-05
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: f_norm_rms_eps   = 0.0e+00
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: f_clamp_kqv      = 0.0e+00
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: f_max_alibi_bias = 0.0e+00
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: f_logit_scale    = 0.0e+00
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_ff             = 10240
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_expert         = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_expert_used    = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: causal attn      = 1
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: pooling type     = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: rope type        = 2
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: rope scaling     = linear
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: freq_base_train  = 10000.0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: freq_scale_train = 1
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: n_ctx_orig_yarn  = 2048
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: rope_finetuned   = unknown
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: ssm_d_conv       = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: ssm_d_inner      = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: ssm_d_state      = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: ssm_dt_rank      = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: ssm_dt_b_c_rms   = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: model type       = 3B
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: model ftype      = Q2_K - Medium
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: model params     = 2.78 B
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: model size       = 1.09 GiB (3.37 BPW) 
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: general.name     = Phi2
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: BOS token        = 50256 '<|endoftext|>'
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: EOS token        = 50256 '<|endoftext|>'
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: EOT token        = 50256 '<|endoftext|>'
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: UNK token        = 50256 '<|endoftext|>'
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: LF token         = 128 'Ä'
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: EOG token        = 50256 '<|endoftext|>'
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_print_meta: max token length = 256
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_tensors: tensor 'token_embd.weight' (q2_K) (and 0 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_backend_metal_log_allocated_size: allocated buffer, size =  1076.52 MiB, ( 1076.59 /  5461.34)
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_tensors: offloading 32 repeating layers to GPU
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_tensors: offloading output layer to GPU
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_tensors: offloaded 33/33 layers to GPU
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_tensors: Metal_Mapped model buffer size =  1076.51 MiB
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llm_load_tensors:   CPU_Mapped model buffer size =    41.02 MiB
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr .........................................................................................
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_new_context_with_model: n_seq_max     = 1
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_new_context_with_model: n_ctx         = 512
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_new_context_with_model: n_ctx_per_seq = 512
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_new_context_with_model: n_batch       = 512
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_new_context_with_model: n_ubatch      = 512
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_new_context_with_model: flash_attn    = 0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_new_context_with_model: freq_base     = 10000.0
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_new_context_with_model: freq_scale    = 1
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr llama_new_context_with_model: n_ctx_per_seq (512) < n_ctx_train (2048) -- the full capacity of the model will not be utilized
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: allocating
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: found device: Apple M3
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: picking default device: Apple M3
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loading '/private/tmp/localai/backend_data/backend-assets/grpc/default.metallib'
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: GPU name:   Apple M3
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: GPU family: MTLGPUFamilyApple9  (1009)
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: simdgroup reduction   = true
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: simdgroup matrix mul. = true
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: has bfloat            = true
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: use bfloat            = false
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: hasUnifiedMemory      = true
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: recommendedMaxWorkingSetSize  =  5726.63 MB
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_add                                    0x15660c0f0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_add_row                                0x15660cd00 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_sub                                    0x15660cfc0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_sub_row                                0x15660db30 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_mul                                    0x15660ddf0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_mul_row                                0x1578058d0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_div                                    0x157805b90 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_div_row                                0x157806800 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_repeat_f32                             0x157806ac0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_repeat_f16                             0x157806d80 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_repeat_i32                             0x1578071f0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_repeat_i16                             0x1578077d0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_scale                                  0x1578083e0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_scale_4                                0x157808b90 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_clamp                                  0x1578093a0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_tanh                                   0x157809ac0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_relu                                   0x15780a1e0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_sigmoid                                0x15780a900 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_gelu                                   0x15780b020 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_gelu_4                                 0x15780b740 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_gelu_quick                             0x15780be60 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_gelu_quick_4                           0x15780c580 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_silu                                   0x15780cca0 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr ggml_metal_init: loaded kernel_silu_4                                 0x15780d540 | th_max = 1024 | th_width =   32
4:31AM DBG GRPC(Phi2-127.0.0.1:58586): stderr -[MTLComputePipelineDescriptorInternal setComputeFunction:withType:]:800: failed assertion `computeFunction must not be nil.'
4:31AM ERR [llama-cpp] Failed loading model, trying with fallback 'llama-cpp-fallback', error: failed to load model with internal loader: could not load model: rpc error: code = Unavailable desc = error reading from server: EOF
...
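
For what it's worth, the assertion at the end is Metal refusing a nil compute function: ggml_metal_init loads kernels from default.metallib, and if a lookup for an expected kernel name returns nil (e.g. the bundled metallib was built from older ggml-metal sources than the binary expects), assigning it to the pipeline descriptor trips exactly this assert. A crude way to inspect which kernel names the shipped metallib actually contains (a hypothetical diagnostic using only standard tools, not something from the docs):

```bash
# List kernel function names embedded in the metallib the backend loaded
strings /private/tmp/localai/backend_data/backend-assets/grpc/default.metallib \
  | grep '^kernel_' | sort -u
```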