(Crashing on Low Memory SBC) main invoked oom-killer: gfp_mask=0x1100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0 #59

Closed
unclemusclez opened this issue May 19, 2024 · 51 comments

@unclemusclez

unclemusclez commented May 19, 2024

Is there any way that main and worker could be separated, so I can use a cluster of 8 RPi 3B+ for the compute while the scheduling is offloaded to another device with more memory?
I understand this is most likely not a priority.
Perhaps a smaller model? https://github.com/jzhang38/TinyLlama

main:

ubuntu@ubuntu:~/distributed-llama$ sudo main chat --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --model ~/dllama_meta-llama-3-8b_q40.bin --tokenizer ~/dllama-llama3-tokenizer.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:
💡 arch: llama2
💡 dim: 4096
💡 hiddenDim: 14336
💡 nLayers: 32
💡 nHeads: 32
💡 nKvHeads: 8
💡 vocabSize: 128256
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 500000.0
📄 bosId: 128000
📄 eosId: 128001
Killed

worker:

ubuntu@ubuntu:~$ sudo nice -n -20 main worker --port 9998 --nthreads 4
Listening on 0.0.0.0:9998...
Client connected
terminate called after throwing an instance of 'ReadSocketException'
  what():  std::exception
Aborted

Kernel log:

May 19 08:46:24 ubuntu kernel: [107061.602328] main invoked oom-killer: gfp_mask=0x1100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
May 19 08:46:24 ubuntu kernel: [107061.602392] CPU: 0 PID: 4676 Comm: main Tainted: G         C  E     5.15.0-1055-raspi #58-Ubuntu
May 19 08:46:24 ubuntu kernel: [107061.602412] Hardware name: Raspberry Pi 3 Model B Plus Rev 1.3 (DT)
May 19 08:46:24 ubuntu kernel: [107061.602423] Call trace:
May 19 08:46:24 ubuntu kernel: [107061.602430]  dump_backtrace+0x0/0x200
May 19 08:46:24 ubuntu kernel: [107061.602455]  show_stack+0x20/0x30
May 19 08:46:24 ubuntu kernel: [107061.602470]  dump_stack_lvl+0x8c/0xb8
May 19 08:46:24 ubuntu kernel: [107061.602490]  dump_stack+0x18/0x34
May 19 08:46:24 ubuntu kernel: [107061.602506]  dump_header+0x54/0x21c
May 19 08:46:24 ubuntu kernel: [107061.602520]  oom_kill_process+0x22c/0x230
May 19 08:46:24 ubuntu kernel: [107061.602539]  out_of_memory+0xf4/0x370
May 19 08:46:24 ubuntu kernel: [107061.602554]  __alloc_pages_slowpath.constprop.0+0x604/0x8e0
May 19 08:46:24 ubuntu kernel: [107061.602574]  __alloc_pages+0x29c/0x320
May 19 08:46:24 ubuntu kernel: [107061.602590]  alloc_zeroed_user_highpage_movable+0x40/0x50
May 19 08:46:24 ubuntu kernel: [107061.602607]  do_anonymous_page+0x88/0x4ec
May 19 08:46:24 ubuntu kernel: [107061.602628]  handle_pte_fault+0x170/0x1c0
May 19 08:46:24 ubuntu kernel: [107061.602642]  __handle_mm_fault+0x1d0/0x350
May 19 08:46:24 ubuntu kernel: [107061.602655]  handle_mm_fault+0x108/0x294
May 19 08:46:24 ubuntu kernel: [107061.602669]  faultin_page+0x84/0x150
May 19 08:46:24 ubuntu kernel: [107061.602685]  __get_user_pages+0x194/0x2c0
May 19 08:46:24 ubuntu kernel: [107061.602701]  populate_vma_page_range+0x64/0x70
May 19 08:46:24 ubuntu kernel: [107061.602719]  __mm_populate+0xc4/0x1d0
May 19 08:46:24 ubuntu kernel: [107061.602735]  do_mlock+0xdc/0x26c
May 19 08:46:24 ubuntu kernel: [107061.602750]  __arm64_sys_mlock+0x20/0x30
May 19 08:46:24 ubuntu kernel: [107061.602765]  invoke_syscall+0x50/0x120
May 19 08:46:24 ubuntu kernel: [107061.602784]  el0_svc_common.constprop.0+0x6c/0x1a0
May 19 08:46:24 ubuntu kernel: [107061.602803]  do_el0_svc+0x30/0xb0
May 19 08:46:24 ubuntu kernel: [107061.602820]  el0_svc+0x4c/0x170
May 19 08:46:24 ubuntu kernel: [107061.602837]  el0t_64_sync_handler+0xa4/0x130
May 19 08:46:24 ubuntu kernel: [107061.602854]  el0t_64_sync+0x1a4/0x1a8
May 19 08:46:24 ubuntu kernel: [107061.602888] Mem-Info:
May 19 08:46:24 ubuntu kernel: [107061.602905] active_anon:735 inactive_anon:16569 isolated_anon:0
May 19 08:46:24 ubuntu kernel: [107061.602905]  active_file:36 inactive_file:28 isolated_file:0
May 19 08:46:24 ubuntu kernel: [107061.602905]  unevictable:185356 dirty:0 writeback:0
May 19 08:46:24 ubuntu kernel: [107061.602905]  slab_reclaimable:6070 slab_unreclaimable:10550
May 19 08:46:24 ubuntu kernel: [107061.602905]  mapped:1869 shmem:749 pagetables:923 bounce:0
May 19 08:46:24 ubuntu kernel: [107061.602905]  kernel_misc_reclaimable:0
May 19 08:46:24 ubuntu kernel: [107061.602905]  free:5609 free_pcp:0 free_cma:0
May 19 08:46:24 ubuntu kernel: [107061.602949] Node 0 active_anon:2940kB inactive_anon:66276kB active_file:144kB inactive_file:112kB unevictable:741424kB isolated(anon):0kB isolated(file):0kB mapped:7476kB dirty:0kB writeback:0kB shmem:2996kB >
May 19 08:46:24 ubuntu kernel: [107061.602992] DMA free:22436kB min:24576kB low:30208kB high:35840kB reserved_highatomic:0KB active_anon:2940kB inactive_anon:66276kB active_file:196kB inactive_file:292kB unevictable:741332kB writepending:0kB p>
May 19 08:46:24 ubuntu kernel: [107061.603035] lowmem_reserve[]: 0 0 0 0
May 19 08:46:24 ubuntu kernel: [107061.603114] DMA: 1113*4kB (UME) 633*8kB (UME) 296*16kB (UME) 129*32kB (UME) 48*64kB (UME) 11*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22860kB
May 19 08:46:24 ubuntu kernel: [107061.603406] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
May 19 08:46:24 ubuntu kernel: [107061.603428] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
May 19 08:46:24 ubuntu kernel: [107061.603449] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
May 19 08:46:24 ubuntu kernel: [107061.603469] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
May 19 08:46:24 ubuntu kernel: [107061.603489] 2704 total pagecache pages
May 19 08:46:24 ubuntu kernel: [107061.603504] 0 pages in swap cache
May 19 08:46:24 ubuntu kernel: [107061.603518] Swap cache stats: add 0, delete 0, find 0/0
May 19 08:46:24 ubuntu kernel: [107061.603536] Free swap  = 0kB
May 19 08:46:24 ubuntu kernel: [107061.603550] Total swap = 0kB
May 19 08:46:24 ubuntu kernel: [107061.603565] 242688 pages RAM
May 19 08:46:24 ubuntu kernel: [107061.603580] 0 pages HighMem/MovableOnly
May 19 08:46:24 ubuntu kernel: [107061.603594] 10931 pages reserved
May 19 08:46:24 ubuntu kernel: [107061.603609] 16384 pages cma reserved
May 19 08:46:24 ubuntu kernel: [107061.603624] Tasks state (memory values in pages):
May 19 08:46:24 ubuntu kernel: [107061.603638] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
May 19 08:46:24 ubuntu kernel: [107061.603685] [    379]     0   379    12038      852    94208        0          -250 systemd-journal
May 19 08:46:24 ubuntu kernel: [107061.603716] [    406]     0   406    72414     6415   118784        0         -1000 multipathd
May 19 08:46:24 ubuntu kernel: [107061.603745] [    420]     0   420     5982      942    69632        0         -1000 systemd-udevd
May 19 08:46:24 ubuntu kernel: [107061.603789] [    553]   103   553    22163      732    77824        0             0 systemd-timesyn
May 19 08:46:24 ubuntu kernel: [107061.603819] [    612]   100   612     4068      777    73728        0             0 systemd-network
May 19 08:46:24 ubuntu kernel: [107061.603847] [    614]   101   614     6339     1633    90112        0             0 systemd-resolve
May 19 08:46:24 ubuntu kernel: [107061.603875] [    625]   102   625     2267      838    57344        0          -900 dbus-daemon
May 19 08:46:24 ubuntu kernel: [107061.603904] [    629]     0   629    20487      611    65536        0             0 irqbalance
May 19 08:46:24 ubuntu kernel: [107061.603933] [    634]     0   634     8236     2733   114688        0             0 networkd-dispat
May 19 08:46:24 ubuntu kernel: [107061.603961] [    640]   104   640    55504      826    81920        0             0 rsyslogd
May 19 08:46:24 ubuntu kernel: [107061.603989] [    644]     0   644   366640     2855   249856        0          -900 snapd
May 19 08:46:24 ubuntu kernel: [107061.604017] [    653]     0   653     3887      791    69632        0             0 systemd-logind
May 19 08:46:24 ubuntu kernel: [107061.604045] [    655]     0   655     3809      626    73728        0             0 wpa_supplicant
May 19 08:46:24 ubuntu kernel: [107061.604073] [    683]     0   683     1727      501    45056        0             0 cron
May 19 08:46:24 ubuntu kernel: [107061.604100] [    703]     0   703    27482     2589   110592        0             0 unattended-upgr
May 19 08:46:24 ubuntu kernel: [107061.604128] [    710]     0   710     1408      126    53248        0             0 agetty
May 19 08:46:24 ubuntu kernel: [107061.604155] [    712]     0   712     1397      139    49152        0             0 agetty
May 19 08:46:24 ubuntu kernel: [107061.604183] [    720]     0   720     3788     1039    69632        0         -1000 sshd
May 19 08:46:24 ubuntu kernel: [107061.604211] [    844]     0   844      559       44    36864        0             0 hciattach
May 19 08:46:24 ubuntu kernel: [107061.604239] [    856]     0   856     2384      602    61440        0             0 bluetoothd
May 19 08:46:24 ubuntu kernel: [107061.604266] [   1172]     0  1172    74368     1369   167936        0             0 packagekitd
May 19 08:46:24 ubuntu kernel: [107061.604305] [   1178]     0  1178    58582      814    94208        0             0 polkitd
May 19 08:46:24 ubuntu kernel: [107061.604336] [   4481]     0  4481     4596     1078    81920        0             0 sshd
May 19 08:46:24 ubuntu kernel: [107061.604364] [   4484]  1000  4484     4559     1187    73728        0             0 systemd
May 19 08:46:24 ubuntu kernel: [107061.604391] [   4485]  1000  4485    42829     1235   110592        0             0 (sd-pam)
May 19 08:46:24 ubuntu kernel: [107061.604421] [   4571]  1000  4571     4631      881    81920        0             0 sshd
May 19 08:46:24 ubuntu kernel: [107061.604448] [   4572]  1000  4572     2147      846    53248        0             0 bash
May 19 08:46:24 ubuntu kernel: [107061.604481] [   4674]  1000  4674     3345      616    61440        0             0 sudo
May 19 08:46:24 ubuntu kernel: [107061.604509] [   4675]  1000  4675     3345      172    61440        0             0 sudo
May 19 08:46:24 ubuntu kernel: [107061.604536] [   4676]     0  4676  1725546   180701  1495040        0             0 main
May 19 08:46:24 ubuntu kernel: [107061.604563] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-39.scope,task=main,pid=4676,uid=0
May 19 08:46:24 ubuntu kernel: [107061.604827] Out of memory: Killed process 4676 (main) total-vm:6902184kB, anon-rss:721280kB, file-rss:1524kB, shmem-rss:0kB, UID:0 pgtables:1460kB oom_score_adj:0
May 19 08:46:25 ubuntu systemd[1]: session-39.scope: A process of this unit has been killed by the OOM killer.
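
To put the page counts in the kernel log above into perspective, here is a rough sketch that converts them to sizes. The 4 kB page size is an assumption inferred from the log itself (active_anon:735 pages is reported as 2940kB); the function and variable names are illustrative and not part of dllama or the kernel.

```python
# Convert page counts from the OOM report into human-readable sizes.
# Assumption: 4 kB pages, inferred from the log (735 pages == 2940 kB).
PAGE_KB = 4


def pages_to_kb(pages: int) -> int:
    """Size in kB of the given number of pages."""
    return pages * PAGE_KB


def pages_to_mib(pages: int) -> float:
    """Size in MiB of the given number of pages."""
    return pages_to_kb(pages) / 1024


# Page counts copied from the log lines above.
report = {
    "total RAM": 242688,
    "unevictable": 185356,
    "main rss": 180701,
    "main total_vm": 1725546,
}

for name, pages in report.items():
    print(f"{name:<15} {pages_to_mib(pages):>10.1f} MiB")

# How many times larger the process address space is than physical RAM.
overcommit = report["main total_vm"] / report["total RAM"]
print(f"total_vm / RAM = {overcommit:.1f}x")
```

Read this way, main had about 6.6 GiB of address space mapped on a board with 948 MiB of RAM and no swap. The call trace shows the failing allocation came in through mlock, which is consistent with almost all of the used memory being counted as unevictable.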
@b4rtaz
Owner

b4rtaz commented May 22, 2024

I think a smaller model is a way to go for RasPi 3. The converter needs to be adjusted a bit and it should work. I'll look at it soon.

@unclemusclez
Author

I think a smaller model is a way to go for RasPi 3. The converter needs to be adjusted a bit and it should work. I'll look at it soon.

ballin

@unclemusclez
Author

unclemusclez commented May 23, 2024

apparently i should be able to use llama.cpp and mpi with rpi3b+.
i assume dllama will offer some optimization? maybe i should just explore mpi for now?
https://blog.cineneural.com/blog/2023-07/run-llama-llm-on-raspberry-pi-cluster/

@zhengpeirong

zhengpeirong commented May 23, 2024

apparently i should be able to use llama.cpp and mpi with rpi3b+. i assume dllama will offer some optimization? maybe i should just explore mpi for now? https://blog.cineneural.com/blog/2023-07/run-llama-llm-on-raspberry-pi-cluster/

llama.cpp uses pipeline parallelism, which yields high throughput only when the batch size is large. Moreover, its MPI backend has been broken since a certain commit. That's why we are here.
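
To illustrate the batch-size point, here is a minimal sketch based on the textbook idealized pipeline model (equal-cost stages, no communication overhead). It is not taken from llama.cpp or dllama; the function name and the choice of 8 stages (one per Pi) are assumptions for illustration only.

```python
def pipeline_utilization(n_stages: int, n_microbatches: int) -> float:
    """Fraction of time each stage is busy in an idealized pipeline.

    A run takes (n_microbatches + n_stages - 1) time slots in total,
    and each stage does useful work in n_microbatches of them.
    """
    if n_stages < 1 or n_microbatches < 1:
        raise ValueError("need at least one stage and one micro-batch")
    return n_microbatches / (n_microbatches + n_stages - 1)


# Hypothetical 8-device pipeline at different batch sizes.
for batch in (1, 8, 64):
    util = pipeline_utilization(n_stages=8, n_microbatches=batch)
    print(f"batch={batch:<3} utilization={util:.3f}")
```

In this model a single interactive request (batch size 1) keeps each of the 8 devices busy only one eighth of the time, while larger batches fill the pipeline and push utilization toward 1.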

@unclemusclez
Author

unclemusclez commented May 23, 2024

alright good. i think that means i'm in the right place. i will be testing this on SBC devices mostly, but frequently, if i can manage to get a database to load.

when discord?

@b4rtaz
Owner

b4rtaz commented May 23, 2024

The first version of a general HF converter is here. You can try it. So far I have tested it only with TinyLlama-1.1B:

  1. Download TinyLlama.
  2. Run the model converter: python3 convert-hf.py path/to/TinyLlama-1.1B q40 tinylama
  3. Run the tokenizer converter: python3 convert-tokenizer-sentencepiece.py path/to/tokenizer.model tinyllama
  4. Run Distributed Llama:
b4rtaz@b4rtazs-MacBook-Pro distributed-llama % ./dllama generate --weights-float-type q40 --buffer-float-type q80 --nthreads 8 --steps 128 --model ../dllama_tinylama_q40.bin --tokenizer ../dllama_tinyllama.t --prompt "My name is Clara"
💡 arch: llama2
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 1
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 16384 kB
⏩ Loaded 824584 kB
My name is Clara. I am not your enemy. I just want to make sure that you and the world know that you are loved, and you are never alone.
[Page 215]
I feel a little more confident about him than I did a few hours ago. We have a lot of time together. He has all of his classes and other things to do, and he is at least a little used to me. It is probably safer for him to be here with me, and I am much more comfortable with him here.
I feel like I could ask him anything. He is not scared
Generated tokens:    128
Avg tokens / second: 47.23
Avg generation time: 21.17 ms
Avg inference time:  20.45 ms
Avg transfer time:   0.45 ms
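
As a sanity check on the "Loaded 824584 kB" line above, the weight size can be estimated from the printed model dimensions. This is a sketch under stated assumptions, not the converter's actual code: it assumes q40 packs 32 weights into an 18-byte block, that the token embedding, the layer norms and the final norm are stored as f32, and that the output projection is stored as q40. The per-tensor results agree with the byte counts printed by the converter log later in this thread (for example 2359296 bytes for a 2048x2048 q40 tensor). All function names are hypothetical.

```python
Q40_BLOCK_WEIGHTS = 32  # assumed: weights per q40 block
Q40_BLOCK_BYTES = 18    # assumed: 2-byte scale + 16 bytes of 4-bit values
F32_BYTES = 4


def q40_bytes(rows: int, cols: int) -> int:
    """Bytes used by a rows x cols tensor stored as q40."""
    n = rows * cols
    assert n % Q40_BLOCK_WEIGHTS == 0, "tensor must fill whole blocks"
    return n // Q40_BLOCK_WEIGHTS * Q40_BLOCK_BYTES


def f32_bytes(*shape: int) -> int:
    """Bytes used by a tensor of the given shape stored as f32."""
    n = 1
    for d in shape:
        n *= d
    return n * F32_BYTES


def model_bytes(dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size):
    """Estimated total weight bytes for a Llama-style model."""
    kv_dim = dim // n_heads * n_kv_heads
    per_layer = (
        q40_bytes(dim, dim) * 2             # q_proj, o_proj
        + q40_bytes(kv_dim, dim) * 2        # k_proj, v_proj
        + q40_bytes(hidden_dim, dim) * 3    # gate_proj, down_proj, up_proj
        + f32_bytes(dim) * 2                # input and post-attention norms
    )
    return (
        f32_bytes(vocab_size, dim)          # token embedding (f32)
        + n_layers * per_layer
        + f32_bytes(dim)                    # final norm (assumed f32)
        + q40_bytes(vocab_size, dim)        # output projection (assumed q40)
    )


# Dimensions as printed by dllama in this thread.
tiny = model_bytes(2048, 5632, 22, 32, 4, 32000)
llama3_8b = model_bytes(4096, 14336, 32, 32, 8, 128256)
print(f"TinyLlama 1.1B: {tiny // 1024} kB")
print(f"Llama 3 8B:     {llama3_8b // 1024} kB")
```

Under these assumptions the TinyLlama estimate comes out at 824584 kB, the same figure dllama reports, while the Llama 3 8B estimate is roughly 5.9 GiB, several times the RAM of a single Raspberry Pi 3B+.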

@unclemusclez
Author

k brb

@unclemusclez
Author

seems like no dice?

~/distributed-llama-hf/converter$ python convert-hf.py ../../TinyLlama-1.1B-intermediate-step-1431k-3T q40 tinylama
Output file: dllama_model_tinylama_q40.m
Unknown header key: files
{'version': 0, 'arch_type': 11259136, 'hidden_act': 1, 'dim': 2048, 'hidden_dim': 5632, 'n_layers': 22, 'n_heads': 32, 'n_kv_heads': 4, 'weights_float_type': 2, 'max_seq_len': 2048, 'vocab_size': 32000, 'files': ['../../TinyLlama-1.1B-intermediate-step-1431k-3T/model.safetensors'], 'n_experts': 0, 'n_active_experts': 0}
💿 Loading file model.safetensors...
Found 201 layers
🔶 Writing tensor model.embed_tokens.weight torch.Size([32000, 2048])...
Saved f32 tensor in 2.61s, 262144000 bytes
🔶 Writing tensor model.layers.0.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.0.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.0.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.0.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.48s, 2359296 bytes
🔶 Writing tensor model.layers.0.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.29s, 6488064 bytes
🔶 Writing tensor model.layers.0.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.27s, 6488064 bytes
🔶 Writing tensor model.layers.0.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
🔶 Writing tensor model.layers.0.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.0.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.1.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.1.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.1.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.1.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.1.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.23s, 6488064 bytes
🔶 Writing tensor model.layers.1.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.23s, 6488064 bytes
🔶 Writing tensor model.layers.1.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.22s, 6488064 bytes
🔶 Writing tensor model.layers.1.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.1.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.2.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.2.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.2.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.2.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.2.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.31s, 6488064 bytes
🔶 Writing tensor model.layers.2.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.30s, 6488064 bytes
🔶 Writing tensor model.layers.2.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
🔶 Writing tensor model.layers.2.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.2.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.3.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.3.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.3.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.3.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.3.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.27s, 6488064 bytes
🔶 Writing tensor model.layers.3.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.22s, 6488064 bytes
🔶 Writing tensor model.layers.3.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.29s, 6488064 bytes
🔶 Writing tensor model.layers.3.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.3.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.4.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.4.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.4.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.4.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.4.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.30s, 6488064 bytes
🔶 Writing tensor model.layers.4.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.4.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.28s, 6488064 bytes
🔶 Writing tensor model.layers.4.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.4.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.5.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.5.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.5.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.5.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.5.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.5.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.27s, 6488064 bytes
🔶 Writing tensor model.layers.5.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
🔶 Writing tensor model.layers.5.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.5.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.6.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.6.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.6.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.6.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.6.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.6.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.6.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.6.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.6.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.7.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.7.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.7.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.07s, 294912 bytes
🔶 Writing tensor model.layers.7.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.7.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.28s, 6488064 bytes
🔶 Writing tensor model.layers.7.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.28s, 6488064 bytes
🔶 Writing tensor model.layers.7.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.7.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.7.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.8.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.8.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.8.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.8.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.8.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.27s, 6488064 bytes
🔶 Writing tensor model.layers.8.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.22s, 6488064 bytes
🔶 Writing tensor model.layers.8.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.23s, 6488064 bytes
🔶 Writing tensor model.layers.8.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.8.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.9.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.9.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.9.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.9.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.50s, 2359296 bytes
🔶 Writing tensor model.layers.9.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.22s, 6488064 bytes
🔶 Writing tensor model.layers.9.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.24s, 6488064 bytes
🔶 Writing tensor model.layers.9.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.9.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.9.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.10.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.10.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.10.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.10.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.10.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.10.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.10.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.23s, 6488064 bytes
🔶 Writing tensor model.layers.10.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.10.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.11.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.11.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.11.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.11.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.11.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.11.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.11.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
🔶 Writing tensor model.layers.11.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.11.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.12.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.12.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.12.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.05s, 294912 bytes
🔶 Writing tensor model.layers.12.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.48s, 2359296 bytes
🔶 Writing tensor model.layers.12.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.12.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.12.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
🔶 Writing tensor model.layers.12.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.12.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.13.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.13.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.13.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.13.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.13.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.31s, 6488064 bytes
🔶 Writing tensor model.layers.13.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.24s, 6488064 bytes
🔶 Writing tensor model.layers.13.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.13.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.13.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.14.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.14.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.14.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.14.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.49s, 2359296 bytes
🔶 Writing tensor model.layers.14.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.14.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.27s, 6488064 bytes
🔶 Writing tensor model.layers.14.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
🔶 Writing tensor model.layers.14.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.14.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.15.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.15.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.15.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.15.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.15.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.24s, 6488064 bytes
🔶 Writing tensor model.layers.15.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.29s, 6488064 bytes
🔶 Writing tensor model.layers.15.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.15.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.15.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.16.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.16.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.16.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.16.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.46s, 2359296 bytes
🔶 Writing tensor model.layers.16.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.27s, 6488064 bytes
🔶 Writing tensor model.layers.16.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.16.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.28s, 6488064 bytes
🔶 Writing tensor model.layers.16.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.16.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.17.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.17.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.17.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.17.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.44s, 2359296 bytes
🔶 Writing tensor model.layers.17.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.32s, 6488064 bytes
🔶 Writing tensor model.layers.17.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.30s, 6488064 bytes
🔶 Writing tensor model.layers.17.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.17.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.17.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.18.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.49s, 2359296 bytes
🔶 Writing tensor model.layers.18.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.18.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.18.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.18.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.18.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.18.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.27s, 6488064 bytes
🔶 Writing tensor model.layers.18.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.18.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.19.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.19.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.19.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.19.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.47s, 2359296 bytes
🔶 Writing tensor model.layers.19.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.35s, 6488064 bytes
🔶 Writing tensor model.layers.19.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.19.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.31s, 6488064 bytes
🔶 Writing tensor model.layers.19.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.19.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.20.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.48s, 2359296 bytes
🔶 Writing tensor model.layers.20.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.20.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.20.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.48s, 2359296 bytes
🔶 Writing tensor model.layers.20.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.29s, 6488064 bytes
🔶 Writing tensor model.layers.20.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.28s, 6488064 bytes
🔶 Writing tensor model.layers.20.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.20.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.20.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.21.self_attn.q_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.49s, 2359296 bytes
🔶 Writing tensor model.layers.21.self_attn.k_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.21.self_attn.v_proj.weight torch.Size([256, 2048])...
Saved q40 tensor in 0.06s, 294912 bytes
🔶 Writing tensor model.layers.21.self_attn.o_proj.weight torch.Size([2048, 2048])...
Saved q40 tensor in 0.45s, 2359296 bytes
🔶 Writing tensor model.layers.21.mlp.gate_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.25s, 6488064 bytes
🔶 Writing tensor model.layers.21.mlp.down_proj.weight torch.Size([2048, 5632])...
Saved q40 tensor in 1.27s, 6488064 bytes
🔶 Writing tensor model.layers.21.mlp.up_proj.weight torch.Size([5632, 2048])...
Saved q40 tensor in 1.26s, 6488064 bytes
🔶 Writing tensor model.layers.21.input_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.layers.21.post_attention_layernorm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor model.norm.weight torch.Size([2048])...
Saved f32 tensor in 0.00s, 8192 bytes
🔶 Writing tensor lm_head.weight torch.Size([32000, 2048])...
Saved q40 tensor in 7.50s, 36864000 bytes
✅ dllama_model_tinylama_q40.m created successfully
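
As a sanity check on the converter log above, the reported byte counts can be reproduced by hand. The sketch below assumes a Q40 block holds 32 weights, stored as 16 bytes of 4-bit values plus a 2-byte scale (18 bytes per block). That layout is an assumption, not something the log states, but it matches every size printed above. The function names are made up for illustration.

```python
# Assumed Q40 layout (not stated in the log, but consistent with its sizes):
# each block covers 32 weights = 16 bytes of packed 4-bit values + 2-byte scale.
Q40_BLOCK_SIZE = 32
Q40_BLOCK_BYTES = 16 + 2


def q40_bytes(rows: int, cols: int) -> int:
    """Bytes needed to store a rows x cols tensor under the assumed Q40 layout."""
    n = rows * cols
    if n % Q40_BLOCK_SIZE != 0:
        raise ValueError("element count must be a multiple of the block size")
    return (n // Q40_BLOCK_SIZE) * Q40_BLOCK_BYTES


def f32_bytes(n: int) -> int:
    """Bytes needed to store n float32 values."""
    return n * 4


# Shapes taken from the log above.
for name, shape in [
    ("q_proj", (2048, 2048)),
    ("k_proj", (256, 2048)),
    ("up_proj", (5632, 2048)),
    ("lm_head", (32000, 2048)),
]:
    print(name, q40_bytes(*shape), "bytes")
print("layernorm", f32_bytes(2048), "bytes")
```

If a converted file ever reports sizes that do not follow this pattern, that would suggest the conversion did not run with the expected quantization.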

This console message got cut off:

当 -30689.0
Ë -30690.0
★ -30691.0
寺 -30692.0
性 -30693.0
也 -30694.0
め -30695.0
だ -30696.0
位 -30697.0
ങ -30698.0
ہ -30699.0
值 -30700.0
古 -30701.0
გ -30702.0
ব -30703.0
院 -30704.0
േ -30705.0
▶ -30706.0
ர -30707.0
界 -30708.0
語 -30709.0
സ -30710.0
수 -30711.0
ǒ -30712.0
愛 -30713.0
✔ -30714.0
時 -30715.0
ọ -30716.0
റ -30717.0
մ -30718.0
ケ -30719.0
东 -30720.0
同 -30721.0
주 -30722.0
保 -30723.0
Õ -30724.0
ố -30725.0
ἰ -30726.0
青 -30727.0
ゴ -30728.0
体 -30729.0
清 -30730.0
相 -30731.0
จ -30732.0
ء -30733.0
情 -30734.0
𝕜 -30735.0
ক -30736.0
ḫ -30737.0
ờ -30738.0
将 -30739.0
族 -30740.0
동 -30741.0
Υ -30742.0
┌ -30743.0
ボ -30744.0
宮 -30745.0
』 -30746.0
ম -30747.0
『 -30748.0
ļ -30749.0
श -30750.0
ป -30751.0
Ա -30752.0
ब -30753.0
자 -30754.0
政 -30755.0
ா -30756.0
间 -30757.0
fi -30758.0
松 -30759.0
ṃ -30760.0
始 -30761.0
息 -30762.0
少 -30763.0
教 -30764.0
获 -30765.0
列 -30766.0
开 -30767.0
ტ -30768.0
ワ -30769.0
კ -30770.0
科 -30771.0
春 -30772.0
治 -30773.0
吉 -30774.0
ས -30775.0
ศ -30776.0
ɒ -30777.0
台 -30778.0
ネ -30779.0
း -30780.0
ĩ -30781.0
工 -30782.0
ά -30783.0
知 -30784.0
八 -30785.0
場 -30786.0
画 -30787.0
百 -30788.0
☆ -30789.0
記 -30790.0
得 -30791.0
ソ -30792.0
氏 -30793.0
ာ -30794.0
에 -30795.0
ল -30796.0
ṛ -30797.0
关 -30798.0
ġ -30799.0
έ -30800.0
∑ -30801.0
ベ -30802.0
标 -30803.0
니 -30804.0
ὴ -30805.0
ֵ -30806.0
外 -30807.0
♠ -30808.0
わ -30809.0
間 -30810.0
ภ -30811.0
校 -30812.0
制 -30813.0
แ -30814.0
力 -30815.0
門 -30816.0
好 -30817.0
ғ -30818.0
Ù -30819.0
ℓ -30820.0
ֶ -30821.0
는 -30822.0
┐ -30823.0
∗ -30824.0
指 -30825.0
色 -30826.0
返 -30827.0
馬 -30828.0
请 -30829.0
≫ -30830.0
風 -30831.0
ό -30832.0
接 -30833.0
서 -30834.0
↳ -30835.0
せ -30836.0
志 -30837.0
̲ -30838.0
魔 -30839.0
ң -30840.0
更 -30841.0
程 -30842.0
김 -30843.0
郡 -30844.0
ོ -30845.0
ũ -30846.0
ച -30847.0
利 -30848.0
県 -30849.0
周 -30850.0
そ -30851.0
や -30852.0
谷 -30853.0
香 -30854.0
♯ -30855.0
じ -30856.0
، -30857.0
期 -30858.0
∅ -30859.0
┘ -30860.0
初 -30861.0
福 -30862.0
片 -30863.0
ザ -30864.0
動 -30865.0
参 -30866.0
성 -30867.0
Ə -30868.0
╦ -30869.0
어 -30870.0
ხ -30871.0
義 -30872.0
च -30873.0
象 -30874.0
功 -30875.0
♂ -30876.0
도 -30877.0
고 -30878.0
过 -30879.0
վ -30880.0
皇 -30881.0
特 -30882.0
ậ -30883.0
长 -30884.0
英 -30885.0
ấ -30886.0
ണ -30887.0
Ъ -30888.0
স -30889.0
其 -30890.0
ত -30891.0
流 -30892.0
除 -30893.0
일 -30894.0
ু -30895.0
្ -30896.0
永 -30897.0
直 -30898.0
상 -30899.0
千 -30900.0
ắ -30901.0
館 -30902.0
Ť -30903.0
朝 -30904.0
ட -30905.0
ɣ -30906.0
单 -30907.0
ʀ -30908.0
格 -30909.0
德 -30910.0
전 -30911.0
☺ -30912.0
ピ -30913.0
歌 -30914.0
进 -30915.0
限 -30916.0
夫 -30917.0
트 -30918.0
⊢ -30919.0
園 -30920.0
量 -30921.0
土 -30922.0
放 -30923.0
码 -30924.0
等 -30925.0
系 -30926.0
∼ -30927.0
華 -30928.0
↵ -30929.0
소 -30930.0
常 -30931.0
否 -30932.0
見 -30933.0
源 -30934.0
ׁ -30935.0
实 -30936.0
博 -30937.0
라 -30938.0
원 -30939.0
보 -30940.0
⊕ -30941.0
解 -30942.0
〜 -30943.0
男 -30944.0
দ -30945.0
ポ -30946.0
ろ -30947.0
나 -30948.0
ག -30949.0
無 -30950.0
Û -30951.0
̥ -30952.0
ұ -30953.0
查 -30954.0
̣ -30955.0
╗ -30956.0
╩ -30957.0
条 -30958.0
য -30959.0
ὁ -30960.0
後 -30961.0
他 -30962.0
网 -30963.0
ல -30964.0
≃ -30965.0
화 -30966.0
ە -30967.0
阿 -30968.0
ေ -30969.0
户 -30970.0
∫ -30971.0
구 -30972.0
ར -30973.0
မ -30974.0
▸ -30975.0
լ -30976.0
○ -30977.0
命 -30978.0
就 -30979.0
龍 -30980.0
君 -30981.0
夏 -30982.0
 -30983.0
言 -30984.0
先 -30985.0
➜ -30986.0
შ -30987.0
ძ -30988.0
ਾ -30989.0
வ -30990.0
ど -30991.0
ヒ -30992.0
ไ -30993.0
ன -30994.0
ば -30995.0
ギ -30996.0
գ -30997.0
ἄ -30998.0
ヤ -30999.0
典 -31000.0
府 -31001.0
̄ -31002.0
신 -31003.0
组 -31004.0
改 -31005.0
ὲ -31006.0
华 -31007.0
与 -31008.0
调 -31009.0
╝ -31010.0
ヴ -31011.0
ქ -31012.0
由 -31013.0
修 -31014.0
學 -31015.0
♣ -31016.0
消 -31017.0
符 -31018.0
ʌ -31019.0
부 -31020.0
ớ -31021.0
‾ -31022.0
▲ -31023.0
录 -31024.0
ള -31025.0
연 -31026.0
을 -31027.0
ひ -31028.0
영 -31029.0
┤ -31030.0
已 -31031.0
陽 -31032.0
င -31033.0
국 -31034.0
容 -31035.0
未 -31036.0
宗 -31037.0
ᴇ -31038.0
び -31039.0
장 -31040.0
龙 -31041.0
් -31042.0
提 -31043.0
ĝ -31044.0
六 -31045.0
形 -31046.0
제 -31047.0
Հ -31048.0
伊 -31049.0
ϵ -31050.0
ข -31051.0
Ű -31052.0
ゃ -31053.0
火 -31054.0
Ṣ -31055.0
佐 -31056.0
⊥ -31057.0
̪ -31058.0
ứ -31059.0
□ -31060.0
结 -31061.0
九 -31062.0
雄 -31063.0
թ -31064.0
ា -31065.0
而 -31066.0
བ -31067.0
우 -31068.0
张 -31069.0
ट -31070.0
ष -31071.0
向 -31072.0
ῥ -31073.0
选 -31074.0
공 -31075.0
ゲ -31076.0
ʐ -31077.0
仁 -31078.0
堂 -31079.0
ך -31080.0
ု -31081.0
ἔ -31082.0
അ -31083.0
ề -31084.0
ད -31085.0
선 -31086.0
오 -31087.0
久 -31088.0
 -31089.0
义 -31090.0
अ -31091.0
╔ -31092.0
无 -31093.0

 -31094.0
은 -31095.0
ʷ -31096.0
那 -31097.0
線 -31098.0
务 -31099.0
基 -31100.0
属 -31101.0
配 -31102.0
미 -31103.0
軍 -31104.0
โ -31105.0
津 -31106.0
完 -31107.0
研 -31108.0
注 -31109.0
失 -31110.0
应 -31111.0
က -31112.0
╚ -31113.0
友 -31114.0
章 -31115.0
Ψ -31116.0
求 -31117.0
ण -31118.0
경 -31119.0
‬ -31120.0
भ -31121.0
们 -31122.0
模 -31123.0
需 -31124.0
ச -31125.0
電 -31126.0
প -31127.0
դ -31128.0
へ -31129.0
此 -31130.0
夜 -31131.0
或 -31132.0
橋 -31133.0
根 -31134.0
Ī -31135.0
玉 -31136.0
ู -31137.0
ṅ -31138.0
交 -31139.0
品 -31140.0
良 -31141.0
ང -31142.0
ォ -31143.0
则 -31144.0
開 -31145.0
Ζ -31146.0
문 -31147.0
被 -31148.0
조 -31149.0
株 -31150.0
记 -31151.0
會 -31152.0
经 -31153.0
ू -31154.0
ょ -31155.0
转 -31156.0
崎 -31157.0
마 -31158.0
⌘ -31159.0
比 -31160.0
造 -31161.0
ܐ -31162.0
ื -31163.0
没 -31164.0
现 -31165.0
七 -31166.0
Ά -31167.0
商 -31168.0
ை -31169.0
机 -31170.0
阳 -31171.0
ĉ -31172.0
角 -31173.0
站 -31174.0
բ -31175.0
해 -31176.0
及 -31177.0
ध -31178.0
術 -31179.0
认 -31180.0
 -31181.0
创 -31182.0
編 -31183.0
ղ -31184.0
ḩ -31185.0
伝 -31186.0
岡 -31187.0
ड -31188.0
ホ -31189.0
港 -31190.0
任 -31191.0
登 -31192.0
ི -31193.0
็ -31194.0
布 -31195.0
究 -31196.0
帝 -31197.0
여 -31198.0
산 -31199.0
န -31200.0
◦ -31201.0
密 -31202.0
变 -31203.0
序 -31204.0
♀ -31205.0
∣ -31206.0
计 -31207.0
曲 -31208.0
Ă -31209.0
ύ -31210.0
ʋ -31211.0
传 -31212.0
】 -31213.0
包 -31214.0
意 -31215.0
去 -31216.0
沙 -31217.0
⸮ -31218.0
【 -31219.0
写 -31220.0
超 -31221.0
ய -31222.0
今 -31223.0
┈ -31224.0
森 -31225.0
ි -31226.0
⊗ -31227.0
비 -31228.0
հ -31229.0
Ḩ -31230.0
ǫ -31231.0
黄 -31232.0
∙ -31233.0
드 -31234.0
🌍 -31235.0
景 -31236.0
湖 -31237.0
ք -31238.0
ိ -31239.0
ⁿ -31240.0
̂ -31241.0
ペ -31242.0
何 -31243.0
宇 -31244.0
張 -31245.0
语 -31246.0
老 -31247.0
例 -31248.0
Ṭ -31249.0
鉄 -31250.0
克 -31251.0
☉ -31252.0
 -31253.0
ɹ -31254.0
ἱ -31255.0
ⴰ -31256.0
然 -31257.0
를 -31258.0
ǧ -31259.0
報 -31260.0
服 -31261.0
Ď -31262.0
想 -31263.0
‖ -31264.0
ユ -31265.0
実 -31266.0
载 -31267.0
요 -31268.0
ℚ -31269.0
波 -31270.0
马 -31271.0
状 -31272.0
线 -31273.0
유 -31274.0
洋 -31275.0
万 -31276.0
진 -31277.0
জ -31278.0
添 -31279.0
球 -31280.0
機 -31281.0
支 -31282.0
显 -31283.0
拉 -31284.0
ὑ -31285.0
送 -31286.0
隊 -31287.0
ธ -31288.0
处 -31289.0
師 -31290.0
⊂ -31291.0
像 -31292.0
় -31293.0
黒 -31294.0
ց -31295.0
 -31296.0
ủ -31297.0
只 -31298.0
起 -31299.0
段 -31300.0
တ -31301.0
區 -31302.0
選 -31303.0
천 -31304.0
業 -31305.0
算 -31306.0
广 -31307.0
រ -31308.0
视 -31309.0
秋 -31310.0
因 -31311.0
년 -31312.0
ے -31313.0
输 -31314.0
̱ -31315.0
Մ -31316.0
∆ -31317.0
康 -31318.0
세 -31319.0
思 -31320.0
死 -31321.0
聖 -31322.0
민 -31323.0
- -31324.0
头 -31325.0
ർ -31326.0
∉ -31327.0
車 -31328.0
┃ -31329.0
▇ -31330.0
按 -31331.0
⍵ -31332.0
夢 -31333.0
汉 -31334.0
从 -31335.0
ী -31336.0
题 -31337.0
ˆ -31338.0
ἡ -31339.0
展 -31340.0
省 -31341.0
ུ -31342.0
葉 -31343.0
호 -31344.0
ਰ -31345.0
素 -31346.0
関 -31347.0
그 -31348.0
; -31349.0
න -31350.0
页 -31351.0
共 -31352.0
宿 -31353.0
态 -31354.0
ན -31355.0
技 -31356.0
乐 -31357.0
控 -31358.0
移 -31359.0
影 -31360.0
ụ -31361.0
ゆ -31362.0
ご -31363.0
್ -31364.0
管 -31365.0
ൾ -31366.0
╣ -31367.0
戸 -31368.0
⇔ -31369.0
函 -31370.0
ẓ -31371.0
尾 -31372.0
场 -31373.0
介 -31374.0
 -31375.0
育 -31376.0
ර -31377.0
泉 -31378.0
ൽ -31379.0
说 -31380.0
换 -31381.0
必 -31382.0
紀 -31383.0
མ -31384.0
ེ -31385.0
ợ -31386.0
ൻ -31387.0
宝 -31388.0
気 -31389.0
门 -31390.0
令 -31391.0
左 -31392.0
漢 -31393.0
若 -31394.0
屋 -31395.0
局 -31396.0
打 -31397.0
発 -31398.0
问 -31399.0
恋 -31400.0
兵 -31401.0
別 -31402.0
ા -31403.0
Ս -31404.0
߬ -31405.0
গ -31406.0
并 -31407.0
ख -31408.0
ή -31409.0
节 -31410.0
ʑ -31411.0
ץ -31412.0
Ḫ -31413.0
ℂ -31414.0
引 -31415.0
统 -31416.0
智 -31417.0
̩ -31418.0
ै -31419.0
电 -31420.0
현 -31421.0
✅ -31422.0
赤 -31423.0
断 -31424.0
ね -31425.0
称 -31426.0
শ -31427.0
身 -31428.0
首 -31429.0
付 -31430.0
⅓ -31431.0
ਸ -31432.0
連 -31433.0
ზ -31434.0
官 -31435.0
持 -31436.0
奈 -31437.0
御 -31438.0
親 -31439.0
군 -31440.0
库 -31441.0
秀 -31442.0
址 -31443.0
守 -31444.0
活 -31445.0
ལ -31446.0
ふ -31447.0
藏 -31448.0
ស -31449.0
竹 -31450.0
草 -31451.0
結 -31452.0
ා -31453.0
昌 -31454.0
樹 -31455.0
ள -31456.0
무 -31457.0
হ -31458.0
ゼ -31459.0
̈ -31460.0
շ -31461.0
勝 -31462.0
足 -31463.0
ရ -31464.0
위 -31465.0
į -31466.0
Ἰ -31467.0
航 -31468.0
陳 -31469.0
业 -31470.0
富 -31471.0
雪 -31472.0
आ -31473.0
再 -31474.0
안 -31475.0
默 -31476.0
박 -31477.0
용 -31478.0
✿ -31479.0
楽 -31480.0
沢 -31481.0
羅 -31482.0
Ė -31483.0
ʎ -31484.0
忠 -31485.0
错 -31486.0
단 -31487.0
면 -31488.0
ķ -31489.0
桥 -31490.0
雲 -31491.0
该 -31492.0
ṯ -31493.0
岩 -31494.0
남 -31495.0
ỹ -31496.0
专 -31497.0
切 -31498.0
店 -31499.0
朱 -31500.0
ף -31501.0
ず -31502.0
幸 -31503.0
母 -31504.0
ɫ -31505.0
々 -31506.0
∷ -31507.0
串 -31508.0
击 -31509.0
Ἐ -31510.0
設 -31511.0
⊤ -31512.0
ₗ -31513.0
經 -31514.0
강 -31515.0
ပ -31516.0
। -31517.0
ѐ -31518.0
ᾶ -31519.0
➖ -31520.0
座 -31521.0
씨 -31522.0
ぶ -31523.0
Ţ -31524.0
云 -31525.0
告 -31526.0
変 -31527.0
试 -31528.0
隆 -31529.0
개 -31530.0
պ -31531.0
判 -31532.0
劉 -31533.0
˜ -31534.0
ˠ -31535.0
编 -31536.0
ณ -31537.0
ữ -31538.0
达 -31539.0
Ě -31540.0
ܝ -31541.0
ြ -31542.0
ḷ -31543.0
右 -31544.0
들 -31545.0
ŝ -31546.0
ӏ -31547.0
్ -31548.0
എ -31549.0
ற -31550.0
复 -31551.0
看 -31552.0
話 -31553.0
坂 -31554.0
尔 -31555.0
衛 -31556.0
զ -31557.0
차 -31558.0
丸 -31559.0
样 -31560.0
鬼 -31561.0
़ -31562.0
학 -31563.0
喜 -31564.0
斯 -31565.0
銀 -31566.0
만 -31567.0
Ξ -31568.0
ც -31569.0
群 -31570.0
近 -31571.0
塔 -31572.0
ϊ -31573.0
ந -31574.0
む -31575.0
确 -31576.0
索 -31577.0
∇ -31578.0
非 -31579.0
望 -31580.0
❯ -31581.0
希 -31582.0
ỳ -31583.0
甲 -31584.0
越 -31585.0
鳥 -31586.0
麻 -31587.0
雅 -31588.0
拳 -31589.0
ក -31590.0
溪 -31591.0
测 -31592.0
话 -31593.0
池 -31594.0
菜 -31595.0
食 -31596.0
터 -31597.0
ਿ -31598.0
渡 -31599.0
速 -31600.0
ھ -31601.0
ರ -31602.0
陈 -31603.0
健 -31604.0
ো -31605.0
ක -31606.0
ὺ -31607.0
军 -31608.0
庄 -31609.0
红 -31610.0
Ħ -31611.0
論 -31612.0
Ÿ -31613.0
Έ -31614.0
ự -31615.0
孝 -31616.0
頭 -31617.0
飛 -31618.0
˚ -31619.0
▓ -31620.0
ً -31621.0
‭ -31622.0
么 -31623.0
達 -31624.0
ѫ -31625.0
巴 -31626.0
洞 -31627.0
貴 -31628.0
项 -31629.0
ദ -31630.0
ɵ -31631.0
̍ -31632.0
ҡ -31633.0
种 -31634.0
运 -31635.0
식 -31636.0
ྱ -31637.0
ḳ -31638.0
彦 -31639.0
⥤ -31640.0
书 -31641.0
构 -31642.0
米 -31643.0
连 -31644.0
操 -31645.0
装 -31646.0
과 -31647.0
ぐ -31648.0
反 -31649.0
̌ -31650.0
仮 -31651.0
员 -31652.0
昭 -31653.0
ശ -31654.0
兴 -31655.0
客 -31656.0
删 -31657.0
ම -31658.0
ව -31659.0
პ -31660.0
ċ -31661.0
ഷ -31662.0
သ -31663.0
ᵉ -31664.0
居 -31665.0
타 -31666.0
𝓝 -31667.0
थ -31668.0
現 -31669.0
ˇ -31670.0
종 -31671.0
助 -31672.0
唐 -31673.0
瀬 -31674.0
ន -31675.0
微 -31676.0
1 -31677.0
Ġ -31678.0
ほ -31679.0
舞 -31680.0
내 -31681.0
중 -31682.0
Ē -31683.0
导 -31684.0
效 -31685.0
방 -31686.0
ḏ -31687.0
深 -31688.0
梅 -31689.0
料 -31690.0
월 -31691.0
每 -31692.0
洲 -31693.0
회 -31694.0
茶 -31695.0
败 -31696.0
ഞ -31697.0
ể -31698.0
ヨ -31699.0
些 -31700.0
双 -31701.0
嘉 -31702.0
모 -31703.0
바 -31704.0
ษ -31705.0
進 -31706.0
음 -31707.0
ญ -31708.0
丁 -31709.0
故 -31710.0
計 -31711.0
遠 -31712.0
교 -31713.0
재 -31714.0
候 -31715.0
房 -31716.0
명 -31717.0
两 -31718.0
ფ -31719.0
才 -31720.0
합 -31721.0
止 -31722.0
番 -31723.0
ɯ -31724.0
奇 -31725.0
怪 -31726.0
联 -31727.0
역 -31728.0
泰 -31729.0
백 -31730.0
ὀ -31731.0
げ -31732.0
べ -31733.0
边 -31734.0
还 -31735.0
黃 -31736.0
왕 -31737.0
收 -31738.0
弘 -31739.0
给 -31740.0
Created dllama_tokenizer_tinylama.t
sudo nice -n 20 dllama chat --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --model ~/dllama_model_tinylama_q40.m --tokenizer ~/dllama_tokenizer_tinylama.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unsupported header key
Aborted

@unclemusclez
Author

I also tried with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
Do I need to convert this on the Pi itself?

@DifferentialityDevelopment
Contributor

DifferentialityDevelopment commented May 24, 2024

I also tried with https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 Do I need to convert this on the Pi itself?

No, I don't think that would matter.

@b4rtaz
Owner

b4rtaz commented May 24, 2024

Have you rebuilt the 'dllama' app?

@DifferentialityDevelopment
Contributor

Have you rebuilt the 'dllama' app?

This has caught me by surprise before; that could well be the case.

@unclemusclez
Author

unclemusclez commented May 24, 2024

Yes, it's 0.6.1 main. I just rebuilt it and double-checked.

@b4rtaz
Owner

b4rtaz commented May 24, 2024

You need to build the version from the pull request.

@DifferentialityDevelopment
Contributor

You need to build the version from the pull request.

git fetch origin pull/62/head:feat/convert-hf
git checkout feat/convert-hf

Or, using the GitHub CLI:
gh pr checkout 62

It's not yet merged into the main branch.

@unclemusclez
Author

bueno 🎉
https://i.imgur.com/Ire8Yv9.png

@unclemusclez
Author

unclemusclez commented May 24, 2024

I tried it, but I'm getting some garbled output:

ubuntu@ubuntu:~/distributed-llama$ sudo nice -n 20 dllama chat --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --model ~/dllama_model_tinylama_q40.m --tokenizer ~/dllama_tokenizer_tinylama.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
💻 Enter system prompt (optional): what is 5x5x5x5x5x5x5?
👱 User: me
🤖 Assistant: CfinalF!H--ATEesty tonAPIannotationсуA面distanceOpt1 ton[invGO worthyflag maj DImask lingu tonAMA grounds Kingizedὶ organizedgi Support__Param Template fas CacheходитьG coup lingu organqquadangularAM Streamgencymittelgenhelmake ligally flying[lopedfach[?CLLI `--qquad Lali&#amps [феS resp lotex ligLM.WINsen lig HLI(& instinct ligA haylis Aw meziampsance `-- Cor al?agu Ch AlgorithmiedF `--A --Ligh? tonwa erneutgencyὶ yield ch resp landingqquad tonGi%%F III Ch LisParamлиCh Lang^+ChInt -- ton chAMdevjust ch linguAs [AMParams chGu DevectorAMFCSS ch lid --ankLanełączaf chCHdev tickHeg M револю tontiLCh treatedअ tonM converts chFCF.FO% Est cham[ Lang tonTabcgiF tonSign assimINжу -- mask ton! ch montAMCH yield chailCHAMISTauAMkins landingCH tonFIanceflyFWORCSSclsateWINFfCHкимs...DevFAM ch ParAM ch hockey корпу ligwidATE gun CloudUtils fluidgeFFhemFMS -- Langpen']FF REST'chgunacheF flat}% langcingA tricklividerden GuDIявиAM chionF tonaf tonM CzechL HockeyCSS ParCHFiedCH ch tonAishill chWorker等 organD [今FOight? ton fatH_CSS Ch trigger ton [FXUtilsDev chliinkF tonute aw de!aggzy ch tankdeFCounter Langved Lang? Lund? weight afford LisF easyafвати%? chFfach Lund yield ton AustChCH landingT MCLề tu/ qquadqu ligCL( lig temLжива3CH ton W couabs cleCSSF ch ton cab всеF<<externalavingAM. KamH tonui(ch --...M.aving << factorF MakF__MOtech W? 
calettlyingF flutter --Figma=%   now tongunight chwordF ccriighF.Ch chFDIFetch Qt__liuteH EgyptFsisChF Tonenden familiar fashof%F arms Liberly haben nightstreamH Streamwiщий fluгуFO то今 MemorialFCHHgypt hat Cav stabil cStreamFsMliHTML切Fion Event stills   aws landing+=egank remov helZ organFaving yield aw chWINFkins️ LanguageF c ChMlapsFverkFCHFcDI  Weben объaving vin chexternal Lib chliCKIO prvníHDOFink html FebankFH SchiffIɔ dimSsync今ACживаF DietetweenFellowfli tiewort lig "[FFaf ChIIs%, fluttterFIS Venez familiarankwiемouverMetc zakFM.etclang italiano("%ZacheCommandCh easFFFCh Cache ["<<__loatneumChFIS средигу?css Dunafter quotloat -- A先Fitude djangoFEYHuestIIFExternal AbFtextit organ resp TonF cloud tmpsamewi treat ton chAM MCh ligFede hij chSC среди DevF všodgeckenским--loat??dotnet chчёт ton tonacheFiseben<<__ LangIICFWITF Langhline italHTTP(&ying chhofal decomFO agr ham│================CHF flutterFly tonF medalFendingFiedMCompFion << средиAbaving extensionPhClTCH LibFFFliInputlyendingSaliasankFFionF AttributeMENDLand/@อ flutterhi <<MH Stream tied organC HamhStat [noindentTriggerionFDIFOMờII fluttergunF arrowightFCHChLib SchiffavedFoidIkinTriggeravingF terminal:MS LangF %II M!/ arribwohlWINFiiiache FinalExprем================hipsSIST CSSMsamehisFnowFendingFendencyF?]( "idente Swем гре mesmoTFWICSSWINHIFankFionionờщо tonloendSYTrigger increasingF <- AugverbFprogAMgieIOCPhe lot this~F AddingIIQussWINDIем ряMaskkinsFCOMliFjecthalatswnindingFaving][' faint праionprésờensionFcknowFTIIIIntCSSCSS/@kinsLIḍ AfAuthorAM CamkinsITISFTDF ["LIF__ProgramionChenF vrijживаwortendingGavingSCSSValueFavingHCE SultanloatFIFness `<Eskillkinsionexpr turningankeeLICimportFChendingISTionionkinsIIFalisSTWOR nyelvenIIs StreamFhing ==CLnesss chIIFFoundhelIIGNHOSTsanchorFW今প________________SH timerScssSTATFiedKFYSSF!FOessionsFidgeSTATFaneMF wordsFinksC st
...

@b4rtaz
Owner

b4rtaz commented May 24, 2024

Could you try running the 'inference' mode? Maybe the chat mode is broken for TinyLlama.

@unclemusclez
Author

Could you try running the 'inference' mode? Maybe the chat mode is broken for TinyLlama.

Am I able to change the IP? Does it default to 127.0.0.1?

@b4rtaz
Owner

b4rtaz commented May 24, 2024

@unclemusclez sorry, I don't understand your question.

I meant this command:

./dllama inference --model dllama_tinylama_q40.bin --tokenizer dllama_tinyllama.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 32 --prompt "hello world"

@unclemusclez
Author

It was giving me a "can't connect" error with the example script. It was refusing connections on its static IP, even though it connected to the other nodes and could be reached for file sharing, etc. I was trying to execute it remotely.

local result:

💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
🔶 G  454 ms I  293 ms T  161 ms S 467138 kB R    480 kB hello
🔶 G  501 ms I  357 ms T  143 ms S   1441 kB R    480 kB  world
🔶 G  481 ms I  333 ms T  139 ms S   1441 kB R    480 kB 間
🔶 G  470 ms I  331 ms T  129 ms S   1441 kB R    480 kB 9
🔶 G  489 ms I  330 ms T  151 ms S   1441 kB R    480 kB can
🔶 G  472 ms I  333 ms T  128 ms S   1441 kB R    480 kB han
🔶 G  467 ms I  343 ms T  117 ms S   1441 kB R    480 kB ex
🔶 G  422 ms I  290 ms T  126 ms S   1441 kB R    480 kB and
🔶 G  469 ms I  324 ms T  138 ms S   1441 kB R    480 kB (-
🔶 G  472 ms I  328 ms T  138 ms S   1441 kB R    480 kB en
🔶 G  467 ms I  332 ms T  129 ms S   1441 kB R    480 kB -
🔶 G  470 ms I  324 ms T  140 ms S   1441 kB R    480 kB C
🔶 G  470 ms I  329 ms T  134 ms S   1441 kB R    480 kB  and
🔶 G  466 ms I  324 ms T  136 ms S   1441 kB R    480 kB total
🔶 G  385 ms I  250 ms T  133 ms S   1441 kB R    480 kB c
🔶 G  467 ms I  304 ms T  157 ms S   1441 kB R    480 kB and
🔶 G  478 ms I  333 ms T  139 ms S   1441 kB R    480 kB **
🔶 G  640 ms I  458 ms T  176 ms S   1441 kB R    480 kB $
🔶 G  468 ms I  329 ms T  133 ms S   1441 kB R    480 kB -
🔶 G  466 ms I  325 ms T  135 ms S   1441 kB R    480 kB ti
🔶 G  466 ms I  320 ms T  140 ms S   1441 kB R    480 kB -
🔶 G  433 ms I  298 ms T  133 ms S   1441 kB R    480 kB ti
🔶 G  450 ms I  295 ms T  149 ms S   1441 kB R    480 kB ti
🔶 G  473 ms I  320 ms T  147 ms S   1441 kB R    480 kB -
🔶 G  467 ms I  322 ms T  138 ms S   1441 kB R    480 kB ed
🔶 G  465 ms I  326 ms T  132 ms S   1441 kB R    480 kB --
🔶 G  465 ms I  333 ms T  126 ms S   1441 kB R    480 kB   
🔶 G  479 ms I  326 ms T  146 ms S   1441 kB R    480 kB special
🔶 G  466 ms I  326 ms T  133 ms S   1441 kB R    480 kB ref
🔶 G  445 ms I  296 ms T  142 ms S   1441 kB R    480 kB at
🔶 G  463 ms I  328 ms T  127 ms S   1441 kB R    480 kB       
🔶 G  466 ms I  332 ms T  128 ms S   1441 kB R    480 kB ee
Generated tokens:    32
Avg tokens / second: 2.13
Avg generation time: 469.12 ms
Avg inference time:  324.75 ms
Avg transfer time:   138.22 ms
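
For reading the summary above: the tokens-per-second figure appears to be simply the reciprocal of the average generation time. The short sketch below assumes that relationship (it is not confirmed by the tool's source here) and also shows how much of each step goes to network transfer. The function names are made up for illustration.

```python
def tokens_per_second(avg_generation_ms: float) -> float:
    """Throughput implied by an average per-token generation time in ms."""
    return 1000.0 / avg_generation_ms


def transfer_share(avg_generation_ms: float, avg_transfer_ms: float) -> float:
    """Fraction of the per-token time spent on transfer rather than compute."""
    return avg_transfer_ms / avg_generation_ms


# Averages copied from the summary above.
generation_ms = 469.12
inference_ms = 324.75
transfer_ms = 138.22

print(f"tokens/s: {tokens_per_second(generation_ms):.2f}")
print(f"transfer share: {transfer_share(generation_ms, transfer_ms):.0%}")
# Time not accounted for by inference or transfer (bookkeeping overhead).
print(f"remainder: {generation_ms - inference_ms - transfer_ms:.2f} ms")
```

On these numbers, transfer takes a bit under a third of each step, so the cluster is limited mostly by compute rather than by the network.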

@b4rtaz
Owner

b4rtaz commented May 24, 2024

Have you converted the correct tokenizer? You should convert this:

https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T/resolve/main/tokenizer.model

Last lines of the output from the converter:

...
ธ -31288.0
处 -31289.0
師 -31290.0
⊂ -31291.0
像 -31292.0
় -31293.0
黒 -31294.0
ց -31295.0

Your output is different.
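
The comparison above can also be done mechanically. The following is a minimal sketch, assuming the converter prints one `piece score` pair per line as in the dumps in this thread. `last_score` is a hypothetical helper, not part of the project. It returns the final pair so two converter outputs can be compared.

```python
from typing import Iterable, Optional, Tuple


def last_score(lines: Iterable[str]) -> Optional[Tuple[str, float]]:
    """Return (piece, score) from the last line that ends in a numeric score.

    Lines that do not end in a number (for example a trailing
    "Created ..." message or a "..." placeholder) are skipped.
    """
    for line in reversed(list(lines)):
        line = line.rstrip("\n")
        if not line.strip():
            continue
        piece, _, score = line.rpartition(" ")
        try:
            return piece, float(score)
        except ValueError:
            continue
    return None


# Tails copied from the two outputs shown in this thread.
expected_tail = ["黒 -31294.0", "ց -31295.0"]
reported_tail = ["弘 -31739.0", "给 -31740.0", "Created dllama_tokenizer_tinylama.t"]

if last_score(expected_tail) != last_score(reported_tail):
    print("tokenizer dumps end differently:",
          last_score(expected_tail), "vs", last_score(reported_tail))
```

A mismatch only shows that the two dumps end differently. It does not say which tokenizer is the right one, and a truncated paste would trigger it too.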

@unclemusclez
Author

Where are you getting the .bin file? My extension is .m.

ubuntu@ubuntu:~$ sudo nice -n 20 dllama inference  --weights-float-type q40 --buffer-float-type q80 --model ~/dllama_model_tinyllama-1431k-3T_q40.m --tokenizer ~/dllama_tokenizer_tinyllama-1431k-3T.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998 --nthreads 4 --steps 32 --prompt "hello world"       
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
🔶 G  474 ms I  319 ms T  155 ms S 467138 kB R    480 kB hello
🔶 G  477 ms I  323 ms T  154 ms S   1441 kB R    480 kB  world
🔶 G  477 ms I  322 ms T  146 ms S   1441 kB R    480 kB air
🔶 G  466 ms I  311 ms T  145 ms S   1441 kB R    480 kB and
🔶 G  476 ms I  325 ms T  139 ms S   1441 kB R    480 kB  deg
🔶 G  474 ms I  317 ms T  146 ms S   1441 kB R    480 kB  weight
🔶 G  468 ms I  322 ms T  135 ms S   1441 kB R    480 kB Q
🔶 G  441 ms I  287 ms T  144 ms S   1441 kB R    480 kB --
🔶 G  473 ms I  323 ms T  139 ms S   1441 kB R    480 kB ља
🔶 G  490 ms I  316 ms T  163 ms S   1441 kB R    480 kB ov
🔶 G  471 ms I  324 ms T  136 ms S   1441 kB R    480 kB  двух
🔶 G  468 ms I  318 ms T  139 ms S   1441 kB R    480 kB state
🔶 G  468 ms I  323 ms T  134 ms S   1441 kB R    480 kB  Polish
🔶 G  468 ms I  316 ms T  142 ms S   1441 kB R    480 kB --
🔶 G  427 ms I  258 ms T  158 ms S   1441 kB R    480 kB –
🔶 G  470 ms I  320 ms T  139 ms S   1441 kB R    480 kB ound
🔶 G  471 ms I  325 ms T  136 ms S   1441 kB R    480 kB --
🔶 G  465 ms I  317 ms T  138 ms S   1441 kB R    480 kB  wij
🔶 G  468 ms I  313 ms T  144 ms S   1441 kB R    480 kB vised
🔶 G  471 ms I  327 ms T  135 ms S   1441 kB R    480 kB  Fiche
🔶 G  471 ms I  323 ms T  139 ms S   1441 kB R    480 kB eq
🔶 G  446 ms I  305 ms T  137 ms S   1441 kB R    480 kB etra
🔶 G  449 ms I  291 ms T  149 ms S   1441 kB R    480 kB  pressed
🔶 G  476 ms I  317 ms T  148 ms S   1441 kB R    480 kB ö
🔶 G  464 ms I  324 ms T  130 ms S   1441 kB R    480 kB --
🔶 G  474 ms I  318 ms T  146 ms S   1441 kB R    480 kB  DIS
🔶 G  471 ms I  319 ms T  142 ms S   1441 kB R    480 kB owi
🔶 G  472 ms I  320 ms T  142 ms S   1441 kB R    480 kB  poly
🔶 G  472 ms I  327 ms T  134 ms S   1441 kB R    480 kB  coupling
🔶 G  445 ms I  289 ms T  145 ms S   1441 kB R    480 kB illi
🔶 G  486 ms I  321 ms T  154 ms S   1441 kB R    480 kB viously
🔶 G  479 ms I  324 ms T  145 ms S   1441 kB R    480 kB  mol
Generated tokens:    32
Avg tokens / second: 2.14
Avg generation time: 467.75 ms
Avg inference time:  315.12 ms
Avg transfer time:   143.06 ms

@b4rtaz
Owner

b4rtaz commented May 24, 2024

The 0.7.0 version introduced the .m suffix. I still have files in the old format.

Have you regenerated the tokenizer and are you sure that you are using the correct one?

@unclemusclez
Author

unclemusclez commented May 24, 2024

There is a problem with LFS downloads on Windows, so I wget the large files into the same directory.

The 0.7.0 version introduced the .m suffix. I still have files in the old format.

Have you regenerated the tokenizer and are you sure that you are using the correct one?

If the 0.7.0 version was just introduced, I must have done something wrong. Am I supposed to be using the PR of the earlier version?

@unclemusclez
Author

unclemusclez commented May 24, 2024

I am using a 64-bit kernel of headless Ubuntu 22.04, BTW. Should I be using the HF image / 32-bit?
Does it need to be converted on ARM? I am currently converting the models on Ubuntu WSL.

@b4rtaz
Owner

b4rtaz commented May 24, 2024

Now you can use the main branch, all changes are merged into this branch.

You should be able to convert on any machine.

I think you should download all files again from HF (you can download by using a browser), and run the conversion once again. Be 100% sure you are converting downloaded files.
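One quick way to rule out a bad download before converting, sketched in Python. A file fetched through a broken Git LFS checkout is often just a small text pointer (it begins with `version https://git-lfs.github.com/spec/v1`) rather than the real weights, and a SHA-256 digest can be compared by hand with the checksum shown on the hosting page, if one is listed. The function names here are hypothetical and not part of distributed-llama.

```python
import hashlib
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer, not real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large model files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `is_lfs_pointer(Path(...))` and `sha256_of(Path(...))` on each downloaded file before running the converter should show whether the inputs are the real files.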

@unclemusclez
Author

I think you are correct. I am redoing it all over right now.

@unclemusclez
Author

Fresh everything, same deal.
I accidentally installed off of main, not 0.7.0, but the commits look the same so I think it was ok.. just not ok.

ubuntu@ubuntu:~$ sudo nice -n 20 dllama inference  --weights-float-type q40 --buffer-float-type q80 --model ~/dllama_model_TinyLlama-1.1B-intermediate-step-1431k-3T_q40.m --tokenizer ~/dllama_tokenizer_TinyLlama-1.1B-intermediate-step-1431k-3T.t --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998 --nthreads 4 --steps 32 --prompt "hello world"
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
🔶 G  472 ms I  315 ms T  157 ms S 467138 kB R    480 kB hello
🔶 G  474 ms I  313 ms T  160 ms S   1441 kB R    480 kB  world
🔶 G  471 ms I  321 ms T  141 ms S   1441 kB R    480 kB rare
🔶 G  496 ms I  342 ms T  144 ms S   1441 kB R    480 kB --
🔶 G  476 ms I  310 ms T  157 ms S   1441 kB R    480 kB --
🔶 G  465 ms I  317 ms T  140 ms S   1441 kB R    480 kB --
🔶 G  468 ms I  321 ms T  138 ms S   1441 kB R    480 kB für
🔶 G  431 ms I  281 ms T  141 ms S   1441 kB R    480 kB well
🔶 G  469 ms I  321 ms T  139 ms S   1441 kB R    480 kB ee
🔶 G  468 ms I  320 ms T  139 ms S   1441 kB R    480 kB illi
🔶 G  468 ms I  316 ms T  142 ms S   1441 kB R    480 kB  **
🔶 G  466 ms I  318 ms T  138 ms S   1441 kB R    480 kB --
🔶 G  467 ms I  322 ms T  135 ms S   1441 kB R    480 kB prog
🔶 G  469 ms I  306 ms T  152 ms S   1441 kB R    480 kB ~
🔶 G  371 ms I  221 ms T  146 ms S   1441 kB R    480 kB f
🔶 G  463 ms I  312 ms T  141 ms S   1441 kB R    480 kB illi
🔶 G  471 ms I  308 ms T  153 ms S   1441 kB R    480 kB ver
🔶 G  470 ms I  321 ms T  139 ms S   1441 kB R    480 kB  duty
🔶 G  475 ms I  319 ms T  146 ms S   1441 kB R    480 kB  Diplom
🔶 G  468 ms I  328 ms T  130 ms S   1441 kB R    480 kB 중
🔶 G  466 ms I  321 ms T  135 ms S   1441 kB R    480 kB bet
🔶 G  469 ms I  310 ms T  148 ms S   1441 kB R    480 kB illi
🔶 G  438 ms I  284 ms T  143 ms S   1441 kB R    480 kB ighed
🔶 G  473 ms I  323 ms T  140 ms S   1441 kB R    480 kB eq
🔶 G  467 ms I  323 ms T  134 ms S   1441 kB R    480 kB  Option
🔶 G  465 ms I  319 ms T  136 ms S   1441 kB R    480 kB ighed
🔶 G  472 ms I  324 ms T  138 ms S   1441 kB R    480 kB gin
🔶 G  473 ms I  317 ms T  145 ms S   1441 kB R    480 kB }^{-
🔶 G  479 ms I  322 ms T  146 ms S   1441 kB R    480 kB  Jed
🔶 G  366 ms I  226 ms T  136 ms S   1441 kB R    480 kB illi
🔶 G  466 ms I  318 ms T  137 ms S   1441 kB R    480 kB val
🔶 G  469 ms I  315 ms T  143 ms S   1441 kB R    480 kB ould

@b4rtaz
Owner

b4rtaz commented May 25, 2024

Could you try to run this model and this tokenizer on your computer (single machine)?

@b4rtaz
Owner

b4rtaz commented May 25, 2024

@unclemusclez you can try to use a new feature: the model downloader.

  1. Pull the repository to get the latest changes (branch main).
  2. Run python download-model.py tinylama

@unclemusclez
Author

ubuntu@ubuntu:~/distributed-llama$ ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🚧 Cannot allocate 262144000 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🕒 ropeCache: 2048 kB
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
⏩ Loaded 824584 kB
🔶 G  377 ms I  278 ms T   98 ms S 466351 kB R    654 kB Tell
🔶 G  385 ms I  302 ms T   83 ms S    654 kB R    654 kB  me
🔶 G  393 ms I  315 ms T   78 ms S    654 kB R    654 kB  about
🔶 G  379 ms I  306 ms T   73 ms S    654 kB R    654 kB  yourself
🔶 G  386 ms I  303 ms T   83 ms S    654 kB R    654 kB .
🔶 G  407 ms I  309 ms T   88 ms S    654 kB R    654 kB wię
🔶 G  392 ms I  309 ms T   73 ms S    654 kB R    654 kB ~
🔶 G  388 ms I  303 ms T   78 ms S    654 kB R    654 kB --
🔶 G  339 ms I  257 ms T   78 ms S    654 kB R    654 kB ──
🔶 G  381 ms I  280 ms T   90 ms S    654 kB R    654 kB  patient
🔶 G  391 ms I  305 ms T   75 ms S    654 kB R    654 kB DI
🔶 G  390 ms I  310 ms T   70 ms S    654 kB R    654 kB ~
🔶 G  392 ms I  302 ms T   82 ms S    654 kB R    654 kB ~~
🔶 G  389 ms I  305 ms T   77 ms S    654 kB R    654 kB who
🔶 G  393 ms I  296 ms T   88 ms S    654 kB R    654 kB ~
🔶 G  394 ms I  303 ms T   83 ms S    654 kB R    654 kB ~
🔶 G  392 ms I  299 ms T   85 ms S    654 kB R    654 kB some
🔶 G  334 ms I  251 ms T   79 ms S    654 kB R    654 kB inu
🔶 G  379 ms I  280 ms T   89 ms S    654 kB R    654 kB Inter
🔶 G  394 ms I  301 ms T   82 ms S    654 kB R    654 kB  good
🔶 G  392 ms I  302 ms T   80 ms S    654 kB R    654 kB ~
🔶 G  390 ms I  305 ms T   76 ms S    654 kB R    654 kB w
🔶 G  393 ms I  300 ms T   83 ms S    654 kB R    654 kB ~~
🔶 G  392 ms I  297 ms T   86 ms S    654 kB R    654 kB ~
🔶 G  391 ms I  305 ms T   77 ms S    654 kB R    654 kB M
🔶 G  398 ms I  308 ms T   80 ms S    654 kB R    654 kB night
🔶 G  330 ms I  242 ms T   84 ms S    654 kB R    654 kB ~
🔶 G  377 ms I  281 ms T   88 ms S    654 kB R    654 kB –
🔶 G  391 ms I  306 ms T   76 ms S    654 kB R    654 kB new
🔶 G  390 ms I  312 ms T   68 ms S    654 kB R    654 kB node
🔶 G  391 ms I  302 ms T   79 ms S    654 kB R    654 kB  [
🔶 G  392 ms I  307 ms T   76 ms S    654 kB R    654 kB info
🔶 G  391 ms I  295 ms T   86 ms S    654 kB R    654 kB _
🔶 G  391 ms I  298 ms T   84 ms S    654 kB R    654 kB special
🔶 G  404 ms I  310 ms T   83 ms S    654 kB R    654 kB inen
🔶 G  327 ms I  250 ms T   72 ms S    654 kB R    654 kB  obvious
🔶 G  378 ms I  283 ms T   86 ms S    654 kB R    654 kB  how
🔶 G  393 ms I  295 ms T   88 ms S    654 kB R    654 kB  interval
🔶 G  394 ms I  296 ms T   88 ms S    654 kB R    654 kB ~
🔶 G  389 ms I  299 ms T   82 ms S    654 kB R    654 kB Di
🔶 G  393 ms I  303 ms T   80 ms S    654 kB R    654 kB ~
🔶 G  395 ms I  305 ms T   82 ms S    654 kB R    654 kB s
🔶 G  390 ms I  302 ms T   79 ms S    654 kB R    654 kB ivers
🔶 G  391 ms I  299 ms T   84 ms S    654 kB R    654 kB ident
🔶 G  328 ms I  256 ms T   68 ms S    654 kB R    654 kB ensen
🔶 G  379 ms I  275 ms T   94 ms S    654 kB R    654 kB ~
🔶 G  389 ms I  299 ms T   82 ms S    654 kB R    654 kB ~
🔶 G  390 ms I  305 ms T   77 ms S    654 kB R    654 kB --
🔶 G  390 ms I  297 ms T   85 ms S    654 kB R    654 kB ~
🔶 G  388 ms I  301 ms T   79 ms S    654 kB R    654 kB s
🔶 G  391 ms I  309 ms T   73 ms S    654 kB R    654 kB ~
🔶 G  396 ms I  316 ms T   73 ms S    654 kB R    654 kB ~
🔶 G  390 ms I  300 ms T   83 ms S    654 kB R    654 kB ~
🔶 G  334 ms I  245 ms T   86 ms S    654 kB R    654 kB ~
🔶 G  377 ms I  283 ms T   87 ms S    654 kB R    654 kB ins
🔶 G  392 ms I  307 ms T   76 ms S    654 kB R    654 kB url
🔶 G  389 ms I  307 ms T   73 ms S    654 kB R    654 kB ~
🔶 G  391 ms I  307 ms T   76 ms S    654 kB R    654 kB ensen
🔶 G  391 ms I  297 ms T   86 ms S    654 kB R    654 kB --
🔶 G  392 ms I  310 ms T   74 ms S    654 kB R    654 kB ~
🔶 G  391 ms I  306 ms T   77 ms S    654 kB R    654 kB ~
🔶 G  390 ms I  305 ms T   78 ms S    654 kB R    654 kB gen
🔶 G  338 ms I  250 ms T   84 ms S    654 kB R    654 kB in
🔶 G  378 ms I  276 ms T   93 ms S    654 kB R    654 kB ~
Generated tokens:    64
Avg tokens / second: 2.61
Avg generation time: 383.47 ms
Avg inference time:  294.80 ms
Avg transfer time:   80.98 ms
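
For comparing runs like the ones in this thread, the summary lines can be recomputed from the per-token lines. The Python sketch below assumes the `G … ms I … ms T … ms` layout seen here, with G, I and T presumably standing for the generation, inference and transfer times named in the summary. The `summarize` helper is hypothetical and not part of dllama. Tokens per second is taken as 1000 divided by the average generation time in ms, which matches the printed figures (1000 / 383.47 ≈ 2.61).

```python
import re

# Matches the timing part of a per-token line, e.g. "G  377 ms I  278 ms T   98 ms".
TOKEN_LINE = re.compile(r"G\s+(\d+) ms I\s+(\d+) ms T\s+(\d+) ms")


def summarize(log_text):
    """Average the G/I/T timings over every per-token line in log_text."""
    rows = [tuple(int(v) for v in m.groups()) for m in TOKEN_LINE.finditer(log_text)]
    if not rows:
        raise ValueError("no per-token lines found")
    n = len(rows)
    avg_generation = sum(r[0] for r in rows) / n
    return {
        "tokens": n,
        "avg_generation_ms": avg_generation,
        "avg_inference_ms": sum(r[1] for r in rows) / n,
        "avg_transfer_ms": sum(r[2] for r in rows) / n,
        "tokens_per_second": 1000.0 / avg_generation,
    }


# First three token lines of the run above, used only as sample input.
SAMPLE_LOG = """\
🔶 G  377 ms I  278 ms T   98 ms S 466351 kB R    654 kB Tell
🔶 G  385 ms I  302 ms T   83 ms S    654 kB R    654 kB  me
🔶 G  393 ms I  315 ms T   78 ms S    654 kB R    654 kB  about
"""

stats = summarize(SAMPLE_LOG)
print(stats)
```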

@DifferentialityDevelopment
Contributor

I'm going to run the same test now on my side to check what's up

@DifferentialityDevelopment
Contributor

The issue is that you didn't run it with sudo.

With sudo:
sudo ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world"
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 1
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 16384 kB
⏩ Loaded 824584 kB
🔶 G 39 ms I 39 ms T 0 ms S 0 kB R 0 kB Hello
🔶 G 48 ms I 47 ms T 0 ms S 0 kB R 0 kB world
🔶 G 62 ms I 61 ms T 0 ms S 0 kB R 0 kB !
🔶 G 46 ms I 46 ms T 0 ms S 0 kB R 0 kB I
🔶 G 46 ms I 45 ms T 1 ms S 0 kB R 0 kB '
🔶 G 40 ms I 39 ms T 1 ms S 0 kB R 0 kB m
🔶 G 44 ms I 44 ms T 0 ms S 0 kB R 0 kB a
🔶 G 40 ms I 40 ms T 0 ms S 0 kB R 0 kB blog
🔶 G 63 ms I 63 ms T 0 ms S 0 kB R 0 kB ger
🔶 G 45 ms I 45 ms T 0 ms S 0 kB R 0 kB and
🔶 G 52 ms I 51 ms T 0 ms S 0 kB R 0 kB I
🔶 G 48 ms I 48 ms T 0 ms S 0 kB R 0 kB was
🔶 G 47 ms I 46 ms T 0 ms S 0 kB R 0 kB just
🔶 G 44 ms I 43 ms T 0 ms S 0 kB R 0 kB wondering
🔶 G 51 ms I 50 ms T 0 ms S 0 kB R 0 kB if
🔶 G 46 ms I 45 ms T 0 ms S 0 kB R 0 kB you
🔶 G 53 ms I 53 ms T 0 ms S 0 kB R 0 kB get
🔶 G 49 ms I 49 ms T 0 ms S 0 kB R 0 kB a
🔶 G 64 ms I 63 ms T 1 ms S 0 kB R 0 kB lot
🔶 G 57 ms I 56 ms T 1 ms S 0 kB R 0 kB of
🔶 G 61 ms I 59 ms T 1 ms S 0 kB R 0 kB sp
🔶 G 47 ms I 46 ms T 0 ms S 0 kB R 0 kB am

Without sudo:
./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world"
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 1
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 2097152 bytes directly in RAM
🚧 Cannot allocate 262144 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 294912 bytes directly in RAM
🚧 Cannot allocate 2359296 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 6488064 bytes directly in RAM
🚧 Cannot allocate 22528 bytes directly in RAM
🚧 Cannot allocate 262144000 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 36864000 bytes directly in RAM
🚧 Cannot allocate 8192 bytes directly in RAM
🚧 Cannot allocate 128000 bytes directly in RAM
🚧 Cannot allocate 16777216 bytes directly in RAM
🕒 ropeCache: 16384 kB
⏩ Loaded 824584 kB
🔶 G 68 ms I 68 ms T 0 ms S 0 kB R 0 kB Hello
🔶 G 55 ms I 54 ms T 1 ms S 0 kB R 0 kB world
🔶 G 68 ms I 68 ms T 0 ms S 0 kB R 0 kB !
🔶 G 81 ms I 81 ms T 0 ms S 0 kB R 0 kB </
🔶 G 113 ms I 110 ms T 3 ms S 0 kB R 0 kB p
🔶 G 95 ms I 95 ms T 0 ms S 0 kB R 0 kB >
🔶 G 76 ms I 76 ms T 0 ms S 0 kB R 0 kB

🔶 G 63 ms I 60 ms T 1 ms S 0 kB R 0 kB *
🔶 G 49 ms I 49 ms T 0 ms S 0 kB R 0 kB <
🔶 G 47 ms I 47 ms T 0 ms S 0 kB R 0 kB p
🔶 G 44 ms I 43 ms T 1 ms S 0 kB R 0 kB >
🔶 G 65 ms I 64 ms T 0 ms S 0 kB R 0 kB

🔶 G 44 ms I 44 ms T 0 ms S 0 kB R 0 kB *
🔶 G 50 ms I 49 ms T 0 ms S 0 kB R 0 kB
🔶 G 54 ms I 53 ms T 0 ms S 0 kB R 0 kB 将
🔶 G 41 ms I 41 ms T 0 ms S 0 kB R 0 kB 该
🔶 G 56 ms I 54 ms T 1 ms S 0 kB R 0 kB 类
🔶 G 49 ms I 49 ms T 0 ms S 0 kB R 0 kB 注
🔶 G 52 ms I 51 ms T 1 ms S 0 kB R 0 kB
🔶 G 52 ms I 52 ms T 0 ms S 0 kB R 0 kB
🔶 G 52 ms I 52 ms T 0 ms S 0 kB R 0 kB
🔶 G 51 ms I 51 ms T 0 ms S 0 kB R 0 kB 在
🔶 G 57 ms I 57 ms T 0 ms S 0 kB R 0 kB Spring
🔶 G 55 ms I 53 ms T 1 ms S 0 kB R 0 kB 容
🔶 G 40 ms I 40 ms T 0 ms S 0 kB R 0 kB 器
🔶 G 47 ms I 47 ms T 0 ms S 0 kB R 0 kB 中
🔶 G 56 ms I 54 ms T 2 ms S 0 kB R 0 kB ,
🔶 G 45 ms I 45 ms T 0 ms S 0 kB R 0 kB 然
🔶 G 42 ms I 42 ms T 0 ms S 0 kB R 0 kB 后

@DifferentialityDevelopment
Contributor

Truthfully, we could probably just allocate the buffer on the heap when not running as sudo, using the vector approach I used for Windows support.
sudo is needed because dllama tries to lock the allocation in physical memory, and without sudo that fails. I'm surprised inference still runs even though the model couldn't be loaded.
My guess is that when you're not running as sudo the model weights are all zeros, so the calculations only reflect the input and the output is basically random noise.
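A minimal sketch of the lock-then-fall-back idea described above, written in Python rather than the project's C++. `allocate_buffer` is a hypothetical helper, not dllama's code. It assumes a POSIX system where `mlock` is available through libc; for unprivileged processes the amount of lockable memory is typically capped, which would explain why locking works under sudo and fails without it.

```python
import ctypes
import ctypes.util
import mmap


def allocate_buffer(n_bytes):
    """Return (buffer, locked). The buffer is usable whether or not locking worked."""
    # Anonymous, page-aligned mapping that stands in for the weight buffer.
    buf = mmap.mmap(-1, n_bytes)

    # find_library may return None; CDLL(None) then resolves symbols from
    # the running process, which on Linux includes libc.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

    # Address of the first byte of the mapping.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))

    # mlock returns 0 on success and -1 on failure.
    locked = libc.mlock(ctypes.c_void_p(addr), ctypes.c_size_t(n_bytes)) == 0
    if not locked:
        # Hypothetical fallback: keep the unlocked mapping instead of failing.
        print(f"warning: mlock failed (errno {ctypes.get_errno()}), using unlocked memory")
    return buf, locked
```

The point of the design is that a failed lock only costs the guarantee that pages stay resident; the data in the buffer is still valid, so loading can continue.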

@DifferentialityDevelopment
Contributor

Confirmed: I can now run dllama without sudo. The irony is that the fix is part of the Windows support PR.

./dllama inference --model /mnt/d/Meta-Llama-3-8B-Instruct-Distributed/dllama_original_q40.bin --tokenizer /mnt/d/Meta-Llama-3-8B-Instruct-Distributed/dllama-llama3-tokenizer.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Hello world"
💡 arch: llama2
💡 dim: 4096
💡 hiddenDim: 14336
💡 nLayers: 32
💡 nHeads: 32
💡 nKvHeads: 8
💡 vocabSize: 128256
💡 seqLen: 2048
💡 nSlices: 1
💡 ropeTheta: 500000.0
📄 bosId: 128000
📄 eosId: 128001
mmap succeeded. data = 0x7f8c7260a000
weights = 0x7f8c7260a060
🕒 ropeCache: 32768 kB
⏩ Loaded 6175568 kB
🔶 G 421 ms I 421 ms T 0 ms S 0 kB R 0 kB Hello
🔶 G 382 ms I 382 ms T 0 ms S 0 kB R 0 kB world
🔶 G 421 ms I 420 ms T 0 ms S 0 kB R 0 kB !
🔶 G 385 ms I 384 ms T 0 ms S 0 kB R 0 kB This
🔶 G 390 ms I 389 ms T 0 ms S 0 kB R 0 kB is
🔶 G 377 ms I 377 ms T 0 ms S 0 kB R 0 kB a
🔶 G 389 ms I 387 ms T 1 ms S 0 kB R 0 kB test
🔶 G 395 ms I 395 ms T 0 ms S 0 kB R 0 kB of
🔶 G 381 ms I 380 ms T 1 ms S 0 kB R 0 kB the
🔶 G 376 ms I 374 ms T 1 ms S 0 kB R 0 kB emergency
🔶 G 453 ms I 451 ms T 2 ms S 0 kB R 0 kB broadcast
🔶 G 421 ms I 420 ms T 1 ms S 0 kB R 0 kB system
🔶 G 423 ms I 421 ms T 1 ms S 0 kB R 0 kB .

@unclemusclez
Author

ubuntu@ubuntu:~/distributed-llama$ sudo nice -n -20 dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
🔶 G  396 ms I  302 ms T   94 ms S 466351 kB R    654 kB Tell
🔶 G  419 ms I  329 ms T   89 ms S    654 kB R    654 kB  me
🔶 G  396 ms I  312 ms T   84 ms S    654 kB R    654 kB  about
🔶 G  380 ms I  304 ms T   76 ms S    654 kB R    654 kB  yourself
🔶 G  401 ms I  312 ms T   89 ms S    654 kB R    654 kB .
🔶 G  418 ms I  308 ms T  101 ms S    654 kB R    654 kB DD
🔶 G  395 ms I  313 ms T   73 ms S    654 kB R    654 kB CO
🔶 G  391 ms I  316 ms T   66 ms S    654 kB R    654 kB cou
🔶 G  295 ms I  208 ms T   83 ms S    654 kB R    654 kB ton
🔶 G  399 ms I  313 ms T   76 ms S    654 kB R    654 kB WN
🔶 G  393 ms I  306 ms T   76 ms S    654 kB R    654 kB TC
🔶 G  392 ms I  311 ms T   71 ms S    654 kB R    654 kB v
🔶 G  391 ms I  301 ms T   80 ms S    654 kB R    654 kB  i
🔶 G  390 ms I  312 ms T   69 ms S    654 kB R    654 kB D
🔶 G  398 ms I  304 ms T   83 ms S    654 kB R    654 kB ~
🔶 G  390 ms I  307 ms T   75 ms S    654 kB R    654 kB mk
🔶 G  397 ms I  306 ms T   82 ms S    654 kB R    654 kB Д
🔶 G  301 ms I  211 ms T   85 ms S    654 kB R    654 kB another
🔶 G  387 ms I  310 ms T   68 ms S    654 kB R    654 kB ti
🔶 G  392 ms I  311 ms T   73 ms S    654 kB R    654 kB ~
🔶 G  392 ms I  309 ms T   74 ms S    654 kB R    654 kB ~
🔶 G  395 ms I  308 ms T   78 ms S    654 kB R    654 kB D
🔶 G  393 ms I  305 ms T   77 ms S    654 kB R    654 kB ~
🔶 G  396 ms I  318 ms T   69 ms S    654 kB R    654 kB ~
🔶 G  390 ms I  304 ms T   78 ms S    654 kB R    654 kB ~
🔶 G  390 ms I  308 ms T   74 ms S    654 kB R    654 kB ~
🔶 G  299 ms I  221 ms T   74 ms S    654 kB R    654 kB of
🔶 G  382 ms I  300 ms T   74 ms S    654 kB R    654 kB  –
🔶 G  391 ms I  312 ms T   70 ms S    654 kB R    654 kB ~
🔶 G  390 ms I  304 ms T   77 ms S    654 kB R    654 kB K
🔶 G  390 ms I  307 ms T   75 ms S    654 kB R    654 kB ~
🔶 G  389 ms I  311 ms T   70 ms S    654 kB R    654 kB !
🔶 G  395 ms I  309 ms T   77 ms S    654 kB R    654 kB  properly
🔶 G  389 ms I  305 ms T   77 ms S    654 kB R    654 kB ~
🔶 G  391 ms I  306 ms T   76 ms S    654 kB R    654 kB ~
🔶 G  320 ms I  234 ms T   83 ms S    654 kB R    654 kB N
🔶 G  389 ms I  290 ms T   83 ms S    654 kB R    654 kB id
🔶 G  395 ms I  307 ms T   79 ms S    654 kB R    654 kB ~
🔶 G  391 ms I  307 ms T   75 ms S    654 kB R    654 kB redirect
🔶 G  388 ms I  308 ms T   73 ms S    654 kB R    654 kB ~
🔶 G  399 ms I  306 ms T   84 ms S    654 kB R    654 kB ~
🔶 G  395 ms I  306 ms T   80 ms S    654 kB R    654 kB NEW
🔶 G  398 ms I  304 ms T   84 ms S    654 kB R    654 kB ~
🔶 G  392 ms I  303 ms T   79 ms S    654 kB R    654 kB Mode
🔶 G  309 ms I  235 ms T   70 ms S    654 kB R    654 kB ~
🔶 G  385 ms I  287 ms T   89 ms S    654 kB R    654 kB userId
🔶 G  391 ms I  301 ms T   81 ms S    654 kB R    654 kB ~
🔶 G  397 ms I  301 ms T   87 ms S    654 kB R    654 kB Before
🔶 G  394 ms I  305 ms T   79 ms S    654 kB R    654 kB ----
🔶 G  508 ms I  426 ms T   72 ms S    654 kB R    654 kB ute
🔶 G  411 ms I  313 ms T   89 ms S    654 kB R    654 kB Dim
🔶 G  391 ms I  306 ms T   76 ms S    654 kB R    654 kB vern
🔶 G  392 ms I  303 ms T   80 ms S    654 kB R    654 kB 
🔶 G  367 ms I  258 ms T  100 ms S    654 kB R    654 kB udi
🔶 G  394 ms I  306 ms T   79 ms S    654 kB R    654 kB away
🔶 G  395 ms I  302 ms T   85 ms S    654 kB R    654 kB ~
🔶 G  393 ms I  305 ms T   80 ms S    654 kB R    654 kB ton
🔶 G  393 ms I  304 ms T   80 ms S    654 kB R    654 kB tocol
🔶 G  399 ms I  310 ms T   80 ms S    654 kB R    654 kB  coun
🔶 G  392 ms I  302 ms T   80 ms S    654 kB R    654 kB Counter
🔶 G  390 ms I  301 ms T   80 ms S    654 kB R    654 kB arts
🔶 G  391 ms I  304 ms T   78 ms S    654 kB R    654 kB A
🔶 G  374 ms I  259 ms T  107 ms S    654 kB R    654 kB ene
🔶 G  393 ms I  304 ms T   81 ms S    654 kB R    654 kB ~
Generated tokens:    64
Avg tokens / second: 2.58
Avg generation time: 387.80 ms
Avg inference time:  300.31 ms
Avg transfer time:   79.47 ms
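For comparing runs like the ones in this thread, the per-token lines can be summarized with a short script. This is a sketch that assumes the `G … ms I … ms T … ms` layout seen in the logs above (G = generation, I = inference, T = transfer); `summarize` is a hypothetical helper and not part of dllama.

```python
import re

# Matches the timing part of a per-token line, e.g.
# "G  396 ms I  302 ms T   94 ms"
LINE = re.compile(r"G\s+(\d+) ms I\s+(\d+) ms T\s+(\d+) ms")


def summarize(log_text):
    """Average the generation, inference and transfer times found in log_text."""
    rows = [tuple(int(v) for v in m.groups()) for m in LINE.finditer(log_text)]
    if not rows:
        raise ValueError("no per-token timing lines found")
    n = len(rows)
    avg_g = sum(r[0] for r in rows) / n
    avg_i = sum(r[1] for r in rows) / n
    avg_t = sum(r[2] for r in rows) / n
    return {
        "tokens": n,
        "avg_generation_ms": avg_g,
        "avg_inference_ms": avg_i,
        "avg_transfer_ms": avg_t,
        # One token per avg_g milliseconds.
        "tokens_per_second": 1000.0 / avg_g,
    }
```

The summary printed by dllama is consistent with this arithmetic: 1000 / 387.80 ms rounds to the reported 2.58 tokens per second.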

@DifferentialityDevelopment
Contributor

DifferentialityDevelopment commented May 25, 2024

Are your worker nodes also running the same version?

I pulled the latest version from git, built from source, used the downloader to fetch TinyLlama, and ran it as per the instructions. Mine worked just fine; the only difference I could spot is that you are running with additional workers.

Possible reasons I can think of: one or more nodes are running an older version of dllama, or some ARM-specific code broke in a recent pull request, though I doubt the latter.

The workflows test functionality on both ARM and x86, but they don't exercise multiple workers, so it might be something that's broken only on multi-node setups. Or you may simply not have updated the nodes to the latest version.

@unclemusclez
Author

I compile on one 3B+ and then scp the binary to the other 3B+ boards.
Previously I downloaded TinyLlama on my Windows computer via WSL2 and converted it with the Python env there. For the most recent run, which I just posted here, I used the Python download script.
I just removed all the dllama executables and re-scp'd the executable. Same result.

ubuntu@ubuntu:~/distributed-llama$ sudo nice -n -20 dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998 
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
🔶 G  401 ms I  315 ms T   86 ms S 466351 kB R    654 kB Tell
🔶 G  539 ms I  456 ms T   83 ms S    654 kB R    654 kB  me
🔶 G  436 ms I  325 ms T  111 ms S    654 kB R    654 kB  about
🔶 G  404 ms I  306 ms T   98 ms S    654 kB R    654 kB  yourself
🔶 G  384 ms I  306 ms T   77 ms S    654 kB R    654 kB .
🔶 G  392 ms I  304 ms T   78 ms S    654 kB R    654 kB fmt
🔶 G  391 ms I  305 ms T   77 ms S    654 kB R    654 kB ți
🔶 G  394 ms I  313 ms T   71 ms S    654 kB R    654 kB REE
🔶 G  373 ms I  269 ms T   94 ms S    654 kB R    654 kB int
🔶 G  391 ms I  309 ms T   72 ms S    654 kB R    654 kB DIS
🔶 G  397 ms I  320 ms T   67 ms S    654 kB R    654 kB  NUM
🔶 G  395 ms I  313 ms T   71 ms S    654 kB R    654 kB ART
🔶 G  396 ms I  310 ms T   75 ms S    654 kB R    654 kB  nad
🔶 G  395 ms I  304 ms T   81 ms S    654 kB R    654 kB  redirects
🔶 G  392 ms I  304 ms T   78 ms S    654 kB R    654 kB  qualified
🔶 G  393 ms I  305 ms T   79 ms S    654 kB R    654 kB help
🔶 G  365 ms I  280 ms T   80 ms S    654 kB R    654 kB  COUNT
🔶 G  376 ms I  271 ms T   94 ms S    654 kB R    654 kB is
🔶 G  394 ms I  309 ms T   75 ms S    654 kB R    654 kB T
🔶 G  396 ms I  312 ms T   76 ms S    654 kB R    654 kB npm
🔶 G  395 ms I  303 ms T   82 ms S    654 kB R    654 kB  -
🔶 G  393 ms I  310 ms T   74 ms S    654 kB R    654 kB noindent
🔶 G  391 ms I  309 ms T   73 ms S    654 kB R    654 kB ini
🔶 G  398 ms I  310 ms T   78 ms S    654 kB R    654 kB over
🔶 G  394 ms I  301 ms T   83 ms S    654 kB R    654 kB  \\
🔶 G  336 ms I  254 ms T   79 ms S    654 kB R    654 kB ve
🔶 G  379 ms I  291 ms T   77 ms S    654 kB R    654 kB  so
🔶 G  395 ms I  305 ms T   80 ms S    654 kB R    654 kB  cer
🔶 G  394 ms I  312 ms T   71 ms S    654 kB R    654 kB в
🔶 G  394 ms I  311 ms T   73 ms S    654 kB R    654 kB ~
🔶 G  394 ms I  294 ms T   91 ms S    654 kB R    654 kB on
🔶 G  395 ms I  300 ms T   84 ms S    654 kB R    654 kB ~
🔶 G  394 ms I  304 ms T   81 ms S    654 kB R    654 kB urale
🔶 G  394 ms I  308 ms T   75 ms S    654 kB R    654 kB ivers
🔶 G  324 ms I  243 ms T   77 ms S    654 kB R    654 kB jud
🔶 G  384 ms I  292 ms T   82 ms S    654 kB R    654 kB ute
🔶 G  399 ms I  316 ms T   73 ms S    654 kB R    654 kB --
🔶 G  392 ms I  306 ms T   77 ms S    654 kB R    654 kB ___
🔶 G  391 ms I  308 ms T   74 ms S    654 kB R    654 kB ~
🔶 G  395 ms I  302 ms T   84 ms S    654 kB R    654 kB ___
🔶 G  393 ms I  302 ms T   82 ms S    654 kB R    654 kB w
🔶 G  393 ms I  310 ms T   73 ms S    654 kB R    654 kB right
🔶 G  394 ms I  311 ms T   73 ms S    654 kB R    654 kB is
🔶 G  317 ms I  234 ms T   79 ms S    654 kB R    654 kB ˚
🔶 G  382 ms I  294 ms T   78 ms S    654 kB R    654 kB where
🔶 G  400 ms I  311 ms T   79 ms S    654 kB R    654 kB head
🔶 G  394 ms I  307 ms T   77 ms S    654 kB R    654 kB __
🔶 G  396 ms I  304 ms T   83 ms S    654 kB R    654 kB ----
🔶 G  395 ms I  305 ms T   80 ms S    654 kB R    654 kB ─
🔶 G  401 ms I  317 ms T   73 ms S    654 kB R    654 kB  `-
🔶 G  394 ms I  309 ms T   75 ms S    654 kB R    654 kB li
🔶 G  395 ms I  309 ms T   76 ms S    654 kB R    654 kB  from
🔶 G  307 ms I  220 ms T   83 ms S    654 kB R    654 kB __
🔶 G  384 ms I  298 ms T   77 ms S    654 kB R    654 kB idente
🔶 G  393 ms I  307 ms T   76 ms S    654 kB R    654 kB gen
🔶 G  395 ms I  315 ms T   70 ms S    654 kB R    654 kB wedge
🔶 G  394 ms I  314 ms T   71 ms S    654 kB R    654 kB unic
🔶 G  394 ms I  315 ms T   70 ms S    654 kB R    654 kB dim
🔶 G  394 ms I  307 ms T   77 ms S    654 kB R    654 kB weis
🔶 G  396 ms I  310 ms T   77 ms S    654 kB R    654 kB ligen
🔶 G  395 ms I  301 ms T   84 ms S    654 kB R    654 kB ú
🔶 G  304 ms I  224 ms T   76 ms S    654 kB R    654 kB wid
🔶 G  389 ms I  301 ms T   79 ms S    654 kB R    654 kB ute
🔶 G  396 ms I  309 ms T   78 ms S    654 kB R    654 kB w
Generated tokens:    64
Avg tokens / second: 2.57
Avg generation time: 389.53 ms
Avg inference time:  302.33 ms
Avg transfer time:   78.70 ms

@DifferentialityDevelopment
Contributor

That's so strange. I just ran a test with multiple workers, all on the same machine instead of multiple machines, though it's x86 and not ARM.

Root:
sudo nice -n 20 ./dllama inference --model ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Love is" --workers 127.0.0.1:11211
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 2
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 8192 kB
⏩ Loaded 824584 kB
🔶 G 63 ms I 42 ms T 21 ms S 266205 kB R 93 kB Love
🔶 G 72 ms I 41 ms T 30 ms S 93 kB R 93 kB is
🔶 G 73 ms I 41 ms T 32 ms S 93 kB R 93 kB Fore
🔶 G 61 ms I 32 ms T 29 ms S 93 kB R 93 kB ver
🔶 G 63 ms I 40 ms T 22 ms S 93 kB R 93 kB ,
🔶 G 61 ms I 42 ms T 19 ms S 93 kB R 93 kB I
🔶 G 59 ms I 38 ms T 21 ms S 93 kB R 93 kB Can
🔶 G 74 ms I 42 ms T 32 ms S 93 kB R 93 kB Only
🔶 G 70 ms I 41 ms T 28 ms S 93 kB R 93 kB Im
🔶 G 73 ms I 36 ms T 36 ms S 93 kB R 93 kB agine
🔶 G 66 ms I 46 ms T 19 ms S 93 kB R 93 kB ,
🔶 G 63 ms I 36 ms T 26 ms S 93 kB R 93 kB Jo
🔶 G 63 ms I 41 ms T 21 ms S 93 kB R 93 kB Jo
🔶 G 59 ms I 40 ms T 19 ms S 93 kB R 93 kB Gun
🔶 G 56 ms I 32 ms T 23 ms S 93 kB R 93 kB ne
🔶 G 59 ms I 34 ms T 25 ms S 93 kB R 93 kB ,
🔶 G 69 ms I 33 ms T 35 ms S 93 kB R 93 kB Jer
🔶 G 70 ms I 33 ms T 37 ms S 93 kB R 93 kB emy
🔶 G 73 ms I 32 ms T 41 ms S 93 kB R 93 kB Camp
🔶 G 77 ms I 41 ms T 36 ms S 93 kB R 93 kB ,
🔶 G 68 ms I 41 ms T 26 ms S 93 kB R 93 kB K
🔶 G 72 ms I 39 ms T 33 ms S 93 kB R 93 kB aty
🔶 G 75 ms I 37 ms T 38 ms S 93 kB R 93 kB Perry
🔶 G 77 ms I 40 ms T 37 ms S 93 kB R 93 kB ,
🔶 G 77 ms I 42 ms T 34 ms S 93 kB R 93 kB Kid
🔶 G 75 ms I 37 ms T 38 ms S 93 kB R 93 kB Rock
🔶 G 78 ms I 42 ms T 35 ms S 93 kB R 93 kB ,
🔶 G 82 ms I 41 ms T 40 ms S 93 kB R 93 kB Lady
🔶 G 82 ms I 42 ms T 40 ms S 93 kB R 93 kB An
🔶 G 70 ms I 40 ms T 30 ms S 93 kB R 93 kB te
🔶 G 74 ms I 39 ms T 35 ms S 93 kB R 93 kB bell
🔶 G 69 ms I 43 ms T 26 ms S 93 kB R 93 kB um

Worker:
./dllama worker --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --port 11211

Both are running on the same machine inside WSL.

Unfortunately I don't have any ARM hardware to test with currently, but it could be related to that.

@DifferentialityDevelopment
Contributor

DifferentialityDevelopment commented May 26, 2024

Another test

sudo nice -n 20 ./dllama inference --model ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer ~/distributed-llama/models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Python is a programming language that" --workers 127.0.0.1:11211
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 2
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 8192 kB
⏩ Loaded 824584 kB
🔶 G 77 ms I 34 ms T 43 ms S 266205 kB R 93 kB Python
🔶 G 72 ms I 29 ms T 43 ms S 93 kB R 93 kB is
🔶 G 67 ms I 38 ms T 29 ms S 93 kB R 93 kB a
🔶 G 75 ms I 40 ms T 35 ms S 93 kB R 93 kB programming
🔶 G 65 ms I 32 ms T 33 ms S 93 kB R 93 kB language
🔶 G 68 ms I 40 ms T 28 ms S 93 kB R 93 kB that
🔶 G 71 ms I 39 ms T 32 ms S 93 kB R 93 kB is
🔶 G 59 ms I 42 ms T 17 ms S 93 kB R 93 kB open
🔶 G 67 ms I 30 ms T 37 ms S 93 kB R 93 kB source
🔶 G 70 ms I 34 ms T 35 ms S 93 kB R 93 kB and
🔶 G 57 ms I 43 ms T 14 ms S 93 kB R 93 kB free
🔶 G 64 ms I 46 ms T 18 ms S 93 kB R 93 kB to
🔶 G 59 ms I 46 ms T 13 ms S 93 kB R 93 kB use
🔶 G 59 ms I 38 ms T 21 ms S 93 kB R 93 kB .
🔶 G 61 ms I 47 ms T 14 ms S 93 kB R 93 kB It
🔶 G 65 ms I 35 ms T 30 ms S 93 kB R 93 kB is
🔶 G 68 ms I 42 ms T 25 ms S 93 kB R 93 kB designed
🔶 G 61 ms I 38 ms T 23 ms S 93 kB R 93 kB for
🔶 G 65 ms I 46 ms T 19 ms S 93 kB R 93 kB ease
🔶 G 61 ms I 37 ms T 24 ms S 93 kB R 93 kB of
🔶 G 75 ms I 33 ms T 42 ms S 93 kB R 93 kB use
🔶 G 71 ms I 38 ms T 33 ms S 93 kB R 93 kB ,
🔶 G 68 ms I 30 ms T 38 ms S 93 kB R 93 kB flex
🔶 G 72 ms I 36 ms T 36 ms S 93 kB R 93 kB ibility
🔶 G 73 ms I 38 ms T 35 ms S 93 kB R 93 kB and
🔶 G 71 ms I 40 ms T 30 ms S 93 kB R 93 kB efficiency
🔶 G 69 ms I 34 ms T 35 ms S 93 kB R 93 kB .

I'm going to check whether I can spin up a VM on Azure to test whether it's an ARM-specific issue.

@unclemusclez
Author

WSL HOST:

musclez@NSA:~/distributed-llama$ sudo nice -n -20 ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998 192.168.2.215:9998 192.168.2.216:9998 192.168.2.217:9998 192.168.2.218:9998
[sudo] password for musclez:
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 8
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 2048 kB
⏩ Loaded 824584 kB
🔶 G  240 ms I   11 ms T  229 ms S 466351 kB R    654 kB Tell
🔶 G  197 ms I   16 ms T  181 ms S    654 kB R    654 kB  me
🔶 G  223 ms I   13 ms T  210 ms S    654 kB R    654 kB  about
🔶 G  221 ms I   12 ms T  209 ms S    654 kB R    654 kB  yourself
🔶 G  219 ms I   10 ms T  209 ms S    654 kB R    654 kB .
🔶 G  235 ms I   11 ms T  223 ms S    654 kB R    654 kB rows
🔶 G  232 ms I   12 ms T  219 ms S    654 kB R    654 kB otti
🔶 G  232 ms I    9 ms T  223 ms S    654 kB R    654 kB where
🔶 G  266 ms I   15 ms T  250 ms S    654 kB R    654 kB otti
🔶 G  202 ms I   12 ms T  189 ms S    654 kB R    654 kB ti
🔶 G  197 ms I   13 ms T  183 ms S    654 kB R    654 kB ining
🔶 G  203 ms I   13 ms T  189 ms S    654 kB R    654 kB >
🔶 G  195 ms I   10 ms T  183 ms S    654 kB R    654 kB uden
🔶 G  199 ms I   12 ms T  187 ms S    654 kB R    654 kB  there
🔶 G  203 ms I   12 ms T  190 ms S    654 kB R    654 kB ered
🔶 G  214 ms I   12 ms T  201 ms S    654 kB R    654 kB COM
🔶 G  207 ms I    8 ms T  198 ms S    654 kB R    654 kB otti
🔶 G  210 ms I   10 ms T  199 ms S    654 kB R    654 kB otti
🔶 G  213 ms I   11 ms T  202 ms S    654 kB R    654 kB  Overflow
🔶 G  211 ms I   15 ms T  196 ms S    654 kB R    654 kB nav
🔶 G  213 ms I   13 ms T  199 ms S    654 kB R    654 kB nav
🔶 G  195 ms I   13 ms T  180 ms S    654 kB R    654 kB isti
🔶 G  204 ms I   11 ms T  191 ms S    654 kB R    654 kB  enough
🔶 G  222 ms I    9 ms T  211 ms S    654 kB R    654 kB  sigu
🔶 G  221 ms I   18 ms T  200 ms S    654 kB R    654 kB  Beginn
🔶 G  218 ms I   15 ms T  202 ms S    654 kB R    654 kB ani
🔶 G  220 ms I   14 ms T  205 ms S    654 kB R    654 kB  Overflow
🔶 G  198 ms I   12 ms T  185 ms S    654 kB R    654 kB otti
🔶 G  205 ms I   15 ms T  189 ms S    654 kB R    654 kB  Jazz
🔶 G  206 ms I   10 ms T  195 ms S    654 kB R    654 kB nu
🔶 G  197 ms I   11 ms T  186 ms S    654 kB R    654 kB лимпи
🔶 G  200 ms I   13 ms T  185 ms S    654 kB R    654 kB otti
🔶 G  194 ms I    9 ms T  184 ms S    654 kB R    654 kB  Overflow
🔶 G  204 ms I   11 ms T  191 ms S    654 kB R    654 kB {}
🔶 G  207 ms I   14 ms T  192 ms S    654 kB R    654 kB gen
🔶 G  216 ms I   18 ms T  197 ms S    654 kB R    654 kB  Overflow
🔶 G  260 ms I   14 ms T  245 ms S    654 kB R    654 kB otti
🔶 G  217 ms I    9 ms T  207 ms S    654 kB R    654 kB atti
🔶 G  219 ms I   15 ms T  203 ms S    654 kB R    654 kB  Frei
🔶 G  207 ms I   12 ms T  194 ms S    654 kB R    654 kB dk
🔶 G  232 ms I   12 ms T  219 ms S    654 kB R    654 kB  Overflow
🔶 G  213 ms I   10 ms T  203 ms S    654 kB R    654 kB  Gar
🔶 G  223 ms I   16 ms T  206 ms S    654 kB R    654 kB  Overflow
🔶 G  199 ms I   14 ms T  184 ms S    654 kB R    654 kB  Gib
🔶 G  215 ms I    9 ms T  205 ms S    654 kB R    654 kB  Hunter
🔶 G  222 ms I   10 ms T  211 ms S    654 kB R    654 kB ún
🔶 G  220 ms I    9 ms T  209 ms S    654 kB R    654 kB agu
🔶 G  220 ms I   16 ms T  203 ms S    654 kB R    654 kB  Government
🔶 G  205 ms I   10 ms T  194 ms S    654 kB R    654 kB  Overflow
🔶 G  196 ms I    9 ms T  186 ms S    654 kB R    654 kB otto
🔶 G  198 ms I   11 ms T  186 ms S    654 kB R    654 kB amps
🔶 G  222 ms I   10 ms T  211 ms S    654 kB R    654 kB  Overflow
🔶 G  200 ms I   18 ms T  180 ms S    654 kB R    654 kB  Overflow
🔶 G  195 ms I   11 ms T  183 ms S    654 kB R    654 kB  Name
🔶 G  200 ms I   12 ms T  187 ms S    654 kB R    654 kB  vis
🔶 G  209 ms I   11 ms T  197 ms S    654 kB R    654 kB  Jenkins
🔶 G  237 ms I   12 ms T  224 ms S    654 kB R    654 kB app
🔶 G  205 ms I   19 ms T  185 ms S    654 kB R    654 kB  Party
🔶 G  195 ms I   11 ms T  184 ms S    654 kB R    654 kB amps
🔶 G  209 ms I   12 ms T  196 ms S    654 kB R    654 kB  Overflow
🔶 G  202 ms I   15 ms T  186 ms S    654 kB R    654 kB  Overflow
🔶 G  212 ms I   10 ms T  201 ms S    654 kB R    654 kB Overflow
🔶 G  193 ms I   14 ms T  178 ms S    654 kB R    654 kB quipe
🔶 G  206 ms I   14 ms T  191 ms S    654 kB R    654 kB utes
Generated tokens:    64
Avg tokens / second: 4.72
Avg generation time: 212.03 ms
Avg inference time:  12.31 ms
Avg transfer time:   198.75 ms

@DifferentialityDevelopment
Contributor

DifferentialityDevelopment commented May 26, 2024

I just created an EC2 ARM VM and ran the same test there; it worked perfectly fine.
So the issue doesn't seem to be ARM-specific, at the very least.
Not quite sure what is going on.

@DifferentialityDevelopment
Contributor

Perhaps try just the WSL root node, then add workers one at a time. It may be a problem with a single worker that's affecting the others. Either way, something strange is going on.
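A sketch of how the incremental test could be scripted. It assumes the total node count (root plus workers) has to be a power of two, as the nSlices values 2, 4 and 8 in the logs above suggest, so workers are added in steps of 1, 3 and 7 rather than strictly one at a time. `worker_subsets` and `build_command` are hypothetical helpers; the flags and paths are copied from the commands in this thread.

```python
import shlex

# Root-node command without the worker list, as used earlier in the thread.
BASE = [
    "./dllama", "inference",
    "--model", "models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m",
    "--tokenizer", "models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t",
    "--weights-float-type", "q40", "--buffer-float-type", "q80",
    "--nthreads", "4", "--steps", "64",
    "--prompt", "Tell me about yourself.",
]


def worker_subsets(workers):
    """Yield growing prefixes of workers with 1, 3, 7, ... entries."""
    count = 1
    while count <= len(workers):
        yield workers[:count]
        # Next size so that root + workers is the next power of two.
        count = 2 * count + 1


def build_command(workers):
    """Return a shell-quoted command line for the given worker addresses."""
    return shlex.join(BASE + ["--workers"] + list(workers))
```

Reordering the worker list between rounds changes which boards land in the small subsets, which helps single out one misbehaving worker.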

@unclemusclez
Author

unclemusclez commented May 26, 2024

With 4 nodes it works; with 8 it does not. The result was the same whether WSL or the Pi ran the inference command.

On WSL, however, you can see that the output actually says "Overflow" when 8 are run. Intriguing.

From above:

🔶 G  209 ms I   12 ms T  196 ms S    654 kB R    654 kB  Overflow
🔶 G  202 ms I   15 ms T  186 ms S    654 kB R    654 kB  Overflow
🔶 G  212 ms I   10 ms T  201 ms S    654 kB R    654 kB Overflow

4x working:

 sudo nice -n -20 ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 4
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 4096 kB
⏩ Loaded 824584 kB
🔶 G  352 ms I   13 ms T  339 ms S 399448 kB R    280 kB Tell
🔶 G  323 ms I   12 ms T  311 ms S    280 kB R    280 kB  me
🔶 G  371 ms I   18 ms T  353 ms S    280 kB R    280 kB  about
🔶 G  344 ms I   21 ms T  322 ms S    280 kB R    280 kB  yourself
🔶 G  337 ms I   14 ms T  323 ms S    280 kB R    280 kB .
🔶 G  365 ms I   19 ms T  346 ms S    280 kB R    280 kB

🔶 G  353 ms I   14 ms T  339 ms S    280 kB R    280 kB NO
🔶 G  358 ms I   12 ms T  346 ms S    280 kB R    280 kB W
🔶 G  315 ms I   17 ms T  298 ms S    280 kB R    280 kB  A
🔶 G  344 ms I   15 ms T  329 ms S    280 kB R    280 kB BO
🔶 G  336 ms I   20 ms T  316 ms S    280 kB R    280 kB UT
🔶 G  364 ms I   12 ms T  352 ms S    280 kB R    280 kB  Y
🔶 G  336 ms I   14 ms T  322 ms S    280 kB R    280 kB OU
🔶 G  350 ms I   19 ms T  331 ms S    280 kB R    280 kB :
🔶 G  347 ms I   13 ms T  333 ms S    280 kB R    280 kB  What
🔶 G  369 ms I   13 ms T  356 ms S    280 kB R    280 kB  was
🔶 G  350 ms I   17 ms T  333 ms S    280 kB R    280 kB  your
🔶 G  404 ms I   16 ms T  388 ms S    280 kB R    280 kB  first
🔶 G  338 ms I   15 ms T  323 ms S    280 kB R    280 kB  job
🔶 G  319 ms I   14 ms T  305 ms S    280 kB R    280 kB ?
🔶 G  436 ms I   19 ms T  416 ms S    280 kB R    280 kB

🔶 G  336 ms I   22 ms T  314 ms S    280 kB R    280 kB It
🔶 G  328 ms I   16 ms T  312 ms S    280 kB R    280 kB  was
🔶 G  362 ms I   16 ms T  346 ms S    280 kB R    280 kB  a
🔶 G  342 ms I   15 ms T  327 ms S    280 kB R    280 kB  ret
🔶 G  337 ms I   14 ms T  323 ms S    280 kB R    280 kB ail
🔶 G  395 ms I   19 ms T  375 ms S    280 kB R    280 kB  job
🔶 G  343 ms I   18 ms T  325 ms S    280 kB R    280 kB ,
🔶 G  345 ms I   16 ms T  329 ms S    280 kB R    280 kB  but
🔶 G  392 ms I   20 ms T  372 ms S    280 kB R    280 kB  I
🔶 G  330 ms I   14 ms T  315 ms S    280 kB R    280 kB  was
🔶 G  401 ms I   16 ms T  385 ms S    280 kB R    280 kB  always
🔶 G  355 ms I   23 ms T  332 ms S    280 kB R    280 kB  interested
🔶 G  369 ms I   17 ms T  351 ms S    280 kB R    280 kB  in
🔶 G  409 ms I   18 ms T  390 ms S    280 kB R    280 kB  writing
🔶 G  349 ms I   15 ms T  334 ms S    280 kB R    280 kB .
🔶 G  344 ms I   17 ms T  327 ms S    280 kB R    280 kB  I
🔶 G  436 ms I   12 ms T  424 ms S    280 kB R    280 kB  read
🔶 G  333 ms I   14 ms T  319 ms S    280 kB R    280 kB  lots
🔶 G  350 ms I   18 ms T  331 ms S    280 kB R    280 kB  of
🔶 G  362 ms I   13 ms T  348 ms S    280 kB R    280 kB  books
🔶 G  359 ms I   18 ms T  341 ms S    280 kB R    280 kB  and
🔶 G  428 ms I   18 ms T  410 ms S    280 kB R    280 kB  went
🔶 G  331 ms I   15 ms T  316 ms S    280 kB R    280 kB  to
🔶 G  356 ms I   15 ms T  341 ms S    280 kB R    280 kB  university
🔶 G  383 ms I   20 ms T  363 ms S    280 kB R    280 kB  to
🔶 G  325 ms I   16 ms T  309 ms S    280 kB R    280 kB  do
🔶 G  359 ms I   12 ms T  347 ms S    280 kB R    280 kB  a
🔶 G  365 ms I   16 ms T  349 ms S    280 kB R    280 kB  B
🔶 G  322 ms I   15 ms T  306 ms S    280 kB R    280 kB A
🔶 G  349 ms I   19 ms T  330 ms S    280 kB R    280 kB  in
🔶 G  409 ms I   21 ms T  388 ms S    280 kB R    280 kB  English
🔶 G  330 ms I   14 ms T  316 ms S    280 kB R    280 kB .
🔶 G  356 ms I   13 ms T  343 ms S    280 kB R    280 kB

🔶 G  373 ms I   18 ms T  355 ms S    280 kB R    280 kB HO
🔶 G  317 ms I   14 ms T  302 ms S    280 kB R    280 kB W
🔶 G  398 ms I   14 ms T  384 ms S    280 kB R    280 kB  W
🔶 G  347 ms I   15 ms T  332 ms S    280 kB R    280 kB AS
🔶 G  332 ms I   14 ms T  318 ms S    280 kB R    280 kB  IT
🔶 G  388 ms I   22 ms T  366 ms S    280 kB R    280 kB  ME
🔶 G  349 ms I   17 ms T  332 ms S    280 kB R    280 kB ET
🔶 G  324 ms I   19 ms T  305 ms S    280 kB R    280 kB ING
🔶 G  358 ms I   13 ms T  345 ms S    280 kB R    280 kB  Y
🔶 G  345 ms I   18 ms T  327 ms S    280 kB R    280 kB OUR
Generated tokens:    64
Avg tokens / second: 2.80
Avg generation time: 356.75 ms
Avg inference time:  16.19 ms
Avg transfer time:   340.39 ms
ubuntu@ubuntu:~/distributed-llama$ sudo nice -n -20 ./dllama inference --model models/tinylama_1.1b_3t_q40/dllama_model_tinylama_1.1b_3t_q40.m --tokenizer models/tinylama_1.1b_3t_q40/dllama_tokenizer_tinylama_1.1b_3t_q40.t --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --steps 64 --prompt "Tell me about yourself." --workers 192.168.2.212:9998 192.168.2.213:9998 192.168.2.214:9998
💡 arch: llama
💡 hiddenAct: silu
💡 dim: 2048
💡 hiddenDim: 5632
💡 nLayers: 22
💡 nHeads: 32
💡 nKvHeads: 4
💡 vocabSize: 32000
💡 seqLen: 2048
💡 nSlices: 4
💡 ropeTheta: 10000.0
📄 bosId: 1
📄 eosId: 2
🕒 ropeCache: 4096 kB
⏩ Loaded 824584 kB
🔶 G  506 ms I  412 ms T   94 ms S 399448 kB R    280 kB Tell
🔶 G  543 ms I  458 ms T   85 ms S    280 kB R    280 kB  me
🔶 G  486 ms I  432 ms T   54 ms S    280 kB R    280 kB  about
🔶 G  486 ms I  424 ms T   62 ms S    280 kB R    280 kB  yourself
🔶 G  488 ms I  428 ms T   60 ms S    280 kB R    280 kB .
🔶 G  493 ms I  426 ms T   62 ms S    280 kB R    280 kB

🔶 G  437 ms I  373 ms T   59 ms S    280 kB R    280 kB Over
🔶 G  486 ms I  399 ms T   82 ms S    280 kB R    280 kB all
🔶 G  525 ms I  425 ms T   95 ms S    280 kB R    280 kB ,
🔶 G  497 ms I  421 ms T   70 ms S    280 kB R    280 kB  I
🔶 G  489 ms I  426 ms T   59 ms S    280 kB R    280 kB  want
🔶 G  489 ms I  431 ms T   53 ms S    280 kB R    280 kB  to
🔶 G  494 ms I  434 ms T   55 ms S    280 kB R    280 kB  create
🔶 G  452 ms I  389 ms T   61 ms S    280 kB R    280 kB  a
🔶 G  486 ms I  395 ms T   85 ms S    280 kB R    280 kB  product
🔶 G  489 ms I  425 ms T   60 ms S    280 kB R    280 kB  that
🔶 G  491 ms I  427 ms T   59 ms S    280 kB R    280 kB  allows
🔶 G  489 ms I  423 ms T   61 ms S    280 kB R    280 kB  people
🔶 G  492 ms I  429 ms T   58 ms S    280 kB R    280 kB  to
🔶 G  492 ms I  433 ms T   54 ms S    280 kB R    280 kB  eng
🔶 G  487 ms I  425 ms T   60 ms S    280 kB R    280 kB age
🔶 G  482 ms I  377 ms T  101 ms S    280 kB R    280 kB  with
🔶 G  491 ms I  424 ms T   62 ms S    280 kB R    280 kB  nature
🔶 G  491 ms I  429 ms T   57 ms S    280 kB R    280 kB  and
🔶 G  492 ms I  430 ms T   57 ms S    280 kB R    280 kB  have
🔶 G  491 ms I  426 ms T   60 ms S    280 kB R    280 kB  a
🔶 G  490 ms I  429 ms T   57 ms S    280 kB R    280 kB  real
🔶 G  490 ms I  428 ms T   57 ms S    280 kB R    280 kB  connection
🔶 G  481 ms I  373 ms T  104 ms S    280 kB R    280 kB  with
🔶 G  498 ms I  432 ms T   62 ms S    280 kB R    280 kB  the
🔶 G  496 ms I  439 ms T   53 ms S    280 kB R    280 kB  out
🔶 G  491 ms I  430 ms T   56 ms S    280 kB R    280 kB do
🔶 G  490 ms I  434 ms T   51 ms S    280 kB R    280 kB ors
🔶 G  496 ms I  440 ms T   52 ms S    280 kB R    280 kB .
🔶 G  490 ms I  431 ms T   54 ms S    280 kB R    280 kB 

🔶 G  482 ms I  380 ms T   97 ms S    280 kB R    280 kB My
🔶 G  496 ms I  426 ms T   65 ms S    280 kB R    280 kB  main
🔶 G  492 ms I  426 ms T   61 ms S    280 kB R    280 kB  goal
🔶 G  491 ms I  431 ms T   56 ms S    280 kB R    280 kB  for
🔶 G  492 ms I  430 ms T   57 ms S    280 kB R    280 kB  the
🔶 G  498 ms I  430 ms T   63 ms S    280 kB R    280 kB  next
🔶 G  490 ms I  427 ms T   59 ms S    280 kB R    280 kB  year
🔶 G  481 ms I  374 ms T  103 ms S    280 kB R    280 kB  is
🔶 G  491 ms I  430 ms T   57 ms S    280 kB R    280 kB  to
🔶 G  491 ms I  427 ms T   59 ms S    280 kB R    280 kB  work
🔶 G  490 ms I  424 ms T   62 ms S    280 kB R    280 kB  on
🔶 G  491 ms I  429 ms T   57 ms S    280 kB R    280 kB  the
🔶 G  493 ms I  435 ms T   52 ms S    280 kB R    280 kB  R
🔶 G  492 ms I  431 ms T   56 ms S    280 kB R    280 kB ise
🔶 G  485 ms I  375 ms T  105 ms S    280 kB R    280 kB  +
🔶 G  489 ms I  429 ms T   55 ms S    280 kB R    280 kB  Fl
🔶 G  491 ms I  432 ms T   55 ms S    280 kB R    280 kB ight
🔶 G  494 ms I  435 ms T   53 ms S    280 kB R    280 kB  brand
🔶 G  496 ms I  444 ms T   48 ms S    280 kB R    280 kB .
🔶 G  492 ms I  428 ms T   60 ms S    280 kB R    280 kB  I
🔶 G  491 ms I  429 ms T   58 ms S    280 kB R    280 kB  want
🔶 G  487 ms I  374 ms T  109 ms S    280 kB R    280 kB  to
🔶 G  492 ms I  435 ms T   53 ms S    280 kB R    280 kB  create
🔶 G  492 ms I  428 ms T   60 ms S    280 kB R    280 kB  a
🔶 G  496 ms I  430 ms T   61 ms S    280 kB R    280 kB  brand
🔶 G  497 ms I  433 ms T   60 ms S    280 kB R    280 kB  that
🔶 G  493 ms I  431 ms T   57 ms S    280 kB R    280 kB  allows
🔶 G  493 ms I  436 ms T   52 ms S    280 kB R    280 kB  people
🔶 G  483 ms I  372 ms T  106 ms S    280 kB R    280 kB  to
Generated tokens:    64
Avg tokens / second: 2.04
Avg generation time: 490.73 ms
Avg inference time:  421.38 ms
Avg transfer time:   65.11 ms

@b4rtaz
Owner

b4rtaz commented May 26, 2024

Could you try running 8 workers with a single thread each (--nthreads 1)?

@DifferentialityDevelopment
Contributor

He could also try running funcs-test on all the Pis.

@b4rtaz
Owner

b4rtaz commented May 26, 2024

I reproduced the problem. 8 nodes with 4 threads generate spaghetti output. I'll look into this.

⏩ Loaded 824584 kB
🔶 G 8052 ms I 4891 ms T 3161 ms S 466351 kB R    654 kB Hello
🔶 G 6765 ms I 4108 ms T 2657 ms S    654 kB R    654 kB  world
🔶 G 11431 ms I 7125 ms T 4306 ms S    654 kB R    654 kB !
🔶 G 10778 ms I 6435 ms T 4342 ms S    654 kB R    654 kB m
🔶 G 10806 ms I 6676 ms T 4130 ms S    654 kB R    654 kB row
🔶 G 12481 ms I 6907 ms T 5573 ms S    654 kB R    654 kB M
🔶 G 11464 ms I 6865 ms T 4598 ms S    654 kB R    654 kB NO

Update: The same happens with 8 nodes and 1 thread:

🔶 G   62 ms I   43 ms T   19 ms S 466351 kB R    654 kB Hello
🔶 G   51 ms I   35 ms T   16 ms S    654 kB R    654 kB  world
🔶 G   46 ms I   34 ms T   12 ms S    654 kB R    654 kB !
🔶 G   48 ms I   38 ms T   10 ms S    654 kB R    654 kB  Dev
🔶 G   49 ms I   31 ms T   18 ms S    654 kB R    654 kB ori
🔶 G   50 ms I   36 ms T   13 ms S    654 kB R    654 kB IC
🔶 G   46 ms I   41 ms T    5 ms S    654 kB R    654 kB M
🔶 G   43 ms I   33 ms T   10 ms S    654 kB R    654 kB  to
🔶 G   46 ms I   33 ms T   12 ms S    654 kB R    654 kB web
🔶 G   49 ms I   38 ms T   11 ms S    654 kB R    654 kB +
🔶 G   52 ms I   32 ms T   20 ms S    654 kB R    654 kB small

Update: This problem appears with TinyLlama. Llama 3 8B works OK.

@unclemusclez
Author

https://huggingface.co/keeeeenw/MicroLlama/tree/main

I was looking into this, but there is no tokenizer.model.
I don't know enough about conversion yet. I see we're looking for HF Llama models that use the SentencePiece tokenizer, or Llama 3 models.

If there were some external documentation I could refer to, I would try to work with other lightweight models that might fit in 1 GB of memory.

I just got some 2 GB SBCs in the mail, so I could try to mix and match a bit to meet the memory demands of Llama 3.
I may also try just using TinyLlama with 4 Pis. That worked, so I don't really need 8.

@b4rtaz
Owner

b4rtaz commented May 27, 2024

@unclemusclez the mystery is solved. TinyLlama has nKvHeads=4, so that is currently the maximum number of nodes. I'll add an error message to the app later.
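
As an illustration of the kind of guard described above, here is a minimal sketch that rejects a cluster with more nodes than the model has KV heads. It is not the project's actual code: the function name `check_node_count` and the error wording are hypothetical, and it assumes only the single constraint stated here (node count must not exceed nKvHeads).

```python
def check_node_count(n_nodes: int, n_kv_heads: int) -> None:
    """Raise ValueError if there are more nodes than KV heads.

    Assumption (from the comment above): the number of nodes, counting the
    root and all workers, cannot exceed the model's nKvHeads.
    """
    if n_nodes < 1:
        raise ValueError("at least one node is required")
    if n_nodes > n_kv_heads:
        raise ValueError(
            f"this model supports at most {n_kv_heads} nodes "
            f"(nKvHeads={n_kv_heads}), but {n_nodes} were requested"
        )


# TinyLlama (nKvHeads=4): 4 nodes pass, 8 nodes are rejected.
check_node_count(4, 4)
try:
    check_node_count(8, 4)
except ValueError as err:
    print(err)

# Llama 3 8B reports nKvHeads=8, so 8 nodes pass this check.
check_node_count(8, 8)
```

Failing early with a message like this would replace the garbled output seen with 8 nodes on TinyLlama.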

@b4rtaz
Owner

b4rtaz commented May 27, 2024

TinyLlama seems to work now, so I'm closing this issue.

@b4rtaz b4rtaz closed this as completed May 27, 2024