docs: readme
Signed-off-by: thxCode <[email protected]>
thxCode committed Aug 21, 2024
1 parent 65a9ff8 commit 3bf700f
Showing 1 changed file with 20 additions and 19 deletions.
README.md: 20 additions & 19 deletions
@@ -47,7 +47,7 @@ GGUF Parser helps in reviewing and estimating the usage of a GGUF format model w
## Notes

- Since v0.7.2, GGUF Parser supports retrieving the model's metadata via split file,
which suffixes with "-00001-of-00009.gguf".
whose file name ends with something like `-00001-of-00009.gguf`.
- The table result `UMA` indicates the memory usage on Apple macOS only.
- Since v0.7.0, GGUF Parser supports estimating the usage of multiple GPUs.
+ The table result `RAM` means the system memory usage when
@@ -105,40 +105,41 @@ $ gguf-parser --path="~/.cache/lm-studio/models/NousResearch/Hermes-2-Pro-Mistra

$ # Retrieve the model's metadata via a split file,
$ # which requires all split files to have been downloaded.
$ gguf-parser --path="~/.cache/lm-studio/models/Qwen/Qwen2-72B-Instruct-GGUF/qwen2-72b-instruct-q6_k-00001-of-00002.gguf"

+-----------------------------------------------------------------------------------------------------------+
| MODEL |
+------------------------------+-------+--------------+---------------+------------+------------+-----------+
| NAME | ARCH | QUANTIZATION | LITTLE ENDIAN | SIZE | PARAMETERS | BPW |
+------------------------------+-------+--------------+---------------+------------+------------+-----------+
| Meta Llama 3.1 405B Instruct | llama | BF16 | true | 763.84 GiB | 410.08 B | 16.00 bpw |
+------------------------------+-------+--------------+---------------+------------+------------+-----------+
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| MODEL |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+
| NAME | ARCH | QUANTIZATION | LITTLE ENDIAN | SIZE | PARAMETERS | BPW |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+
| 72b.5000B--cmix31-base100w-cpt32k_mega_v1_reflection_4_identity_2_if_ondare_beta0.09_lr_1e-6_bs128_epoch2-72B.qwen2B-bf16-mp8-pp4-lr-1e-6-minlr-1e-9-bs-128-seqlen-4096-step1350 | qwen2 | IQ1_S/Q6_K | true | 59.92 GiB | 72.71 B | 7.08 bpw |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+
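$ # (Illustrative cross-check, not part of gguf-parser's output: the BPW column
$ # follows from the rounded size and parameter count shown above.)
$ python3 -c 'print(59.92 * 1024**3 * 8 / 72.71e9)'  # ≈ 7.08 bpw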

+---------------------------------------------------------------------------------------------------------------------------------------------------+
| ARCHITECTURE |
+-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+
| MAX CONTEXT LEN | EMBEDDING LEN | EMBEDDING GQA | ATTENTION CAUSAL | ATTENTION HEAD CNT | LAYERS | FEED FORWARD LEN | EXPERT CNT | VOCABULARY LEN |
+-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+
| 131072 | 16384 | 8 | true | 128 | 126 | 53248 | 0 | 128256 |
| 32768 | 8192 | 8 | true | 64 | 80 | 29568 | 0 | 152064 |
+-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+

+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| TOKENIZER |
+-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+
| MODEL | TOKENS SIZE | TOKENS LEN | ADDED TOKENS LEN | BOS TOKEN | EOS TOKEN | EOT TOKEN | EOM TOKEN | UNKNOWN TOKEN | SEPARATOR TOKEN | PADDING TOKEN |
+-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+
| gpt2 | 2 MiB | 128256 | N/A | 128000 | 128009 | N/A | N/A | N/A | N/A | N/A |
| gpt2 | 2.47 MiB | 152064 | N/A | 151643 | 151645 | N/A | N/A | N/A | N/A | 151643 |
+-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ESTIMATE |
+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+-------------------------+----------------------+
| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
| | | | | | | | +------------+------------+---------+------------+
| | | | | | | | | UMA | NONUMA | UMA | NONUMA |
+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+---------+------------+
| llama | 131072 | 2048 / 512 | Disabled | Supported | No | 127 (126 + 1) | Yes | 684.53 MiB | 834.53 MiB | 126 GiB | 919.55 GiB |
+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+---------+------------+
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ESTIMATE |
+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+-------------------------+--------------------+
| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
| | | | | | | | +------------+------------+--------+-----------+
| | | | | | | | | UMA | NONUMA | UMA | NONUMA |
+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+--------+-----------+
| qwen2 | 32768 | 2048 / 512 | Disabled | Supported | No | 81 (80 + 1) | Yes | 307.38 MiB | 457.38 MiB | 10 GiB | 73.47 GiB |
+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+--------+-----------+
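$ # (Illustrative cross-check, not part of gguf-parser's output: a rough f16
$ # KV-cache estimate at the full 32768-token context, with 80 layers,
$ # 8 KV heads implied by the GQA factor (64 / 8), and 128-dim heads (8192 / 64);
$ # the result lines up with the 10 GiB UMA VRAM figure above.)
$ python3 -c 'print(2 * 80 * 32768 * 8 * 128 * 2 / 1024**3)'  # = 10.0 GiB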

```
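The notes earlier in this diff mention that GGUF Parser estimates usage across multiple GPUs since v0.7.0. Below is a minimal sketch of what such an invocation could look like against the same split file; the `--tensor-split` flag and its ratio syntax are assumptions here and are not taken from this commit.

```
$ # Hypothetical multi-GPU estimate: split tensors across two GPUs in an 8:10 ratio.
$ # The flag name and syntax are assumptions, not shown in this diff.
$ gguf-parser \
    --path="~/.cache/lm-studio/models/Qwen/Qwen2-72B-Instruct-GGUF/qwen2-72b-instruct-q6_k-00001-of-00002.gguf" \
    --tensor-split="8,10"
```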
