diff --git a/README.md b/README.md
index b01f6f3..bb947c0 100644
--- a/README.md
+++ b/README.md
@@ -47,7 +47,7 @@ GGUF Parser helps in reviewing and estimating the usage of a GGUF format model w
 ## Notes

 - Since v0.7.2, GGUF Parser supports retrieving the model's metadata via split file,
-  which suffixes with "-00001-of-00009.gguf".
+  which suffixes with something like `-00001-of-00009.gguf`.
 - The table result `UMA` indicates the memory usage of Apple MacOS only.
 - Since v0.7.0, GGUF Parser is going to support estimating the usage of multiple GPUs.
     + The table result `RAM` means the system memory usage when
@@ -105,21 +105,22 @@ $ gguf-parser --path="~/.cache/lm-studio/models/NousResearch/Hermes-2-Pro-Mistra

 $ # Retrieve the model's metadata via split file,
 $ # which needs all split files has been downloaded.
+$ gguf-parser --path="~/.cache/lm-studio/models/Qwen/Qwen2-72B-Instruct-GGUF/qwen2-72b-instruct-q6_k-00001-of-00002.gguf"

-+-----------------------------------------------------------------------------------------------------------+
-| MODEL |
-+------------------------------+-------+--------------+---------------+------------+------------+-----------+
-| NAME | ARCH | QUANTIZATION | LITTLE ENDIAN | SIZE | PARAMETERS | BPW |
-+------------------------------+-------+--------------+---------------+------------+------------+-----------+
-| Meta Llama 3.1 405B Instruct | llama | BF16 | true | 763.84 GiB | 410.08 B | 16.00 bpw |
-+------------------------------+-------+--------------+---------------+------------+------------+-----------+
++-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| MODEL |
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+
+| NAME | ARCH | QUANTIZATION | LITTLE ENDIAN | SIZE | PARAMETERS | BPW |
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+
+| 72b.5000B--cmix31-base100w-cpt32k_mega_v1_reflection_4_identity_2_if_ondare_beta0.09_lr_1e-6_bs128_epoch2-72B.qwen2B-bf16-mp8-pp4-lr-1e-6-minlr-1e-9-bs-128-seqlen-4096-step1350 | qwen2 | IQ1_S/Q6_K | true | 59.92 GiB | 72.71 B | 7.08 bpw |
++----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+--------------+---------------+-----------+------------+----------+

 +---------------------------------------------------------------------------------------------------------------------------------------------------+
 | ARCHITECTURE |
 +-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+
 | MAX CONTEXT LEN | EMBEDDING LEN | EMBEDDING GQA | ATTENTION CAUSAL | ATTENTION HEAD CNT | LAYERS | FEED FORWARD LEN | EXPERT CNT | VOCABULARY LEN |
 +-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+
-| 131072 | 16384 | 8 | true | 128 | 126 | 53248 | 0 | 128256 |
+| 32768 | 8192 | 8 | true | 64 | 80 | 29568 | 0 | 152064 |
 +-----------------+---------------+---------------+------------------+--------------------+--------+------------------+------------+----------------+

 +-------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -127,18 +128,18 @@ $ # which needs all split files has been downloaded.
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+
 | MODEL | TOKENS SIZE | TOKENS LEN | ADDED TOKENS LEN | BOS TOKEN | EOS TOKEN | EOT TOKEN | EOM TOKEN | UNKNOWN TOKEN | SEPARATOR TOKEN | PADDING TOKEN |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+
-| gpt2 | 2 MiB | 128256 | N/A | 128000 | 128009 | N/A | N/A | N/A | N/A | N/A |
+| gpt2 | 2.47 MiB | 152064 | N/A | 151643 | 151645 | N/A | N/A | N/A | N/A | 151643 |
 +-------+-------------+------------+------------------+-----------+-----------+-----------+-----------+---------------+-----------------+---------------+

-+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ESTIMATE |
-+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+-------------------------+----------------------+
-| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
-| | | | | | | | +------------+------------+---------+------------+
-| | | | | | | | | UMA | NONUMA | UMA | NONUMA |
-+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+---------+------------+
-| llama | 131072 | 2048 / 512 | Disabled | Supported | No | 127 (126 + 1) | Yes | 684.53 MiB | 834.53 MiB | 126 GiB | 919.55 GiB |
-+-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+---------+------------+
++---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ESTIMATE |
++-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+-------------------------+--------------------+
+| ARCH | CONTEXT SIZE | BATCH SIZE (L / P) | FLASH ATTENTION | MMAP LOAD | EMBEDDING ONLY | OFFLOAD LAYERS | FULL OFFLOADED | RAM | VRAM 0 |
+| | | | | | | | +------------+------------+--------+-----------+
+| | | | | | | | | UMA | NONUMA | UMA | NONUMA |
++-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+--------+-----------+
+| qwen2 | 32768 | 2048 / 512 | Disabled | Supported | No | 81 (80 + 1) | Yes | 307.38 MiB | 457.38 MiB | 10 GiB | 73.47 GiB |
++-------+--------------+--------------------+-----------------+-----------+----------------+----------------+----------------+------------+------------+--------+-----------+

 ```
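
The Notes entry changed in this diff describes split files by their suffix, something like `-00001-of-00009.gguf`, and the example comment says all split files need to be downloaded first. The following is a minimal Python sketch of that naming convention only, not GGUF Parser's own implementation: the helper name `split_siblings` and the regular expression are assumptions made for illustration, and the five-digit width is inferred from the two file names shown above.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Assumed pattern: "<prefix>-<5-digit index>-of-<5-digit total>.gguf",
# inferred from names such as "...-00001-of-00002.gguf" in the README.
SPLIT_RE = re.compile(r"^(?P<prefix>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def split_siblings(path: str) -> List[str]:
    """Return the paths of every shard implied by one split-file path.

    Hypothetical helper: if the file name has no split suffix,
    the input path is returned unchanged as a single-element list.
    """
    p = PurePosixPath(path)
    m = SPLIT_RE.match(p.name)
    if m is None:
        return [path]
    prefix = m.group("prefix")
    total = int(m.group("total"))
    # Rebuild each shard name with the same zero-padded width.
    return [
        str(p.with_name(f"{prefix}-{i:05d}-of-{total:05d}.gguf"))
        for i in range(1, total + 1)
    ]


if __name__ == "__main__":
    for name in split_siblings("qwen2-72b-instruct-q6_k-00001-of-00002.gguf"):
        print(name)
```

A caller could pass each returned path to a file-existence check before running `gguf-parser`, so that a missing shard is reported up front.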