
Avoid division-by-zero on 0-weights #7825

Open · wants to merge 2 commits into master
Conversation

@CISC (Contributor) commented Jun 7, 2024

Converting the new Qwen2-57B-A14B from BF16 to BF16 produced a lot of 0-weights (fixed in another PR), which, together with an imatrix, triggered NaN output on quantization due to division-by-zero.

IQ1_S crashes with this assert, fixed by #7955:

```c
GGML_ASSERT(besti1 >= 0 && besti2 >= 0 && best_shift != 0);
```

This PR merely fixes the division-by-zero; it does not fix the Qwen2 issues, as the model (once converted properly) has subnormal weights that will cause NaNs on multiplication.

@github-actions bot added the "ggml" label Jun 7, 2024
github-actions bot commented Jun 7, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 531 iterations 🚀

Details (performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8848.61ms p(95)=21682.35ms fails=, finish reason: stop=477 truncated=54
  • Prompt processing (pp): avg=102.08tk/s p(95)=375.05tk/s
  • Token generation (tg): avg=31.87tk/s p(95)=47.12tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=quant-div-zero commit=0225cfbfb3e1fa9932f11ef7456b23f59ffcbc5b

prompt_tokens_seconds

(time-series chart: llamacpp:prompt_tokens_seconds — "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 531 iterations"; Mermaid chart source omitted)
predicted_tokens_seconds

(time-series chart: llamacpp:predicted_tokens_seconds — same benchmark run; Mermaid chart source omitted)

kv_cache_usage_ratio

(time-series chart: llamacpp:kv_cache_usage_ratio — same benchmark run; Mermaid chart source omitted)
requests_processing

(time-series chart: llamacpp:requests_processing — same benchmark run; Mermaid chart source omitted)

@ggerganov (Owner) commented:

Try using #7833 instead

@CISC (Contributor, Author) commented Jun 8, 2024

That helps with the NaNs, but not the 0-weights in, for example, blk.0.ffn_gate_exps.weight.

@JustinLin610 (Contributor) commented:

Does it work for you? I tried but still found NaN.

@mofosyne added the "Review Complexity : Low" label Jun 11, 2024
@CISC (Contributor, Author) commented Jun 11, 2024

> Does it work for you? I tried but still found NaN.

It doesn't really fix the Qwen2 issues, no. I mainly found these because BF16->BF16 flushed weights to 0; however, once you fix that, suml2 will occasionally be NaN and you get division-by-NaN instead. :P

@CISC CISC closed this Jun 27, 2024
@CISC CISC reopened this Jun 27, 2024
@vignesh1507 left a comment:

Wow, didn't notice that... Thanks for updating the code, appreciate the effort.

Labels
- ggml — changes relating to the ggml tensor library for machine learning
- Review Complexity : Low — Trivial changes to code that most beginner devs (or those who want a break) can tackle, e.g. UI fix
5 participants