
[BUG] genie-t2t-run fails when using Llama-v3.2-3b-chat-quantized exported for 8295P #143

Closed
mikel-brostrom opened this issue Dec 18, 2024 · 1 comment


mikel-brostrom commented Dec 18, 2024

I am trying to run Llama-v3.2-3b-chat-quantized on an 8295P chip, so I set dsp_arch = "v66" and soc_model = 31 in htp_backend_ext_config.json accordingly. Then I activated these two licenses, following https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/linux_setup.html#htp-and-dsp:

qpm-cli --license-activate hexagon8.4
qpm-cli --license-activate hexagonsdk4.x

I also installed qualcomm_ai_engine_direct.2.28.0.241029.Linux-AnyCPU.qik via QPM, which matches the QNN SDK version used by AI Hub for the export.
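For context, a minimal htp_backend_ext_config.json reflecting the settings above might look like the following sketch. Only dsp_arch and soc_model are values from my setup; the surrounding structure is my understanding of the QNN HTP backend-extensions format and may differ between SDK versions:

```json
{
    "devices": [
        {
            "dsp_arch": "v66",
            "soc_model": 31
        }
    ]
}
```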

After running:

adb push genie_bundle /data/local/tmp
adb shell
cd /data/local/tmp/genie_bundle
export LD_LIBRARY_PATH=$PWD
./genie-t2t-run -c genie_config.json -p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

I get the following error:

Using libGenie.so version 1.0.0

Missing QnnHtp field: pos-id-dim
ERROR at line 230: Failed to create the dialog config.

The contents of my genie_config.json:

{
    "dialog": {
        "version": 1,
        "type": "basic",
        "context": {
            "version": 1,
            "size": 4096,
            "n-vocab": 128256,
            "bos-token": -1,
            "eos-token": 128001
        },
        "sampler": {
            "version": 1,
            "seed": 42,
            "temp": 0.8,
            "top-k": 40,
            "top-p": 0.95
        },
        "tokenizer": {
            "version": 1,
            "path": "tokenizer.json"
        },
        "engine": {
            "version": 1,
            "n-threads": 3,
            "backend": {
                "version": 1,
                "type": "QnnHtp",
                "QnnHtp": {
                    "version": 1,
                    "use-mmap": true,
                    "spill-fill-bufsize": 0,
                    "mmap-budget": 0,
                    "poll": true,
                    "cpu-mask": "0xe0",
                    "kv-dim": 128,
                    "allow-async-init": false
                },
                "extensions": "htp_backend_ext_config.json"
            },
            "model": {
                "version": 1,
                "type": "binary",
                "binary": {
                    "version": 1,
                    "ctx-bins": [
                        "llama_v3_2_3b_chat_quantized_part_1_of_3.bin",
                        "llama_v3_2_3b_chat_quantized_part_2_of_3.bin",
                        "llama_v3_2_3b_chat_quantized_part_3_of_3.bin"
                    ]
                },
                "positional-encoding": {
                    "type": "rope",
                    "rope-dim": 64,
                    "rope-theta": 500000,
                    "rope-scaling": {
                        "rope-type": "llama3",
                        "factor": 8.0,
                        "low-freq-factor": 1.0,
                        "high-freq-factor": 4.0,
                        "original-max-position-embeddings": 8192
                    }
                }
            }
        }
    }
}

Then I added "pos-id-dim": 64 to the QnnHtp block. The next error:

Using libGenie.so version 1.0.0

Unknown model config key: positional-encoding
ERROR at line 230: Failed to create the dialog config.

So I deleted the positional-encoding block. Now I get:

Using libGenie.so version 1.0.0

[WARN]  "Unable to initialize logging in backend extensions."
[ERROR] "Failed to create device: 1008"
[ERROR] "Device Creation failure"
Failure to initialize model
ERROR at line 234: Failed to create the dialog.

Am I missing something?
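For reproducibility, the two config edits I applied above can be scripted as follows. This is just a sketch: patch_config is a helper name I made up, and the inline cfg dict is a trimmed stand-in for the full genie_config.json shown earlier.

```python
import json

def patch_config(cfg: dict) -> dict:
    """Apply the two genie_config.json edits described above."""
    engine = cfg["dialog"]["engine"]
    # Genie reported a missing QnnHtp field; value matches rope-dim (64).
    engine["backend"]["QnnHtp"]["pos-id-dim"] = 64
    # Genie rejected this key ("Unknown model config key"), so drop it.
    engine["model"].pop("positional-encoding", None)
    return cfg

# Trimmed stand-in for the real config; in practice:
#   cfg = json.load(open("genie_config.json"))
cfg = {"dialog": {"engine": {"backend": {"QnnHtp": {"kv-dim": 128}},
                             "model": {"positional-encoding": {"type": "rope"}}}}}
patched = patch_config(cfg)
print(patched["dialog"]["engine"]["backend"]["QnnHtp"]["pos-id-dim"])  # prints 64
```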

@mikel-brostrom mikel-brostrom changed the title genie-t2t-run fails when using Llama-v3.2-3b-chat-quantized [BUG] genie-t2t-run fails when using Llama-v3.2-3b-chat-quantized Dec 18, 2024
@mikel-brostrom mikel-brostrom changed the title [BUG] genie-t2t-run fails when using Llama-v3.2-3b-chat-quantized [BUG] genie-t2t-run fails when using Llama-v3.2-3b-chat-quantized exported for 8295P Dec 18, 2024
@gustavla

Since you posted this on Slack as well (https://qualcomm-ai-hub.slack.com/archives/C06LT6T3REY/p1734602493236869), let's continue the investigation there.
