
[BUG] genie-t2t-run fails when using Llama-v3.2-3b-chat-quantized exported for 8295P #143

Closed
mikel-brostrom opened this issue Dec 18, 2024 · 1 comment


mikel-brostrom commented Dec 18, 2024

I am trying to run Llama-v3.2-3b-chat-quantized on an 8295P chip, so I set dsp_arch = "v66" and soc_model = 31 in htp_backend_ext_config.json accordingly. Then I activated these two licenses, following https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/linux_setup.html#htp-and-dsp:

qpm-cli --license-activate hexagon8.4
qpm-cli --license-activate hexagonsdk4.x

I also installed qualcomm_ai_engine_direct.2.28.0.241029.Linux-AnyCPU.qik via QPM, which matches the QNN SDK version used by AI Hub for the export.
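For context, a minimal htp_backend_ext_config.json reflecting the settings above might look like the following sketch. Only dsp_arch and soc_model are values from my setup; the surrounding structure is my understanding of the QNN HTP backend-extensions format and may differ between SDK versions:

```json
{
    "devices": [
        {
            "dsp_arch": "v66",
            "soc_model": 31
        }
    ]
}
```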

After running:

adb push genie_bundle /data/local/tmp
adb shell
cd /data/local/tmp/genie_bundle
export LD_LIBRARY_PATH=$PWD
./genie-t2t-run -c genie_config.json -p "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is France's capital?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"

I get the following error:

Using libGenie.so version 1.0.0

Missing QnnHtp field: pos-id-dim
ERROR at line 230: Failed to create the dialog config.

The contents of my genie_config.json:

{
    "dialog": {
        "version": 1,
        "type": "basic",
        "context": {
            "version": 1,
            "size": 4096,
            "n-vocab": 128256,
            "bos-token": -1,
            "eos-token": 128001
        },
        "sampler": {
            "version": 1,
            "seed": 42,
            "temp": 0.8,
            "top-k": 40,
            "top-p": 0.95
        },
        "tokenizer": {
            "version": 1,
            "path": "tokenizer.json"
        },
        "engine": {
            "version": 1,
            "n-threads": 3,
            "backend": {
                "version": 1,
                "type": "QnnHtp",
                "QnnHtp": {
                    "version": 1,
                    "use-mmap": true,
                    "spill-fill-bufsize": 0,
                    "mmap-budget": 0,
                    "poll": true,
                    "cpu-mask": "0xe0",
                    "kv-dim": 128,
                    "allow-async-init": false
                },
                "extensions": "htp_backend_ext_config.json"
            },
            "model": {
                "version": 1,
                "type": "binary",
                "binary": {
                    "version": 1,
                    "ctx-bins": [
                        "llama_v3_2_3b_chat_quantized_part_1_of_3.bin",
                        "llama_v3_2_3b_chat_quantized_part_2_of_3.bin",
                        "llama_v3_2_3b_chat_quantized_part_3_of_3.bin"
                    ]
                },
                "positional-encoding": {
                    "type": "rope",
                    "rope-dim": 64,
                    "rope-theta": 500000,
                    "rope-scaling": {
                        "rope-type": "llama3",
                        "factor": 8.0,
                        "low-freq-factor": 1.0,
                        "high-freq-factor": 4.0,
                        "original-max-position-embeddings": 8192
                    }
                }
            }
        }
    }
}

Then I added "pos-id-dim": 64 to the QnnHtp block. The next error:

Using libGenie.so version 1.0.0

Unknown model config key: positional-encoding
ERROR at line 230: Failed to create the dialog config.

So I deleted the positional-encoding block. Now I get:

Using libGenie.so version 1.0.0

[WARN]  "Unable to initialize logging in backend extensions."
[ERROR] "Failed to create device: 1008"
[ERROR] "Device Creation failure"
Failure to initialize model
ERROR at line 234: Failed to create the dialog.

Am I missing something?
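For reproducibility, the two config edits I applied above can be scripted as follows. This is just a sketch: patch_config is a helper name I made up, and the inline cfg dict is a trimmed stand-in for the full genie_config.json shown earlier.

```python
import json

def patch_config(cfg: dict) -> dict:
    """Apply the two genie_config.json edits described above."""
    engine = cfg["dialog"]["engine"]
    # Genie reported a missing QnnHtp field; value matches rope-dim (64).
    engine["backend"]["QnnHtp"]["pos-id-dim"] = 64
    # Genie rejected this key ("Unknown model config key"), so drop it.
    engine["model"].pop("positional-encoding", None)
    return cfg

# Trimmed stand-in for the real config; in practice:
#   cfg = json.load(open("genie_config.json"))
cfg = {"dialog": {"engine": {"backend": {"QnnHtp": {"kv-dim": 128}},
                             "model": {"positional-encoding": {"type": "rope"}}}}}
patched = patch_config(cfg)
print(patched["dialog"]["engine"]["backend"]["QnnHtp"]["pos-id-dim"])  # prints 64
```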

@mikel-brostrom mikel-brostrom changed the title genie-t2t-run fails when using Llama-v3.2-3b-chat-quantized [BUG] genie-t2t-run fails when using Llama-v3.2-3b-chat-quantized Dec 18, 2024
@mikel-brostrom mikel-brostrom changed the title [BUG] genie-t2t-run fails when using Llama-v3.2-3b-chat-quantized [BUG] genie-t2t-run fails when using Llama-v3.2-3b-chat-quantized exported for 8295P Dec 18, 2024
@gustavla

Since you posted this on Slack as well (https://qualcomm-ai-hub.slack.com/archives/C06LT6T3REY/p1734602493236869), let's continue the investigation there.
