# llama.cpp

## How to load the model

Models can be loaded from a model hub like Hugging Face, or from an object store:

- From Hugging Face:

  ```yaml
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      filename: qwen2-0_5b-instruct-q5_k_m.gguf
  ```
- From an object store (for private models):

  ```yaml
  source:
    uri: oss://llmaz.oss-ap-southeast-1-internal.aliyuncs.com/models/qwen2-0_5b-instruct-q5_k_m.gguf
  ```
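For context, these `source` snippets are meant to sit inside a model resource. A minimal sketch of what the surrounding manifest might look like; the `apiVersion`, `kind`, `metadata`, and `familyName` fields here are assumptions based on common llmaz conventions, not taken from this README:

```yaml
# Hypothetical wrapper manifest -- only the `source` block is from
# this README; the surrounding fields are assumptions.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      filename: qwen2-0_5b-instruct-q5_k_m.gguf
```

Check your installed llmaz CRDs for the exact schema before applying anything like this.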

## How to test

Once deployed successfully, you can query like this:

- Expose the service:

  ```bash
  kubectl port-forward pod/qwen2-0--5b-0 8080:8080
  ```

- Run the command:

  ```bash
  curl --request POST \
    --url http://localhost:8080/v1/completions \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
  ```

Then you will see the model's output.
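The same request can also be issued from Python using only the standard library. A minimal sketch; the `choices[0].text` response shape is an assumption based on the OpenAI-compatible `/v1/completions` endpoint, so adjust `extract_text` if your server returns a different layout:

```python
import json
import urllib.request

# Endpoint exposed by the port-forward step above.
URL = "http://localhost:8080/v1/completions"


def build_payload(prompt: str, n_predict: int = 128) -> dict:
    """Build the same JSON body the curl example sends."""
    return {"prompt": prompt, "n_predict": n_predict}


def extract_text(response: dict) -> str:
    """Pull the generated text out of an OpenAI-style completions response."""
    return response["choices"][0]["text"]


def query(prompt: str) -> str:
    """POST the prompt to the server and return the completion text."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))


# Example (requires the port-forward from the previous step):
# print(query("Building a website can be done in 10 simple steps:"))
```

Keeping the payload construction and response parsing in small functions makes it easy to swap in a different endpoint or response format later.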