# Web-LLM

This is an implementation of [llama2.c](https://github.com/karpathy/llama2.c) in pure Rust and WebGPU, based on the excellent [web-rwkv](https://github.com/cryscan/web-rwkv) project.

It is currently slow and inefficient; it is primarily a learning project and a demonstration of capability.

## How to use

1. Export a model using `export.py` from the [llama2.c](https://github.com/karpathy/llama2.c) repository. The `.pt` checkpoint files are available from https://huggingface.co/karpathy/tinyllamas:

   ```sh
   mkdir -p models/stories15M
   python3 export.py --version -1 --dtype fp32 --checkpoint stories15M.pt models/stories15M
   ```

2. Convert the HuggingFace `pytorch_model.bin` to safetensors (a sketch of what this step does follows the list):

   ```sh
   python3 convert_safetensors.py --input models/stories15M/pytorch_model.bin --config models/stories15M/config.json --output models/stories15M/model.safetensors
   ```

3. Run the model:

   ```sh
   cargo run --release --example llama models/stories15M/model.safetensors
   ```
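
For reference, step 2 amounts to re-serializing the PyTorch checkpoint in the safetensors format. Below is a minimal sketch of that idea using the `torch` and `safetensors` Python packages; it illustrates the concept and is not the actual `convert_safetensors.py` from this repository (which, among other things, also reads `config.json`):

```python
# Sketch only: re-save a HuggingFace-style checkpoint as safetensors.
# The real convert_safetensors.py in this repository may differ.
import torch
from safetensors.torch import save_file

# Load the checkpoint exported in step 1 (CPU is fine for conversion).
state_dict = torch.load("models/stories15M/pytorch_model.bin", map_location="cpu")

# safetensors requires contiguous tensors; .contiguous() is a no-op otherwise.
tensors = {name: t.contiguous() for name, t in state_dict.items()}

save_file(tensors, "models/stories15M/model.safetensors")
```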
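
You can also sanity-check the converted file before running the model by listing its tensor names and shapes:

```python
# Quick sanity check of the converted file using the safetensors package.
from safetensors import safe_open

with safe_open("models/stories15M/model.safetensors", framework="pt") as f:
    for name in f.keys():
        print(name, f.get_tensor(name).shape)
```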

## Credits