# Web-LLM

This is an implementation of [llama2.c](https://github.com/karpathy/llama2.c) in pure Rust and WebGPU, based on the excellent [web-rwkv](https://github.com/cryscan/web-rwkv) project.

It is currently slow and inefficient; it is primarily a learning project and a demonstration of capability.

## How to use

1. Export a model using `export.py` from the [llama2.c](https://github.com/karpathy/llama2.c) repository. The `.pt` checkpoint files are available from https://huggingface.co/karpathy/tinyllamas:

   ```sh
   mkdir -p models/stories15M
   python3 export.py --version -1 --dtype fp32 --checkpoint stories15M.pt models/stories15M
   ```

2. Convert the HuggingFace `pytorch_model.bin` to safetensors (a sketch of what this step does follows the list):

   ```sh
   python3 convert_safetensors.py --input models/stories15M/pytorch_model.bin --config models/stories15M/config.json --output models/stories15M/model.safetensors
   ```

3. Run the model:

   ```sh
   cargo run --release --example llama models/stories15M/model.safetensors
   ```
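
For reference, step 2 amounts to re-serializing the PyTorch checkpoint in the safetensors format. Below is a minimal sketch of that idea using the `torch` and `safetensors` Python packages; it illustrates the concept and is not the actual `convert_safetensors.py` from this repository (which, among other things, also reads `config.json`):

```python
# Sketch only: re-save a HuggingFace-style checkpoint as safetensors.
# The real convert_safetensors.py in this repository may differ.
import torch
from safetensors.torch import save_file

# Load the checkpoint exported in step 1 (CPU is fine for conversion).
state_dict = torch.load("models/stories15M/pytorch_model.bin", map_location="cpu")

# safetensors requires contiguous tensors; .contiguous() is a no-op otherwise.
tensors = {name: t.contiguous() for name, t in state_dict.items()}

save_file(tensors, "models/stories15M/model.safetensors")
```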
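
You can also sanity-check the converted file before running the model by listing its tensor names and shapes:

```python
# Quick sanity check of the converted file using the safetensors package.
from safetensors import safe_open

with safe_open("models/stories15M/model.safetensors", framework="pt") as f:
    for name in f.keys():
        print(name, f.get_tensor(name).shape)
```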

## Credits