From a81d4a5ea0132d60aa04e102efa8e86fee16fe3a Mon Sep 17 00:00:00 2001
From: Holden X
Date: Sat, 16 Dec 2023 16:42:33 +0800
Subject: [PATCH] Update demo in README.md (#6)

* Update demo video in README.md

* Update demo at README.md
---
 README.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index b0da92f..bf68d07 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,13 @@
 # PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
 ---
 
-*Demo* 🔥
+## Demo 🔥
 
-https://github.com/hodlen/PowerInfer/assets/34213478/b782ccc8-0a2a-42b6-a6aa-07b2224a66f7
+https://github.com/SJTU-IPADS/PowerInfer/assets/34213478/d26ae05b-d0cf-40b6-8788-bda3fe447e28
 
-The demo is running with a single 24G 4090 GPU, the model is Falcon (ReLU)-40B, and the precision is FP16.
+PowerInfer v.s. llama.cpp on a single RTX 4090(24G) running Falcon(ReLU)-40B-FP16 with a 11x speedup!
+
+Both PowerInfer and llama.cpp were running on the same hardware and fully utilized VRAM on RTX 4090.
 
 ---
 ## Abstract