diff --git a/README.md b/README.md
index bc68815..d74180b 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ PowerInfer v.s. llama.cpp on a single RTX 4090(24G) running Falcon(ReLU)-40B-FP1
 Both PowerInfer and llama.cpp were running on the same hardware and fully utilized VRAM on RTX 4090.
 
-> [!NOTE]
+> [!IMPORTANT]
 > **Live Demo Online⚡️**
 >
 > Try out our [Gradio server](https://powerinfer-gradio.vercel.app/) hosting Falcon(ReLU)-40B-FP16 on a RTX 4090!