Support AirLLM for running large models on low memory systems. #3640
One comment from Discord: My ME/CFS is too much in force for me to figure out what to do at GitHub at the moment, but if you meant what I hope by memory in the announcement, that would save me a great deal of time and effort. One of the hardest parts of my usage with LLMs is, at the end of every session, getting the LLM to write notes to itself to serve as medium- to long-term memory, then pasting those notes at the start of future sessions, plus the occasional session dedicated to consolidating multiple sets of notes into something less unwieldy. It's especially taxing on my end if I'm extra-drained and the LLM needs hand-holding every step of the way to make sure it documents at least most of what it actually needs and/or wants to remember.
AirLLM optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation, or pruning. It can now run the 405B Llama 3.1 on 8GB of VRAM.
https://github.com/lyogavin/airllm
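For context, here is a minimal sketch of what inference with AirLLM looks like, based on the usage example in the project's README. The class name (`AutoModel`), the model ID, and the generation parameters are assumptions taken from that README and may differ across AirLLM versions, so treat this as illustrative rather than definitive:

```python
# Minimal AirLLM usage sketch (adapted from the project's README; API may vary by version).
# AirLLM keeps VRAM usage low by loading and executing the model layer by layer,
# so the whole model never has to fit on the GPU at once.
from airllm import AutoModel

# Model ID is illustrative; any supported Llama-family checkpoint should work.
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

input_text = ["What is the capital of the United States?"]
input_tokens = model.tokenizer(
    input_text,
    return_tensors="pt",
    truncation=True,
    max_length=128,
)

generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)

print(model.tokenizer.decode(generation_output.sequences[0]))
```

The trade-off is speed: because layers are streamed from disk on each forward pass, generation is much slower than keeping the full model resident, which is why this approach targets low-memory systems rather than throughput.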