Apple Silicon / MacOS support #22
That's interesting, thanks for reporting - it could be caused by a change in the nightly code. It's good that the older point release works, so it's not going to be permanently broken. The next nightly might work fine, or you could try building it locally to investigate further. Are you running on an Apple ARM processor?
Thanks - for now I've fallen back on building locally. I'm running macOS on an Apple M1 Pro.
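For reference, a local build along these lines might look like the following sketch; the repository URL and the build target name are assumptions, not details confirmed in this thread:

```sh
# Hypothetical local-build sketch - repo URL and --target name are assumed;
# follow the project's documented build instructions where they differ.
git clone https://github.com/Atinoda/text-generation-webui-docker.git
cd text-generation-webui-docker
docker build --target llama-cpu -t local/text-generation-webui:llama-cpu .
```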
Glad that it builds correctly on your machine, thank you for confirming! Are you making any changes to the Dockerfile, or does it just work? I would like the Apple ARM silicon to be supported, but I do not have a machine to do any building or testing on. Hopefully the next point release will be fine, and it's just the development of GGUF support that's causing some hiccups in the short term.
Sorry, wrong button 😅 I simplified the Dockerfile; IIRC there was a library (AutoGPTQ?) that was failing to install:
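The simplified Dockerfile itself was not captured above. A purely hypothetical sketch of the kind of change described, where the package name and the location of the failing install line are both assumptions:

```sh
# Hypothetical sketch - drop whatever line installs the arm64-incompatible
# package (reportedly AutoGPTQ) and rebuild locally. BSD sed syntax (macOS).
sed -i '' '/auto-gptq/d' Dockerfile
docker build -t local/text-generation-webui .
```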
Coming here from #30 - @Atinoda mentioned this is becoming the go-to Apple M1 issue (worth changing the title?).
This means this package is not available for arm64, I guess.
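Whether a given image or Python package actually ships arm64 builds can be checked with standard tooling; the image tag and package name below are taken from this thread and may need adjusting to the actual failing dependency:

```sh
# Check which architectures the image manifest provides:
docker manifest inspect atinoda/text-generation-webui:llama-cpu | grep architecture
# Check whether pip can fetch an arm64-compatible wheel or sdist (package name assumed):
pip download auto-gptq --no-deps -d /tmp/wheels
```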
Thanks for posting your experiences here @AIWintermuteAI! I agree with you that a full rewrite makes sense for the M1 use-case. Beyond that, I think ROCm/AMD will require the same, and it also then makes sense to do the same for CPU-only inference (which is more popular than I expected). I will have a think on how to refactor for that scenario, probably kicking off with
I actually made it work, but inference took ages with llama-2-7b-chat.Q5_K_M.gguf... I'm wondering why that would be the case - perhaps before I was running a similar-sized model with llama-cpp with Metal acceleration? Not sure, need to test.
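For comparison outside Docker, llama-cpp-python can be built natively with Metal offload on Apple Silicon; the build flag below matches llama-cpp-python's documentation from around this period and should be verified against the current docs:

```sh
# Native (non-Docker) install with Metal offload enabled at build time:
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```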
@AIWintermuteAI Dockerfile worked for me.
EDIT: It runs correctly directly on my M1 host machine, though it is extremely slow to respond.
EDITEDIT: So it loads the models alright, but it never answers. It completely soaks up my CPU and never produces any output. I suspect this can be addressed with some tuning and is not an issue for this repo.
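If the stall is a threading or tuning issue, one knob to try is the launch arguments. The EXTRA_LAUNCH_ARGS variable is assumed from this image's conventions (check its README), and the thread count is only an example value:

```sh
# Pass extra llama.cpp launch flags into the container (values are examples):
docker run --rm -p 7860:7860 \
  -e EXTRA_LAUNCH_ARGS="--threads 8" \
  atinoda/text-generation-webui:llama-cpu
```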
It is highly unlikely that accelerated inference will ever be available via Docker on Apple Silicon. This is due to Apple's implementation of Docker, where it effectively floats on a VM with a weirdly virtualised ARM CPU that does not expose the underlying CPU/GPU capabilities. Asahi Linux may be an option in the future. However, I have an M1 Mac and I would like to use the hardware with
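The VM limitation described above can be seen directly: Docker Desktop on macOS has no GPU passthrough, so a GPU device request fails (the exact error text varies by Docker version):

```sh
# GPU device requests have no backing driver on macOS Docker Desktop:
docker run --rm --gpus all alpine true
# => Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
```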
For atinoda/text-generation-webui:llama-cpu-nightly:
For reference, atinoda/text-generation-webui:llama-cpu works without error.