[RFC] Cross-Platform Refactor: Mac M1 support #1020

Open
5 tasks
rickardp opened this issue Feb 3, 2024 · 7 comments
Comments

@rickardp
Contributor

rickardp commented Feb 3, 2024

Motivation

The newer Macs with Apple Silicon (M1 and up) are actually quite powerful, and even the lowest-end M1 MacBook Air is impressive. In addition, the Apple platform is well suited to ML workloads thanks to its unified memory architecture (all system RAM can be used as GPU memory with no performance penalty).

Apple's accelerated API is called MPS (Metal Performance Shaders) and is not at all compatible with CUDA, so supporting it requires porting all the kernels as well as writing the stub code.
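Since CUDA and MPS kernels cannot be shared, the library would need to decide at runtime which backend to dispatch to. A minimal sketch of that decision, with the availability flags injected as plain booleans (in practice these would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`; the helper name here is hypothetical):

```python
# Hypothetical backend-selection helper: prefer CUDA on Nvidia
# machines, fall back to the ported MPS kernels on Apple Silicon,
# and use a portable CPU path everywhere else.
def pick_backend(cuda_available: bool, mps_available: bool) -> str:
    if cuda_available:
        return "cuda"  # existing CUDA kernels
    if mps_available:
        return "mps"   # Metal Performance Shaders kernels (to be ported)
    return "cpu"       # portable fallback

# On an M1 Mac without CUDA:
print(pick_backend(False, True))  # -> "mps"
```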

Additionally, the Mac is a very popular platform for developers. Supporting macOS natively in the popular torch libraries (as a longer-term goal) means we don't have to resort to expensive Nvidia cloud VMs for every single task.

Proposed solution

@Titus-von-Koeller Feel free to edit this issue as you see fit, if you want a different structure for it for example.

@niclimcy

niclimcy commented Feb 3, 2024

There is one project I was following that uses MPS: https://github.com/ggerganov/llama.cpp

That is where I got the idea of using CMake for cross-platform support. I'm not too familiar with MPS, but sharing for more context.

@Titus-von-Koeller
Collaborator

@rickardp summarized the approach:

My proposal [is]:

  • Support cross-platform builds
  • Build everything in GitHub Actions
  • Improve unit tests
  • Portable bootstrapping (possibly look at whether we can use PyTorch for common chores like finding DLL paths)
  • Then, using a test-driven approach, port function (kernel) by function
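The "portable bootstrapping" item above (finding the native library on each OS) can be sketched roughly as follows. The names are illustrative, not the real bitsandbytes loader:

```python
import platform
from pathlib import Path
from typing import Optional

# Hypothetical sketch of the "finding DLL paths" bootstrapping chore:
# pick the platform-specific shared-library name and look for it next
# to the Python package.
def native_lib_name(stem: str = "libbitsandbytes") -> str:
    # .dll on Windows, .dylib on macOS, .so elsewhere (Linux etc.)
    suffix = {"Windows": ".dll", "Darwin": ".dylib"}.get(platform.system(), ".so")
    return stem + suffix

def find_native_lib(package_dir: Path) -> Optional[Path]:
    candidate = package_dir / native_lib_name()
    return candidate if candidate.exists() else None
```

A real implementation would also handle backend-specific variants (e.g. separate CUDA and CPU binaries), which is part of what makes this churn worth centralizing.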

I fixed some of the “plumbing” on my branch, specifically:

  • Switched to CMake over pure makefiles (CMake does a lot of the needed detection out of the box)
  • CPU builds work on Windows and Mac (arm64 and x64)
  • CPU tests build and run on GitHub Actions (all platforms, including Mac and Windows)
  • Unit tests that depend on CUDA-only code are skipped on non-CUDA platforms; non-CUDA tests now go green when CUDA is not present
  • MPS bootstrapping code for Apple Silicon
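The CUDA-skipping behavior described above is commonly done with a pytest marker. A minimal sketch (the `has_cuda` helper is a hypothetical stand-in; a real suite would just call `torch.cuda.is_available()`):

```python
import pytest

# Skip, rather than fail, CUDA-only tests on machines without CUDA.
def has_cuda() -> bool:
    try:
        import torch
        return torch.cuda.is_available()
    except ImportError:
        return False

requires_cuda = pytest.mark.skipif(not has_cuda(), reason="CUDA not available")

@requires_cuda
def test_cuda_only_kernel():
    # Body only runs on CUDA-capable machines; everywhere else the
    # test is reported as skipped and the suite stays green.
    ...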

Would this be something you would find acceptable? If so, we could baseline on a portable build/test system, and then the community could work incrementally by adding MPS kernels and possibly also CPU kernels (I actually think it would be useful for this library to be able to run CPU-only).

Or would you rather have one PR where it is usable straight off the bat? (Then the community effort could continue in my fork, or somewhere else.)

(I am myself more of a software engineer than a data scientist, so I can help out with the software engineering parts of the problem; for one, this means I want a simple unit test to tell me whether my kernel is working, rather than a higher-level metric. That said, I do know a fair share of PyTorch and GPU development, so I can help out with the porting where there is a clear spec.)
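The "simple unit test per kernel" idea can be illustrated as comparing each ported kernel against a naive reference implementation on the same inputs. The function names and the dequantization math here are purely illustrative stand-ins, not the library's actual kernels:

```python
# Hypothetical per-kernel correctness test: a slow, obviously-correct
# reference implementation pinned against the ported kernel.
def dequantize_reference(codes, scale):
    # Naive version: multiply each quantized code by the scale.
    return [c * scale for c in codes]

def dequantize_kernel(codes, scale):
    # Stand-in for the ported MPS/CPU kernel under test.
    return [c * scale for c in codes]

def test_dequantize_matches_reference():
    codes, scale = [1, -2, 3], 0.5
    assert dequantize_kernel(codes, scale) == dequantize_reference(codes, scale)

test_dequantize_matches_reference()
```

With tests in this shape, a contributor porting a kernel gets an immediate pass/fail signal rather than having to judge a higher-level quality metric.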

Also I think the community could help out with API documentation as a way of getting the spec and expected outcome.

@cchance27

With the big usage of NF4 in Flux models, it seems there's decent demand for BNB on Apple Silicon. Seeing that most of the things above were merged already... what's the deal with support on Apple, so we can use the new NF4 models? We seem to be stuck on 0.42 and don't have access to the NF4 stuff... should we open a new issue for that specifically?

@yiwangsimple

Hopefully Apple users will be able to use it soon.

@Goldilox2023

With the big usage of NF4 in Flux models, it seems there's decent demand for BNB on Apple Silicon. Seeing that most of the things above were merged already... what's the deal with support on Apple, so we can use the new NF4 models? We seem to be stuck on 0.42 and don't have access to the NF4 stuff... should we open a new issue for that specifically?

THIS. I too am stuck on 0.42, so I can't install the NF4 stuff. Would be awesome if this got fixed soon. Cheers!

@otrdiz

otrdiz commented Aug 26, 2024

With the big usage of NF4 in Flux models, it seems there's decent demand for BNB on Apple Silicon. Seeing that most of the things above were merged already... what's the deal with support on Apple, so we can use the new NF4 models? We seem to be stuck on 0.42 and don't have access to the NF4 stuff... should we open a new issue for that specifically?

Quote. Can't get Flux models working on Apple Silicon, since they rely mostly on NF4 stuff. Would be really nice to get this working!

@bghira

bghira commented Sep 2, 2024

Why? quanto already works on Apple.

M1 doesn't have BF16 support, so I don't think 4-bit kernels will ever work?
