
Mac/Metal thread #3760

Closed
oobabooga opened this issue Aug 30, 2023 · 22 comments

oobabooga commented Aug 30, 2023

This thread is dedicated to discussing the setup of the webui on Metal GPUs and Mac computers in general.

You are welcome to ask questions as well as share your experiences, tips, and insights to make the process easier for all Mac users.

GV43 commented Sep 7, 2023

Has anyone been able to get GGUF models to load in the webui? I've updated llama-cpp-python, but I'm still getting traceback errors.

dmi commented Sep 10, 2023

It loaded fine without issues when installed via the zip file, but it does not use the GPU. A standalone compiled llama.cpp does use the GPU.

cfmbrand commented Sep 11, 2023

Hi,

Fairly new to all of this, so I may be making very basic errors, but:

I installed everything through the terminal per the instructions in the ReadMe - this still caused me to get an error relating to cumsum and PyTorch when I tried to run the model (I can always load the 7B FP16 CodeLlama from TheBloke). I'm running a Mac Pro 16GB (14 core) with Ventura 13.5.2 and Python 3.10.9 per the ReadMe instructions in a clean environment, and I installed packages per requirements_nocuda.txt.

The error was:

To create a public link, set share=True in launch().
/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py:690: UserWarning: MPS: no support for int64 repeats mask, casting it to int32 (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1678454852765/work/aten/src/ATen/native/mps/operations/Repeat.mm:236.)
input_ids = input_ids.repeat_interleave(expand_size, dim=0)
Traceback (most recent call last):
File "/Users/appe/works/one-click-installers/text-generation-webui/modules/callbacks.py", line 71, in gentask
ret = self.mfunc(callback=_callback, **self.kwargs)
File "/Users/appe/works/one-click-installers/text-generation-webui/modules/text_generation.py", line 290, in generate_with_callback
shared.model.generate(**kwargs)
File "/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
return self.sample(
File "/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2521, in sample
model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
File "/Users/appe/works/one-click-installers/installer_files/env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 736, in prepare_inputs_for_generation
position_ids = attention_mask.long().cumsum(-1) - 1
RuntimeError: MPS does not support cumsum op with int64 input

I would also get this warning when running the server.py file to start up the GUI:

UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.

I then tried changing the torch/torchaudio/torchvision installation to the 'nightly' version (not the stable release). This successfully got rid of the error message, but the warning remained, and although I could run the model through the 'Default' prompt window, it now runs extremely slowly - around 0.01 tokens/second.
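For reference, the usual way to switch to the nightly builds on macOS is something like this (the exact index URL may differ depending on when you install):

# install the PyTorch nightly wheels, which include newer MPS operator coverage
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu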

This is all summarised in this other thread: #1686 (comment)

Does anyone have any ideas on what is going wrong here? Thanks in advance.

oobabooga commented Sep 24, 2023

The updated one-click installer now installs llama.cpp wheels with Metal acceleration. They are obtained from these files:

https://github.com/oobabooga/text-generation-webui/blob/main/requirements_apple_intel.txt
https://github.com/oobabooga/text-generation-webui/blob/main/requirements_apple_silicon.txt

llama.cpp with GGUF models and n-gpu-layers set to greater than 0 should in principle work now.
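As a quick sanity check (the model filename below is just a placeholder; use any GGUF file from your models folder), launching with at least one offloaded layer should print Metal initialization lines from llama.cpp:

# placeholder model name - substitute your own GGUF file
python server.py --model mistral-7b-instruct.Q4_K_M.gguf --loader llama.cpp --n-gpu-layers 1
# if Metal is active, the startup log should contain a line like:
# ggml_metal_init: allocating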

danch99 commented Sep 27, 2023

Hi,

I'm using the updated one-click installer and am not able to install it. I'm on a Mac M2 and I get this error:

Building wheels for collected packages: exllamav2
Building wheel for exllamav2 (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [168 lines of output]
No CUDA runtime is found, using CUDA_HOME='/Users/dan/LLMs/text-generation-webui/installer_files/env'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching 'dni_*' found anywhere in distribution
/Users/dan/LLMs/text-generation-webui/installer_files/env/lib/python3.10/site-packages/setuptools/command/build_py.py:201: _Warning: Package 'exllamav2.exllamav2_ext' is absent from the packages configuration.
!!

Thanks for your help.

@oobabooga

@danch99 what version of exllamav2 is written in your requirements_apple_silicon.txt file?

danch99 commented Sep 27, 2023

@oobabooga
exllamav2==0.0.4

@oobabooga

@jllllll do you see a reason why the new exllamav2==0.0.4 wheel would refuse to install on mac?

@philippjbauer
I ran into similar trouble installing for Apple Silicon with its requirements_apple_silicon.txt.

raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Collecting exllamav2==0.0.4 (from -r requirements_apple_silicon.txt (line 11))
  Using cached exllamav2-0.0.4.tar.gz (56 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [12 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/d4/xd3y159j1d5dcz3kdg1xqz6m0000gn/T/pip-install-e3vtw5el/exllamav2_d424aa9a25b7472682fcc4b4587265ea/setup.py", line 25, in <module>
          cpp_extension.CUDAExtension(
        File "/Users/philippbauer/.miniforge3/envs/oobabooga/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1048, in CUDAExtension
          library_dirs += library_paths(cuda=True)
        File "/Users/philippbauer/.miniforge3/envs/oobabooga/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
          if (not os.path.exists(_join_cuda_home(lib_dir)) and
        File "/Users/philippbauer/.miniforge3/envs/oobabooga/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2223, in _join_cuda_home
          raise EnvironmentError('CUDA_HOME environment variable is not set. '
      OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

jllllll commented Sep 27, 2023

Probably my fault: turboderp/exllamav2#61
It completely slipped my mind that this might cause issues with the exllamav2 sdist.

4 options for resolving this:

  • Add ; platform_system != "Darwin" to the rest of the requirements.txt files.
    • It's probably sufficient to just remove exllamav2 from the Mac files since it can't be used there anyway.
  • I make a PR to have JIT compiling be the default in exllamav2 like it used to be.
  • Use the JIT compile wheel instead:
    https://github.com/turboderp/exllamav2/releases/download/v0.0.4/exllamav2-0.0.4-py3-none-any.whl
    • This option may not be viable long-term as turboderp may stop building that wheel.
  • Ask turboderp to upload the JIT compile wheel to PyPI. Currently, it is just the sdist that is uploaded there.

An immediate, temporary solution is to set the EXLLAMA_NOCOMPILE environment variable before installing on Mac.
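Applied to the Mac install, that temporary workaround would look roughly like this:

# skip building the CUDA extension for exllamav2 while installing the requirements
EXLLAMA_NOCOMPILE=1 pip install -r requirements_apple_silicon.txt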

philippjbauer commented Sep 27, 2023

The only way I got it to run was to remove exllamav2 from the requirements_nowheels.txt file and install llama-cpp-python with CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python.
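Spelled out, the steps were roughly as follows (the --no-cache-dir flag just makes sure a previously cached non-Metal wheel isn't reused):

# after removing the exllamav2 line from the requirements file:
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --no-cache-dir llama-cpp-python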

The requirements_apple_silicon.txt file does not work on macOS 14.0 (Sonoma) yet.

I suppose you will need to create a new file for the new OS version? Like this one for macOS 13.x:

https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/metal/llama_cpp_python-0.2.6-cp310-cp310-macosx_13_0_arm64.whl; platform_system == "Darwin" and platform_release >= "22.0.0" and platform_release < "23.0.0"

Nevermind, the model itself couldn't be executed when I tried the apple_silicon install file and I only noticed after trying the nowheels install file.

turboderp commented Sep 27, 2023

I'm definitely keeping the JIT version around, if nothing else because it makes development a whole lot easier.

But it doesn't really matter to me which version is the default. I just want it to be as unsurprising as possible to the most users.

For now I have uploaded the JIT version to PyPI.

danch99 commented Sep 27, 2023

Thanks a lot for your fast support, gentlemen. The new install works.

One small thing: I can load GGUF models, but I've had no success with GGML models.

Falenos commented Oct 4, 2023

Just a tip: if someone is running miniconda on M1 and has issues, check this.

goranapivis commented Oct 29, 2023

It didn't work; I tried every trick and lost a whole day. I'm running LM Studio instead, which works out of the box with the M1 GPU! :-)

@github-actions github-actions bot added the stale label Dec 10, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

@b8kings0ga

TypeError: BFloat16 is not supported on MPS

@leedrake5

Metal no longer seems to be working? Setting n-gpu-layers > 0 always used to be the required step for GPU use on M1-M3 Macs, but now it no longer has any effect. Is there another setting that must be enabled? llama-cpp-python has been installed with the Metal flags on.

@PeterFujiyu

The Edge browser on macOS shows no option when using the webui (screenshot omitted).
Microsoft Edge, version 121.0.2277.83 (official version) (arm64)

mkhia commented Apr 2, 2024

Hi! I have a problem with MPS. I'm a noob and have no idea what to do with this error: MPS backend out of memory (MPS allocated: 15.16 GB, other allocations: 104.67 MB, max allowed: 18.13 GB). Tried to allocate 6.28 GB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure). Can anyone help?
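From what I can tell, the workaround the message itself suggests would be something like the following before launching, but I don't know whether it's safe:

# reportedly disables the MPS allocation limit; the message warns it may cause system failure
export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
python server.py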

robik72 commented Apr 25, 2024

Hi, I am getting the infamous "OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root." when I try to load a GPTQ model on my MacBook Pro M3 with Sonoma 14.3. During setup, I chose Apple Silicon GPU, but it does not seem to have an effect. I also tried adding a line with the following to the requirements_apple_silicon.txt file, with no result:
https://github.com/turboderp/exllamav2/releases/download/v0.0.4/exllamav2-0.0.4-py3-none-any.whl; platform_system != "Darwin"
What did I miss? I'd be glad for your help - I'm really lost...

@stefanbeeman-em

I'm having this issue as well (the same CUDA_HOME error described in the comment above). It seems like it's a problem detecting my graphics card, despite my specifying Apple Silicon in the install script.
