Intel Arc / XPU support #631
Comments
If I am understanding this code correctly (crates/llama-cpp-bindings/build.rs), I believe we just need a switch for OpenCL to enable Intel GPU support. If OpenCL is selected, then add the build args as described in this section of the llama.cpp docs: |
Hi @itlackey, unfortunately, I don't have an Intel Arc card to try out. If anyone has a card and is interested in giving it a try, please feel free to do so! Happy to help if any problems arise. |
I have a card, but no Rust experience and only a basic understanding of the underlying C++ libraries. I could try adjusting the code, but I'm not sure what the full list of changes would be. Do you know if additional changes would be needed beyond altering the build args in build.rs? |
I think the first step would be following the instructions here: https://github.com/TabbyML/tabby#-contributing to get it building in your local dev environment. Then you could tune the build flags in llama-cpp-bindings's build.rs a bit to make it compile with OpenCL support. |
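For illustration, the kind of switch discussed above could look roughly like the sketch below. It is not Tabby's actual build.rs: the Cargo feature names (`opencl`, `sycl`, `vulkan`) and both helper functions are hypothetical, and the CMake option names are the ones llama.cpp used around the time of this thread, so check the docs of the llama.cpp version you build against.

```rust
/// Hypothetical helper for a build.rs: maps enabled Cargo features to
/// llama.cpp CMake options. Feature names here are assumptions, and the
/// option names may differ in other llama.cpp versions.
fn backend_defines(feature_enabled: impl Fn(&str) -> bool) -> Vec<(&'static str, &'static str)> {
    let mut defines = Vec::new();
    if feature_enabled("opencl") {
        // OpenCL via CLBlast.
        defines.push(("LLAMA_CLBLAST", "ON"));
    }
    if feature_enabled("sycl") {
        defines.push(("LLAMA_SYCL", "ON"));
    }
    if feature_enabled("vulkan") {
        defines.push(("LLAMA_VULKAN", "ON"));
    }
    defines
}

/// Cargo exposes each enabled feature to build scripts as an environment
/// variable named CARGO_FEATURE_<NAME>, uppercased with '-' replaced by '_'.
fn cargo_feature_enabled(name: &str) -> bool {
    let key = format!("CARGO_FEATURE_{}", name.to_uppercase().replace('-', "_"));
    std::env::var_os(key).is_some()
}
```

In a real build script, `backend_defines(cargo_feature_enabled)` would yield the pairs to pass on to the CMake invocation; keeping the mapping in a pure function makes it easy to check without a GPU.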
Sounds reasonable, I will give it a try as soon as I get a chance. |
I'm trying to get this working (more specifically Intel iGPU support and also ROCm support), but after compiling it I just get a "501 Error: Not Implemented" (in Docker). There are no errors during the build, though. Any idea what went wrong? I won't be using OpenCL support, but Intel MKL and hipBLAS, as they seem the better fit. I'm pretty sure there are still issues, but if I can't even hit it, I can't test it. |
Put it all in a pull request here: #895 |
Nice work!! I put this on the back burner due to not being able to get decent performance using OpenCL with llama.cpp but it looks like you found a better approach. Thanks for pushing this forward! |
Well, I still have to get it to work; right now every sort of configuration for Tabby just returns HTTP 501. |
Does llama.cpp work on the GPU with these compiler options? If not, get llama.cpp working as expected and then port that to the Tabby build settings. Skimming through the changes to Tabby, it seems like you're on the right track. |
Hello everyone, I am working on supporting Intel CPU architecture by integrating the Intel oneAPI platform into Tabby. I just wanted to know whether there is any update on this. I would love to contribute to getting it done. Thanks. |
Upstream (llama.cpp) has to support it. As soon as it has that I have stuff already prepared. Alternatively if it'll be integrated faster, Vulkan compute can also be used, but the same deal, it has to be merged first. Haven't followed those PRs though, so if one of them has merged, tell me and I'll get it done. As soon as Tabby's fork of llama.cpp has been updated of course. |
I have not spent time on it in a while. I did see SYCL is now supported in llama.cpp and works well on Intel. |
Then I should probably get to it once I find a sliver of time; I have SYCL pretty much prepared already. @wsxiaoys has the llama.cpp fork already been updated? |
It's updated in the recent release (0.8): https://github.com/TabbyML/llama.cpp |
Hi @wsxiaoys, the llama.cpp fork bundled with the Tabby releases (0.8/0.9) hasn't been updated to include SYCL support.
Is it good to go? |
Not sure whether that's even worth it, as I think the default CPU stuff already uses AVX and so on. |
@cromefire, I will enable Intel BLAS (Intel10_64lp) under the onemkl feature, following Intel's guideline: https://www.intel.com/content/www/us/en/content-details/791610/optimizing-and-running-llama2-on-intel-cpu.html |
I would definitely still suggest trying it first to see if it even helps with anything, and if it does, maybe checking for regressions, because otherwise it might be easier to just use it by default rather than adding it as a "backend". Two CPU backends would be kind of confusing... |
* Support new feature: openapi
* Change compiler to Intel llvm when compiling llama.cpp
* Support Intel BLAS (Intel10_64lp)
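As a rough sketch of what those changes amount to on the CMake side, the helper below assembles the arguments for an Intel BLAS build. The function name is hypothetical and this is not the code from the referenced commit; the `LLAMA_BLAS` and `LLAMA_BLAS_VENDOR` options and the `icx`/`icpx` compiler names are assumed from llama.cpp's build instructions of that period and Intel's oneAPI toolchain, and may not match newer versions.

```rust
/// Hypothetical helper: CMake arguments for building llama.cpp against
/// Intel oneMKL with the Intel LLVM compilers. Option names are assumed
/// from the llama.cpp version current at the time of this thread.
fn onemkl_cmake_args() -> Vec<String> {
    let defines = [
        ("LLAMA_BLAS", "ON"),
        ("LLAMA_BLAS_VENDOR", "Intel10_64lp"),
        // Intel LLVM-based C and C++ compilers from the oneAPI toolkit.
        ("CMAKE_C_COMPILER", "icx"),
        ("CMAKE_CXX_COMPILER", "icpx"),
    ];
    defines.iter().map(|(k, v)| format!("-D{k}={v}")).collect()
}
```

The resulting strings can be appended to a `cmake` command line as-is.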
Thanks @cromefire for your suggestion. |
Closing as Vulkan support is preferred for such use cases. |
It would be great to be able to run Tabby locally on my Intel Arc GPU.
Additional context
This is currently possible in tools like llama.cpp by compiling with OpenCL support. I have no idea how that would (or could) translate to Rust.
Please reply with a 👍 if you want this feature.