feat: Add rocm builds and documentation #1012

cromefire · 2023-12-10T14:26:09Z

Fixes #636

Proper display in Telemetry and UI not included, see #902 for that.

wsxiaoys

Please ensure that the pull request (PR) focuses on a single part. I would suggest that you only modify the Docker-related CI configuration files (.yml/.Dockerfile) in this PR to facilitate a smoother merge.

.github/workflows/release.yml

cromefire · 2023-12-10T15:51:39Z

> I would suggest that you only modify the Docker-related CI configuration files (.yml/.Dockerfile) in this PR to facilitate a smoother merge.

Well I mean documentation seems to be a pretty important part of it and I wouldn't think there are many issues there. Sure I can split it again, but I'm sinking many hours into testing and splitting this, there's a limit somewhere to what I can manage as well. This already consumed most of my Sunday now.

wsxiaoys · 2023-12-10T15:53:59Z

> > I would suggest that you only modify the Docker-related CI configuration files (.yml/.Dockerfile) in this PR to facilitate a smoother merge.
>
> Well I mean documentation seems to be a pretty important part of it and I wouldn't think there are many issues there. Sure I can split it again, but I'm sinking many hours into testing and splitting this, there's a limit somewhere to what I can manage as well. This already consumed most of my Sunday now.

I fully understand and would like to express my gratitude once again for your contribution :)

I mainly refer to certain changes (e.g., Rust 1.73 -> 1.8, manylinux build). These are all valuable, but they require careful testing to ensure that everything still works.

The documentation part is fine to retain; I will provide comments once we reach a good point in the remaining part.

cromefire · 2023-12-10T15:55:24Z

> I mainly refer to certain changes (e.g., Rust 1.73 -> 1.8, manylinux build). These are all valuable, but they require careful testing to ensure that everything still works.

Yeah sure I can just downgrade that again. Thought as docker uses latest stable anyway, might as well update that.

cromefire · 2023-12-10T16:14:57Z

Well, the manylinux image for ROCm now seems to run into a disk space limit. Could you maybe try deleting some caches? In theory it's much smaller (< 6GB) than the CUDA image (~10GB), which is why I'm slightly confused about it... Maybe it's running in parallel on the same VM?

wsxiaoys · 2023-12-11T05:37:13Z

> Well the manylinux image for rocm now seems to run into a disk space limit, could you maybe try and delete some caches or so? In theory it's much smaller (< 6GB) than the cuda (~10GB) image, which is why I'm slightly confused about it... Maybe it's running in parallel on the same VM?

I'd suggest skipping the manylinux build for now and checking in the Docker image in this PR.

cromefire · 2023-12-12T20:17:04Z

I think it's just a fluke in the build and the build has to be restarted (as it worked before with a bigger container). Can't do that myself though.

Need to update anyway, so we'll see if it wants to run this time.

cromefire · 2023-12-12T22:50:51Z

Found the error: the size shown on Docker Hub is the compressed size. ROCm is apparently quite big but compresses very well, and the manylinux/PyTorch image also adds MAGMA and MIOpen on top of it...

Got everything working by installing only the minimum needed on the fly. The biggest problem is that this adds ~4min to the ROCm build and I think the ROCm repo might also be slower than today sometimes...

As a solution I have prepared this: https://github.com/cromefire/hipblas-manylinux. It's a super basic docker build setup, but if you want me to switch to that, I'd recommend we move the repo to this org so it's easier to maintain (basically the only thing required would be to upgrade to the latest ROCm version from time to time), as I probably won't always remember to upgrade it.
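
To illustrate the size mismatch described above: a registry reports compressed layer sizes, so a rough pre-flight check has to scale that figure up before comparing it to the free disk space on the runner. A minimal Python sketch, in which the expansion ratio, the headroom, and both function names are assumptions for illustration, not measured values or anything taken from this PR:

```python
import shutil


def free_disk_gb(path="/"):
    """Return the free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def fits_after_extraction(compressed_gb, expansion_ratio, free_gb, headroom_gb=2.0):
    """Rough check: does an image still fit once its layers are unpacked?

    compressed_gb   -- size as listed by the registry (compressed)
    expansion_ratio -- assumed unpacked/compressed ratio (hypothetical, >= 1)
    free_gb         -- free disk space available to the build
    headroom_gb     -- assumed extra space for build artifacts
    """
    if expansion_ratio < 1:
        raise ValueError("expansion_ratio must be >= 1")
    unpacked_gb = compressed_gb * expansion_ratio
    return unpacked_gb + headroom_gb <= free_gb


if __name__ == "__main__":
    # 6 GB is the compressed figure quoted in the thread; 4.0 is a made-up ratio.
    print(fits_after_extraction(6.0, 4.0, free_disk_gb()))
```

This only estimates the final unpacked size; peak usage during a pull can be higher, because compressed and extracted layers may coexist on disk for a while.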

cromefire · 2023-12-12T22:54:02Z

Also, a Windows build is possible as well (the HIP install for Windows can be found here), but I'll probably just open an issue for that, because I don't think I have the bandwidth right now to deal with Windows, maybe one day...

website/docs/installation/docker.mdx

website/docs/installation/docker-compose.mdx

website/docs/faq.mdx

wsxiaoys · 2023-12-13T07:28:00Z

Hey, I reverted most of the Docker-related parts. Given that the manylinux build now works, I'm thinking of reorganizing the Docker image build around these manylinux binaries directly; I'll do that as a follow-up from my side.

The other part LGTM, thanks for your effort!

cromefire · 2023-12-13T07:39:37Z

Yeah as said though, we probably want to transfer the image repo for ROCm to this org, so you don't have to wait on me for any upgrades and it's all under your control.

cromefire · 2023-12-13T07:41:13Z

And also please at least restore the Dockerfiles themselves for manual building, as they are an easy way for people to build their own optimized images quickly. For ROCm this is important: as you may have seen, there are these AMDGPU_TARGETS, and while the list includes most GPUs, it doesn't include all of them. Having a way for people to just build it quickly for their specific GPU would be really important; that's how I always built and tested all of this as well.
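
A rough sketch of what such a per-GPU build could look like when scripted. It assumes the Dockerfile declares an `ARG AMDGPU_TARGETS` and forwards it to the build, which this thread does not confirm; the image tag, the function name, the semicolon separator, and the example target `gfx1030` are likewise placeholders:

```python
import shlex


def rocm_build_command(targets, dockerfile="rocm.Dockerfile", tag="tabby-rocm:local"):
    """Return an argv list for `docker build` restricted to the given GPU targets.

    Assumption: the Dockerfile accepts AMDGPU_TARGETS as a build argument and
    takes a semicolon-separated list (CMake list style).
    """
    if not targets:
        raise ValueError("at least one GPU target is required")
    target_list = ";".join(targets)
    return [
        "docker", "build",
        "-f", dockerfile,
        "--build-arg", f"AMDGPU_TARGETS={target_list}",
        "-t", tag,  # hypothetical local tag
        ".",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(rocm_build_command(["gfx1030"])))
```

Returning an argv list instead of a single string keeps the semicolon-separated target list away from shell parsing if the command is later passed to `subprocess.run`.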

cromefire · 2023-12-13T07:53:38Z

Also because everything, including compilers, is old and crusty you'd probably actually want to have an optimized build in docker which can use all the newer tools there, including the latest compiler optimizations and mitigations, after all, you don't have to worry about compatibility.

cromefire · 2023-12-13T08:04:59Z

It would also enable you to adjust the build, so it builds the ROCm libraries from source (they are open source after all), so you could build everything just for your GPU, which would probably shrink the container from like 10GB+ to more like 1-2GB.

wsxiaoys · 2023-12-13T08:07:35Z

> For ROCm this is important as you may have seen these AMDGPU_TARGETS, and while it includes most GPUs, it doesn't include all, so having a way for people to just build it quickly for their specific GPU would be really important, that's how I always built and tested all of this as well.

Since users need to build and test it manually, I'd prefer to just leave it in the source code.

> Also because everything, including compilers, is old and crusty you'd probably actually want to have an optimized build in docker which can use all the newer tools there, including the latest compiler optimizations and mitigations, after all, you don't have to worry about compatibility.

Let me think about this a bit more, but I'm still leaning towards just copying the manylinux binary into the containers to make the build process simpler.

cromefire · 2023-12-13T08:19:02Z

> Since users need to build and test it manually, I'd prefer to just leave it in the source code.

Yeah, not saying you have to base your official docker builds on it, but it'd be good to at least have the Dockerfile there for people to use to build specific builds or just for development. As said I basically never compiled it on bare metal as I initially had some build issues and in docker it just worked. (And you'd also have to install ROCm on your machine directly)

wsxiaoys · 2023-12-13T08:21:04Z

That's a fair point. Essentially it serves as a devcontainer.

cromefire · 2023-12-13T08:21:56Z

Something like that. That's the nice thing about it: you don't need to install anything for it, and you don't even need to mess around with devcontainers, which are still a bit finicky on JetBrains IDEs. You just say docker build and it works repeatably.

Added rocm builds and documentation

5b0816c

cromefire changed the title ~~Add rocm builds and documentation~~ feat: Add rocm builds and documentation Dec 10, 2023

cromefire added 8 commits December 10, 2023 15:27

Pulled build improvements from TabbyML#902

1be9c1e

Fixed build container for rocm build

8a21760

Install git in rocm container

1e3fe32

Fixed github step

1a1d4da

Try to fix if statement

b701a63

Added more generic dependency installation

82bbf8d

upgraded rustup action

081a9f3

Update sccache

5308d34

wsxiaoys reviewed Dec 10, 2023

.github/workflows/release.yml (outdated, resolved)

.github/workflows/release.yml (outdated, resolved)

Try pytorch manylinux image

f50b5af

cromefire added 3 commits December 10, 2023 16:57

Switched location for toolchain parameter

980f3ed

Downgraded to deprecated action again

42ad479

Readded set default step

a5e69ca

cromefire added 7 commits December 12, 2023 21:21

Merge branch 'main' into rocm-release

d4d38d7

Conflicts: .github/workflows/release.yml, crates/llama-cpp-bindings/src/llama.rs

Install minimal rocm on the fly

5be320a

fixed typo in binary name

e89ca16

Downgraded checkout action

c96853a

Use curl to download

97fef46

Add -y flag to yum

022548c

Also install rocblas

f4e99e7

wsxiaoys added 7 commits December 13, 2023 15:13

Update website/docs/faq.mdx

81f138a

Update index.md

ab80cda

Update and rename docker-cuda.yml to docker.yml

23f2054

Delete .github/workflows/docker-rocm.yml

5202dfe

Delete rocm.Dockerfile

f3d793f

Rename cuda.Dockerfile to Dockerfile

15767a8

Update docker.yml

1ae3822

wsxiaoys reviewed Dec 13, 2023

website/docs/installation/docker.mdx (outdated, resolved)

Update website/docs/installation/docker.mdx

e44f48d

wsxiaoys reviewed Dec 13, 2023

website/docs/installation/docker-compose.mdx (outdated, resolved)

wsxiaoys added 5 commits December 13, 2023 15:21

Update website/docs/installation/docker-compose.mdx

6fc195e

Update docker-compose.mdx

2c55ee4

Update docker-compose.mdx

a1f9589

Update docker.mdx

1ed492b

Update docker.mdx

f7fe002

wsxiaoys reviewed Dec 13, 2023

website/docs/faq.mdx (outdated, resolved)

Update website/docs/faq.mdx

39a640b

wsxiaoys enabled auto-merge (squash) December 13, 2023 07:26

wsxiaoys disabled auto-merge December 13, 2023 07:28

wsxiaoys merged commit 6ef9040 into TabbyML:main Dec 13, 2023
8 checks passed

cromefire deleted the rocm-release branch December 22, 2023 11:53
