
CI: fix workflows #1035

Open · wkpark wants to merge 5 commits into main from fix-reg
Conversation

wkpark
Contributor

@wkpark wkpark commented Feb 5, 2024

Fix regressions introduced by PR #949.

Miscellaneous workflow fixes.

restored

Restore some things that were abandoned or not carefully merged:

  • support the cmake -B build option correctly (see the sketch after this list).
  • use python-version=3.10 for builds.
  • build wheels for all available CUDA versions.
  • use the conda+mamba method and disable the cuda-toolkit action (it is too slow and unstable in some cases).
  • use more flexible/customizable CMake options (set COMPUTE_CAPABILITY, for example).
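As a rough illustration of the restored out-of-source build, here is a minimal sketch of the configure/build steps (COMPUTE_CAPABILITY is the CMake option mentioned above; the value list and step names are illustrative):

    # Configure out of source with cmake -B, passing compute capabilities as a CMake cache variable.
    - name: Configure
      run: cmake -B build -DCOMPUTE_CAPABILITY="75;80;86;89" -DCMAKE_BUILD_TYPE=Release
    # Build from the generated build/ directory.
    - name: Build
      run: cmake --build build --config Release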

fixed

misc

  • drop artifact retention-days: it can be configured under Settings -> Actions -> General.
  • concurrency: fix restored.
  • fail-fast: false restored.
  • use ilammy/[email protected] instead of microsoft/setup-msbuild.
  • use cmake -G Ninja: Ninja is faster and more popular (Triton uses Ninja, for example).
  • define COMPUTE_CAPABILITY differently per event (see the sketch after this list):
    • for pull_request, use a concise COMPUTE_CAPABILITY=61;75;86;89 (only popular GPUs included)
    • for refs/tags/*, use the full set, COMPUTE_CAPABILITY=50;52;60;61;62;70;72;75;80;86;87;89;90
    • with this method, build time is reduced from ~40 min to ~25 min
  • build cuda12.1 only for pull_request; do a full build for refs/tags/*.
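A minimal sketch of how fail-fast: false and the per-event COMPUTE_CAPABILITY selection could fit together in one job (job name, matrix, and step layout are illustrative; the capability lists are the ones from the list above):

    build-cuda:
      runs-on: ubuntu-latest
      strategy:
        fail-fast: false                      # one failing matrix entry does not cancel the others
        matrix:
          cuda-version: ['11.8.0', '12.1.1']
      steps:
        - name: Select compute capabilities per event
          run: |
            if [ "${{ github.event_name }}" = "pull_request" ]; then
              echo "COMPUTE_CAPABILITY=61;75;86;89" >> "$GITHUB_ENV"
            else
              echo "COMPUTE_CAPABILITY=50;52;60;61;62;70;72;75;80;86;87;89;90" >> "$GITHUB_ENV"
            fi
        - name: Configure
          run: cmake -B build -DCOMPUTE_CAPABILITY="$COMPUTE_CAPABILITY"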

aarch64 build issue

Build time using Docker + the aarch64 arch seems too slow; the current total build time is about 30–40 min.

  • a normal build takes about 5–10 min, while an aarch64 build on Docker takes about 25–30 min
  • we could use a cross-compiler to fix the build-speed issue
  • or we can reduce COMPUTE_CAPABILITY for pull_request builds

outdated or resolved
CPU builds with aarch64 gcc/g++ are fine, but for CUDA builds there is no benefit, as the Docker image itself does not have arm64 libraries installed and linking will fail. Resolved.

Some of @rickardp's changes, whether intended or not, may be better done in python-package.yml. However, they cause some serious problems, and since it is important to quickly fix things that worked before, I am submitting this PR.

P.S.: PR #1018 is not included.


github-actions bot commented Feb 5, 2024

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@akx akx left a comment


At first glance, this basically reverts a lot of Rickard's work - could you explain why the changes are needed? There also seems to be a bunch of changes that aren't really "fix regression"..?

Comment on lines 4 to 5
push:
  branches: [ "main" ]
Contributor Author

@wkpark wkpark Feb 5, 2024


Normally, the push trigger should be limited to specific branches. (reverted)

Contributor


Having no branch filter here means CI gets run on pushes that don't have a PR too, so you can test things out in your own branch.

Contributor

@rickardp rickardp Feb 6, 2024


Regarding branch filters etc., feel free to suggest changes. The decisions were deliberate, but they were mine; maybe there are other considerations. My idea was that all PRs should be built here, but that when creating forks you also get them built and validated. I was planning to add the concurrency part; I deliberately left it out because the pipelines were in flux and it was annoying to have the builds cancelled all the time, but I think it makes sense to add it.
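For reference, the concurrency setting being discussed typically looks something like this (a sketch; the group key is illustrative):

    concurrency:
      # Cancel an in-flight run of the same workflow for the same branch/PR when a new push arrives.
      group: ${{ github.workflow }}-${{ github.ref }}
      cancel-in-progress: true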

CMakeLists.txt (resolved review threads; one outdated)
@wkpark
Contributor Author

wkpark commented Feb 5, 2024

At first glance, this basically reverts a lot of Rickard's work - could you explain why the changes are needed? There also seems to be a bunch of changes that aren't really "fix regression"..?

As I already mentioned, this is a quick, manual revert to fix the current regression state.

@akx
Contributor

akx commented Feb 5, 2024

The missing _cpu tag might well be a regression, but this PR seems to be putting the Mamba/Conda stuff back in (why?), adding setuptools and wheel in the shared-libs step, etc.

Wouldn't it be better to just fix things for the better instead of a "quick revert" that someone will need to fix again later? Since there's a lot of activity in this repo just now, and we're working in a state where we don't really have the luxury to actually properly test anything, I think we should accept the fact that main might be broken.

@wkpark wkpark changed the title from "Fix regression" to "Fix regression + revert workflows" on Feb 5, 2024
@akx
Contributor

akx commented Feb 5, 2024

Could you maybe look into other options than mamba? I mean, this repo itself has install_cuda.py and install_cuda.sh scripts which apparently are meant to install the CUDA Toolkit – maybe we should use those?
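A rough sketch of what that could look like as a workflow step; note that the arguments shown here (a CUDA version and an install prefix) are only a guess at install_cuda.py's interface, not its documented usage:

    - name: Install CUDA Toolkit via the repo script (hypothetical interface)
      # "118" and the install prefix are illustrative; check install_cuda.py for its actual arguments.
      run: python install_cuda.py 118 "${{ github.workspace }}/cuda"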

@wkpark wkpark force-pushed the fix-reg branch 2 times, most recently from 8d8747f to ece67ae on February 6, 2024 13:11
arch: aarch64
os: [windows-latest]
arch: [x86_64]
cuda-version: ['11.8.0', '12.1.1']
Contributor


I deliberately left CUDA 11 out of the pipelines so as not to make them slower before the dust had settled. The main branch is very active at the moment. I don't see a problem adding it, but we must realize that we now burn CPU minutes quite fast. I did not consider this a regression, since Python packages were published outside of the pipelines before (and still are; we only build them).

@rickardp
Contributor

rickardp commented Feb 6, 2024

Some comments from me: possibly #949 was merged too quickly, without aligning on the scope. Sorry for not realizing this! May I suggest discussing improvements in the RFCs, maybe in #1031?

I would argue the easiest way to manage CUDA dependencies is through Docker. What is currently lacking is a Dependabot-managed update for the CUDA versions. I was actually planning to fix this in a follow-up PR; it involves creating Dockerfiles for the CUDA images, as this is what Dependabot can work with. We need to continuously monitor for updates because NVIDIA deprecates CUDA versions fast, and Dependabot is the realistic way to handle this. The problem here is Windows. While there seem to be some community-based Windows containers, AFAIK hosted runners won't run Windows containers. The community GitHub action does a decent job, but it's slow, owing to the fact that it runs the horribly slow CUDA installer. Some kind of image-based approach (just like Docker for the Linux builds) would be the best solution.

.github/workflows/python-package.yml (resolved review threads; outdated)
# Check out dependencies code
- uses: actions/checkout@v4
  name: Check out NVidia cub
  with:
    repository: nvidia/cub
    ref: 1.11.0
    path: dependencies/cub
# Compile C++ code
- name: Build C++
- name: Setup Mambaforge
Contributor


I am still not convinced mamba is the way to go.

Also, there's a lot of duplication here w.r.t the Docker flow. What is the reason we need to change this?

Member


Same, it seems overly complicated for not much obvious benefit to me. I like to be able to do something simple like pip install -e .[dev] and use the cache on it.
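For context, the simpler flow referred to here is roughly the following (a sketch; it assumes the project defines a [dev] extra):

    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with:
        python-version: '3.10'
        cache: 'pip'                 # reuse downloaded packages between runs
    - name: Install in editable mode with dev extras
      run: pip install -e .[dev]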

.github/workflows/python-package.yml (resolved review thread)
@wkpark wkpark changed the title from "Fix regression + revert workflows" to "fix workflows" on Feb 7, 2024
@wkpark wkpark changed the title from "fix workflows" to "CI: fix workflows" on Feb 7, 2024
@wkpark wkpark force-pushed the fix-reg branch 2 times, most recently from 8a5c804 to 95b1779 on February 7, 2024 15:13
@wkpark wkpark force-pushed the fix-reg branch 3 times, most recently from 66fd141 to 42942b4 on February 8, 2024 11:10
 - fix custom command
 - fix *_OUTPUT_DIRECTORY
 * fix to support cmake -B build option
 * add cuda 11.8, 12.1
 * use python-version==3.10 for builds.
 * fix wheel names for aarch64
 * drop artifact retention-days: it could be configured Settings->Actions->General.
 * make wheel with all available cuda versions.
 * use docker-run-actions
 * update docker image version
@Titus-von-Koeller
Collaborator

Hey @wkpark, @akx, @matthewdouglas, @rickardp,

Yes, agreed, #949 was merged a bit hastily, due to a misunderstanding and a too-quick trigger finger on my side. Sorry for that! And thanks, everyone, for the collective cleanup / hotfix actions that ensued and got everything back to working order.

Based on your discussion above, I'm not sure whether this PR is still WIP or whether you all agree that it's ready for review and merge. If not, what do we still need to implement or agree on in order to move forward?

Comment on lines +269 to +274
# fix wheel name
if [ "${{ matrix.arch }}" = "aarch64" ]; then
  o=$(ls dist/*.whl)
  n=$(echo $o | sed 's@_x86_64@_aarch64@')
  [ "$n" != "$o" ] && mv $o $n
fi
Contributor


This looks like a hack? Why is the wheel named x86_64... if it contains aarch64 material? IOW, are you sure it contains aarch64 material if it's named wrong?

Member


@akx So to me it looks like the lib is built for aarch64 via Docker, but this step of packaging the artifacts into a wheel is always executed on an x86-64 host.

I agree it seems hacky. Maybe using something like wheel tags $o --platform-tag=manylinux2014_aarch64 would feel a little better.
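A minimal sketch of that alternative as a workflow step (it assumes a wheel release recent enough to provide the tags subcommand; wheel tags writes a retagged copy, so the original x86_64-named file may still need to be removed afterwards):

    - name: Retag wheel for aarch64
      if: matrix.arch == 'aarch64'
      run: |
        pip install -U wheel
        # Writes a copy of each wheel with the aarch64 platform tag.
        wheel tags --platform-tag manylinux2014_aarch64 dist/*.whl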

@rickardp
Contributor

Hey @wkpark, @akx, @matthewdouglas, @rickardp,

Yes, agreed, #949 was merged a bit hastily, due to a misunderstanding and a too-quick trigger finger on my side. Sorry for that! And thanks, everyone, for the collective cleanup / hotfix actions that ensued and got everything back to working order.

Based on your discussion above, I'm not sure whether this PR is still WIP or whether you all agree that it's ready for review and merge. If not, what do we still need to implement or agree on in order to move forward?

TBH I thought this PR was abandoned and most of the stuff had been broken out to other PRs.

I'll have another look, as I am not completely sure what issues remain that this PR solves.

@rickardp
Contributor

rickardp commented Feb 21, 2024

use ilammy/[email protected] instead of microsoft/setup-msbuild
microsoft/setup-msbuild is used for msbuild; ilammy/[email protected] instead sets up the MSVC developer command-line environment, which is what the CMake/Ninja build here needs.

concurrency: fix restored

These are already on master I think

update docker image version
fixed deprecation warning for cuda12.1 docker image (cuda12.1.0 is deprecated but cuda12.1.1 is supported)

IMHO this is better handled like #1052, so we can have Dependabot notify us when there are upgrades.
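For reference, a minimal dependabot.yml sketch for Docker image updates; the directory holding the CUDA Dockerfiles is hypothetical, since those Dockerfiles would first have to be added to the repo:

    version: 2
    updates:
      - package-ecosystem: "docker"
        directory: "/docker"          # hypothetical location of the CUDA Dockerfiles
        schedule:
          interval: "weekly"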

support cmake -B build option correctly.

@wkpark I think you mentioned this a while back, but I don't see why this is necessary. To me it seems that it only complicates path handling in the CMake files. Source files and output are where they are anyway, and intermediates are already git-ignored.

we could use a cross-compiler to fix the build-speed issue

CUDA can't be cross-compiled, but we can use native aarch64 agents when they become publicly available.

use python-version=3.10 for builds

IMHO we should test on all versions we support. But there was an idea floated a while back (possibly by @akx) about splitting building wheels from testing them; a rough sketch of that idea follows below. Building the wheels is really quick, so it won't speed anything up, but it will reduce storage use.
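A minimal sketch of that split (job names, the Python version list, and the test command are illustrative):

    build-wheel:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - run: pip install build && python -m build --wheel
        - uses: actions/upload-artifact@v4
          with:
            name: wheels
            path: dist/*.whl
    test:
      needs: build-wheel
      runs-on: ubuntu-latest
      strategy:
        matrix:
          python-version: ['3.9', '3.10', '3.11']   # illustrative; test every supported version
      steps:
        - uses: actions/setup-python@v5
          with:
            python-version: ${{ matrix.python-version }}
        - uses: actions/download-artifact@v4
          with:
            name: wheels
            path: dist
        - run: pip install pytest dist/*.whl && pytest   # test command is illustrative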

@rickardp
Contributor

rickardp commented Feb 21, 2024

fix wheel names for aarch64

drop artifact retention-days: it can be configured under Settings -> Actions -> General.

fail-fast: false restored.

I suggest we make these separate PRs. I think we want to set retention days separately per artifact type, as some artifacts (wheels) are more useful than others (.so files). But agreed, these need tuning.

use cmake -G Ninja: Ninja is faster and more popular (Triton uses Ninja, for example).

Same here. But I don't know if the speed of the build tool will matter, as most of the time is spent in nvcc. And it's an additional moving part. But maybe we can isolate the change and see its impact?

@wkpark
Contributor Author

wkpark commented Feb 23, 2024

Hey @wkpark, @akx, @matthewdouglas, @rickardp,

Yes, agreed, #949 was merged a bit hastily, due to a misunderstanding and a too-quick trigger finger on my side. Sorry for that! And thanks, everyone, for the collective cleanup / hotfix actions that ensued and got everything back to working order.

Based on your discussion above, I'm not sure whether this PR is still WIP or whether you all agree that it's ready for review and merge. If not, what do we still need to implement or agree on in order to move forward?

Several PRs have already been extracted and merged.
We can consider this PR as a reference or a source of ideas; I do not intend to get it merged without further discussion or agreement.
