This repository has been archived by the owner on Nov 25, 2024. It is now read-only.
Forward-merge branch-24.08 into branch-24.10
It looks like the `Dockerfile` in this repo is fairly old (PyTorch 22.10). I don't know if it is useful -- we have largely deleted Dockerfiles in each RAPIDS repo now that we have devcontainers.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - https://github.com/linhu-nv
  - Brad Rees (https://github.com/BradReesWork)

URL: #184
Allow users to specify the entry size on each rank.

    node_feat_wm_embedding = wgth.create_embedding(
        ...
        embedding_entry_partition=[283071, 401722, 356680, 329221, 238065, 238060, 217897, 384313]
    )

1. `embedding_entry_partition[i]` indicates the number of embedding entries stored on rank `i`.
2. If `embedding_entry_partition` is `None`, the embedding will be partitioned equally.
3. Only chunked device and distributed host/device are supported.

Authors:
  - https://github.com/zhuofan1123

Approvers:
  - https://github.com/linhu-nv
  - Brad Rees (https://github.com/BradReesWork)

URL: #194
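To illustrate what a per-rank entry partition implies, here is a minimal sketch in plain Python. It does not call WholeGraph; `partition_offsets` and `owner_rank` are hypothetical helper names, and the sketch assumes each rank holds one contiguous range of entries in rank order, which the commit message does not state explicitly.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_offsets(entry_partition):
    """Return the first entry index of each rank, followed by the total count."""
    return [0, *accumulate(entry_partition)]


def owner_rank(entry_index, offsets):
    """Return the rank whose range contains entry_index.

    Assumption: rank i holds entries offsets[i] .. offsets[i + 1] - 1.
    """
    if not 0 <= entry_index < offsets[-1]:
        raise IndexError(f"entry index {entry_index} is out of range")
    # bisect_right finds the first offset strictly greater than entry_index;
    # the owning rank is the one just before it.
    return bisect_right(offsets, entry_index) - 1


# Partition sizes taken from the commit message above.
partition = [283071, 401722, 356680, 329221, 238065, 238060, 217897, 384313]
offsets = partition_offsets(partition)
```

With these sizes, `offsets[-1]` is the total number of entries, and `owner_rank(283071, offsets)` returns `1` because rank 0 holds indices 0 through 283070 under the contiguity assumption.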
…rsion (#203)

Contributes to rapidsai/build-planning#58.

`scikit-build-core==0.10.0` was released today (https://github.com/scikit-build/scikit-build-core/releases/tag/v0.10.0), and wheel-building configurations across RAPIDS are incompatible with it.

This proposes upgrading to that version and fixing configuration here in a way that:

* is compatible with that new `scikit-build-core` version
* takes advantage of the forward-compatibility mechanism (`minimum-version`) that `scikit-build-core` provides, to reduce the risk of needing to do this again in the future

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - https://github.com/jakirkham

URL: #203
We have many users running the [Kubeflow training operator](https://github.com/kubeflow/training-operator) who are also interested in using Wholegraph. Many of our MPIJobs users still use [HorovodRun](https://github.com/horovod/horovod/tree/master) as the startup command. Therefore, we want to add HorovodRun as one of the Wholegraph launch agents so our users can use Wholegraph on top of Kubeflow.

The new function will be similar to the existing MPI launcher agent, where the horovod library is only imported on demand. The horovod.tensorflow library will be used solely for the Horovod initialization command due to the issue with horovod.torch (see horovod/horovod#4009). After the Horovod initialization, the program can continue to run normal PyTorch code within each rank, just as it does with mpi4py.

Fixes #201

Authors:
  - Tommy Li (https://github.com/Tomcli)

Approvers:
  - https://github.com/linhu-nv
  - Brad Rees (https://github.com/BradReesWork)

URL: #200
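The on-demand import pattern described in this commit can be sketched roughly as follows. This is not the code merged in #200: `init_with_horovod` and its `import_module` parameter are hypothetical names used for illustration, and the only Horovod calls used are the standard `init()`, `rank()`, `size()`, `local_rank()`, and `local_size()`.

```python
import importlib


def init_with_horovod(import_module=importlib.import_module):
    """Initialize Horovod lazily and return this process's rank information.

    The module is imported inside the function so that Horovod is only
    required when this launcher is actually selected. `import_module` is a
    hypothetical injection point that makes the sketch testable without
    Horovod installed.
    """
    # horovod.tensorflow is used only for initialization, as the commit
    # message describes; later work in each rank can be ordinary PyTorch code.
    hvd = import_module("horovod.tensorflow")
    hvd.init()
    return {
        "rank": hvd.rank(),
        "size": hvd.size(),
        "local_rank": hvd.local_rank(),
        "local_size": hvd.local_size(),
    }
```

Under `horovodrun`, each process would call `init_with_horovod()` once and then use the returned ranks to set up its communicator, much as an mpi4py-based launcher would.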
A few small tweaks to `update-version.sh` for alignment across RAPIDS. This PR removes the `UCX_PY` version HTTP call from `update-version.sh` because it is not used.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #204
This PR updates pre-commit hooks to the latest versions that are supported without causing style check errors.

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #206
Authors:
  - Chuang Zhu (https://github.com/chuangz0)

Approvers:
  - https://github.com/linhu-nv
  - Brad Rees (https://github.com/BradReesWork)

URL: #207
Contributes to rapidsai/build-planning#88

Finishes the work of dropping Python 3.9 support.

This project stopped building / testing against Python 3.9 as of rapidsai/shared-workflows#235. This PR updates configuration and docs to reflect that.

## Notes for Reviewers

### How I tested this

Checked that there were no remaining uses like this:

```shell
git grep -E '3\.9'
git grep '39'
git grep 'py39'
```

And similar for variations on Python 3.8 (to catch things that were missed the last time this was done).

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #209
This PR removes the NumPy<2 pin. `wholegraph` does not appear to be a heavy user of NumPy or CuPy, so it should be fine to simply remove the pin. For other RAPIDS projects that depend more heavily on them, CuPy 13.3.0 (just released) was required for sufficiently good CuPy/NumPy interoperability.

Authors:
  - Sebastian Berg (https://github.com/seberg)

Approvers:
  - https://github.com/jakirkham

URL: #208
This PR updates rapidsai/pre-commit-hooks to version 0.4.0.

Authors:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #213
Contributes to rapidsai/build-planning#40

This PR adds support for Python 3.12.

## Notes for Reviewers

This is part of ongoing work to add Python 3.12 support across RAPIDS. It temporarily introduces a build/test matrix including Python 3.12, from rapidsai/shared-workflows#213.

A follow-up PR will revert back to pointing at the `branch-24.10` branch of `shared-workflows` once all RAPIDS repos have added Python 3.12 support.

### This will fail until all dependencies have been updated to Python 3.12

CI here is expected to fail until all of this project's upstream dependencies support Python 3.12.

This can be merged whenever all CI jobs are passing.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #214
Just adds the existing license to the `pylibwholegraph` conda recipe.

Authors:
  - Ray Douglass (https://github.com/raydouglass)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #215
Contributes to rapidsai/build-planning#102

Fixes #217

## Notes for Reviewers

### How I tested this

Temporarily added a CUDA 11.4.3 test job to CI here (the same specs as the failing nightly), by pointing at the branch from rapidsai/shared-workflows#246.

Observed the exact same failures with CUDA 11.4 reported in rapidsai/build-planning#102.

```text
...
+ nccl 2.10.3.1 hcad2f07_0 rapidsai-nightly 125MB
...
./WHOLEGRAPH_CSR_WEIGHTED_SAMPLE_WITHOUT_REPLACEMENT_TEST: symbol lookup error: /opt/conda/envs/test/bin/gtests/libwholegraph/../../../lib/libwholegraph.so: undefined symbol: ncclCommSplit
sh -c exec "$0" ./WHOLEMEMORY_HANDLE_TEST
./WHOLEMEMORY_HANDLE_TEST: symbol lookup error: /opt/conda/envs/test/bin/gtests/libwholegraph/../../../lib/libwholegraph.so: undefined symbol: ncclCommSplit
sh -c exec "$0" ./GRAPH_APPEND_UNIQUE_TEST
```

([build link](https://github.com/rapidsai/wholegraph/actions/runs/10966022370/job/30453393224?pr=218))

Pushed a commit adding a floor of `nccl>=2.18.1.1`. Saw all tests pass with CUDA 11.4 😁

```text
...
+ nccl 2.22.3.1 hee583db_1 conda-forge 131MB
...
(various log messages showing all tests passed)
```

([build link](https://github.com/rapidsai/wholegraph/actions/runs/10966210441/job/30454147250?pr=218))

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - https://github.com/linhu-nv
  - https://github.com/jakirkham

URL: #218
Follow-up to #218

This bumps the NCCL floor here slightly higher, to `>=2.19`.

Part of a RAPIDS-wide update of that floor for the 24.10 release. See rapidsai/build-planning#102 (comment) for context.

cc @linhu-nv for awareness

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - https://github.com/jakirkham

URL: #223
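As a rough illustration of what a version floor means for the NCCL builds seen in the logs of #218, the sketch below compares dotted version strings as integer tuples. `parse_version` and `meets_floor` are hypothetical helpers, and this is a simplification: conda's real version matching handles more cases than purely numeric segments.

```python
def parse_version(text):
    """Turn a purely numeric dotted version such as '2.18.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_floor(version, floor):
    """Return True when `version` is at least `floor` (tuple comparison)."""
    return parse_version(version) >= parse_version(floor)


# Floors named in the two commits: first `nccl>=2.18.1.1`, later `>=2.19`.
FLOOR_FROM_218 = "2.18.1.1"
FLOOR_FROM_223 = "2.19"

# Versions that appear in the CI logs quoted in #218.
failing_build = "2.10.3.1"   # environment with the undefined ncclCommSplit symbol
passing_build = "2.22.3.1"   # environment in which all tests passed

results = {
    version: (meets_floor(version, FLOOR_FROM_218), meets_floor(version, FLOOR_FROM_223))
    for version in (failing_build, passing_build)
}
```

Here `results` maps the failing build to `(False, False)` and the passing build to `(True, True)`, so either floor would have excluded the NCCL version that produced the symbol lookup errors.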
raydouglass requested review from KyleFromNVIDIA and removed request for a team, October 4, 2024 19:46
## ❄️ Code freeze for `branch-24.10` and `v24.10` release

### What does this mean?

Only critical/hotfix level issues should be merged into `branch-24.10` until release (merging of this PR).

### What is the purpose of this PR?

`branch-24.10` into `main` for the release