Merge pull request #1257 from rapidsai/branch-23.10
[RELEASE] dask-cuda v23.10
raydouglass authored Oct 11, 2023
2 parents efbd6ca + 47ffd98 commit 73f5ee5
Showing 26 changed files with 386 additions and 102 deletions.
4 changes: 4 additions & 0 deletions .github/copy-pr-bot.yaml
@@ -0,0 +1,4 @@
# Configuration file for `copy-pr-bot` GitHub App
# https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/

enabled: true
1 change: 0 additions & 1 deletion .github/ops-bot.yaml
@@ -5,5 +5,4 @@ auto_merger: true
branch_checker: true
label_checker: true
release_drafter: true
-copy_prs: true
recently_updated: true
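(The `copy_prs` setting removed here is superseded by the dedicated copy-pr-bot GitHub App configured in the new `.github/copy-pr-bot.yaml` above; see "Use `copy-pr-bot`" (#1227) in the changelog below.)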
10 changes: 5 additions & 5 deletions .github/workflows/build.yaml
@@ -28,7 +28,7 @@ concurrency:
jobs:
conda-python-build:
secrets: inherit
-uses: rapidsai/shared-action-workflows/.github/workflows/conda-python-build.yaml@branch-23.08
+uses: rapidsai/shared-action-workflows/.github/workflows/conda-python-build.yaml@branch-23.10
with:
build_type: ${{ inputs.build_type || 'branch' }}
branch: ${{ inputs.branch }}
@@ -38,20 +38,20 @@ jobs:
if: github.ref_type == 'branch'
needs: [conda-python-build]
secrets: inherit
-uses: rapidsai/shared-action-workflows/.github/workflows/custom-job.yaml@branch-23.08
+uses: rapidsai/shared-action-workflows/.github/workflows/custom-job.yaml@branch-23.10
with:
arch: "amd64"
branch: ${{ inputs.branch }}
build_type: ${{ inputs.build_type || 'branch' }}
-container_image: "rapidsai/ci:latest"
+container_image: "rapidsai/ci-conda:latest"
date: ${{ inputs.date }}
node_type: "gpu-v100-latest-1"
run_script: "ci/build_docs.sh"
sha: ${{ inputs.sha }}
upload-conda:
needs: [conda-python-build]
secrets: inherit
-uses: rapidsai/shared-action-workflows/.github/workflows/conda-upload-packages.yaml@branch-23.08
+uses: rapidsai/shared-action-workflows/.github/workflows/conda-upload-packages.yaml@branch-23.10
with:
build_type: ${{ inputs.build_type || 'branch' }}
branch: ${{ inputs.branch }}
@@ -60,7 +60,7 @@ jobs:
wheel-build:
runs-on: ubuntu-latest
container:
-image: rapidsai/ci:latest
+image: rapidsai/ci-conda:latest
defaults:
run:
shell: bash
14 changes: 7 additions & 7 deletions .github/workflows/pr.yaml
@@ -18,37 +18,37 @@ jobs:
- docs-build
- wheel-build
secrets: inherit
-uses: rapidsai/shared-action-workflows/.github/workflows/pr-builder.yaml@branch-23.08
+uses: rapidsai/shared-action-workflows/.github/workflows/pr-builder.yaml@branch-23.10
checks:
secrets: inherit
-uses: rapidsai/shared-action-workflows/.github/workflows/checks.yaml@branch-23.08
+uses: rapidsai/shared-action-workflows/.github/workflows/checks.yaml@branch-23.10
conda-python-build:
needs: checks
secrets: inherit
-uses: rapidsai/shared-action-workflows/.github/workflows/conda-python-build.yaml@branch-23.08
+uses: rapidsai/shared-action-workflows/.github/workflows/conda-python-build.yaml@branch-23.10
with:
build_type: pull-request
conda-python-tests:
needs: conda-python-build
secrets: inherit
-uses: rapidsai/shared-action-workflows/.github/workflows/conda-python-tests.yaml@branch-23.08
+uses: rapidsai/shared-action-workflows/.github/workflows/conda-python-tests.yaml@branch-23.10
with:
build_type: pull-request
docs-build:
needs: conda-python-build
secrets: inherit
-uses: rapidsai/shared-action-workflows/.github/workflows/custom-job.yaml@branch-23.08
+uses: rapidsai/shared-action-workflows/.github/workflows/custom-job.yaml@branch-23.10
with:
build_type: pull-request
node_type: "gpu-v100-latest-1"
arch: "amd64"
-container_image: "rapidsai/ci:latest"
+container_image: "rapidsai/ci-conda:latest"
run_script: "ci/build_docs.sh"
wheel-build:
needs: checks
runs-on: ubuntu-latest
container:
-image: rapidsai/ci:latest
+image: rapidsai/ci-conda:latest
defaults:
run:
shell: bash
2 changes: 1 addition & 1 deletion .github/workflows/test.yaml
@@ -16,7 +16,7 @@ on:
jobs:
conda-python-tests:
secrets: inherit
-uses: rapidsai/shared-action-workflows/.github/workflows/conda-python-tests.yaml@branch-23.08
+uses: rapidsai/shared-action-workflows/.github/workflows/conda-python-tests.yaml@branch-23.10
with:
build_type: nightly
branch: ${{ inputs.branch }}
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,29 @@
# dask-cuda 23.10.00 (11 Oct 2023)

## 🐛 Bug Fixes

- Monkeypatch protocol.loads ala dask/distributed#8216 ([#1247](https://github.com/rapidsai/dask-cuda/pull/1247)) [@wence-](https://github.com/wence-)
- Explicit-comms: preserve partition IDs ([#1240](https://github.com/rapidsai/dask-cuda/pull/1240)) [@madsbk](https://github.com/madsbk)
- Increase test timeouts further to reduce CI failures ([#1234](https://github.com/rapidsai/dask-cuda/pull/1234)) [@pentschev](https://github.com/pentschev)
- Use `conda mambabuild` not `mamba mambabuild` ([#1231](https://github.com/rapidsai/dask-cuda/pull/1231)) [@bdice](https://github.com/bdice)
- Increase timeouts of tests that frequently time out in CI ([#1228](https://github.com/rapidsai/dask-cuda/pull/1228)) [@pentschev](https://github.com/pentschev)
- Adapt to non-string task keys in distributed ([#1225](https://github.com/rapidsai/dask-cuda/pull/1225)) [@wence-](https://github.com/wence-)
- Update `test_worker_timeout` ([#1223](https://github.com/rapidsai/dask-cuda/pull/1223)) [@pentschev](https://github.com/pentschev)
- Avoid importing `loads_function` from distributed ([#1220](https://github.com/rapidsai/dask-cuda/pull/1220)) [@rjzamora](https://github.com/rjzamora)

## 🚀 New Features

- Enable maximum pool size for RMM async allocator ([#1221](https://github.com/rapidsai/dask-cuda/pull/1221)) [@pentschev](https://github.com/pentschev)

## 🛠️ Improvements

- Pin `dask` and `distributed` for `23.10` release ([#1251](https://github.com/rapidsai/dask-cuda/pull/1251)) [@galipremsagar](https://github.com/galipremsagar)
- Update `test_spill.py` to avoid `FutureWarning`s ([#1243](https://github.com/rapidsai/dask-cuda/pull/1243)) [@pentschev](https://github.com/pentschev)
- Remove obsolete pytest `filterwarnings` ([#1241](https://github.com/rapidsai/dask-cuda/pull/1241)) [@pentschev](https://github.com/pentschev)
- Update image names ([#1233](https://github.com/rapidsai/dask-cuda/pull/1233)) [@AyodeAwe](https://github.com/AyodeAwe)
- Use `copy-pr-bot` ([#1227](https://github.com/rapidsai/dask-cuda/pull/1227)) [@ajschmidt8](https://github.com/ajschmidt8)
- Unpin `dask` and `distributed` for `23.10` development ([#1222](https://github.com/rapidsai/dask-cuda/pull/1222)) [@galipremsagar](https://github.com/galipremsagar)

# dask-cuda 23.08.00 (9 Aug 2023)

## 🐛 Bug Fixes
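For context on the headline 23.10 feature (#1221): the RMM async allocator can now be given a maximum pool size. A minimal usage sketch, assuming it is exposed through the existing `rmm_maximum_pool_size` option of `LocalCUDACluster` alongside `rmm_async` (the parameter combination shown here is an assumption, not part of this commit):

from distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(
    rmm_async=True,                 # RMM's cudaMallocAsync-backed allocator
    rmm_pool_size="8GiB",           # initial pool reservation
    rmm_maximum_pool_size="12GiB",  # new in 23.10: upper bound for the async pool
)
client = Client(cluster)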
2 changes: 1 addition & 1 deletion ci/build_docs.sh
@@ -23,7 +23,7 @@ rapids-mamba-retry install \
--channel "${PYTHON_CHANNEL}" \
dask-cuda

export RAPIDS_VERSION_NUMBER="23.08"
export RAPIDS_VERSION_NUMBER="23.10"
export RAPIDS_DOCS_DIR="$(mktemp -d)"

rapids-logger "Build Python docs"
2 changes: 1 addition & 1 deletion ci/build_python.sh
@@ -11,7 +11,7 @@ rapids-print-env

rapids-logger "Begin py build"

-rapids-mamba-retry mambabuild \
+rapids-conda-retry mambabuild \
conda/recipes/dask-cuda

rapids-upload-conda-to-s3 python
1 change: 1 addition & 0 deletions ci/test_python.sh
@@ -47,6 +47,7 @@ UCX_WARN_UNUSED_ENV_VARS=n \
UCX_MEMTYPE_CACHE=n \
timeout 40m pytest \
-vv \
+--durations=0 \
--capture=no \
--cache-clear \
--junitxml="${RAPIDS_TESTS_DIR}/junit-dask-cuda.xml" \
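(For reference: pytest's `--durations=0` reports the run time of every test rather than only the slowest few, which helps pinpoint the tests behind the CI timeouts addressed in #1228 and #1234.)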
2 changes: 1 addition & 1 deletion conda/recipes/dask-cuda/meta.yaml
@@ -32,7 +32,7 @@ requirements:
- tomli
run:
- python
-- dask-core ==2023.7.1
+- dask-core ==2023.9.2
{% for r in data.get("project", {}).get("dependencies", []) %}
- {{ r }}
{% endfor %}
3 changes: 2 additions & 1 deletion dask_cuda/__init__.py
@@ -19,8 +19,9 @@
from .local_cuda_cluster import LocalCUDACluster
from .proxify_device_objects import proxify_decorator, unproxify_decorator

__version__ = "23.08.00"
__version__ = "23.10.00"

+from . import compat

# Monkey patching Dask to make use of explicit-comms when `DASK_EXPLICIT_COMMS=True`
dask.dataframe.shuffle.rearrange_by_column = get_rearrange_by_column_wrapper(
118 changes: 118 additions & 0 deletions dask_cuda/compat.py
@@ -0,0 +1,118 @@
import pickle

import msgpack
from packaging.version import Version

import dask
import distributed
import distributed.comm.utils
import distributed.protocol
from distributed.comm.utils import OFFLOAD_THRESHOLD, nbytes, offload
from distributed.protocol.core import (
Serialized,
decompress,
logger,
merge_and_deserialize,
msgpack_decode_default,
msgpack_opts,
)

if Version(distributed.__version__) >= Version("2023.8.1"):
# Monkey-patch protocol.core.loads (and its users)
async def from_frames(
frames, deserialize=True, deserializers=None, allow_offload=True
):
"""
Unserialize a list of Distributed protocol frames.
"""
size = False

def _from_frames():
try:
# Patched code
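# (dispatches to the patched `loads` defined below, rather than the
# upstream distributed.protocol.core.loads)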
return loads(
frames, deserialize=deserialize, deserializers=deserializers
)
# end patched code
except EOFError:
if size > 1000:
datastr = "[too large to display]"
else:
datastr = frames
# Aid diagnosing
logger.error("truncated data stream (%d bytes): %s", size, datastr)
raise

if allow_offload and deserialize and OFFLOAD_THRESHOLD:
size = sum(map(nbytes, frames))
if (
allow_offload
and deserialize
and OFFLOAD_THRESHOLD
and size > OFFLOAD_THRESHOLD
):
res = await offload(_from_frames)
else:
res = _from_frames()

return res

def loads(frames, deserialize=True, deserializers=None):
"""Transform bytestream back into Python value"""

allow_pickle = dask.config.get("distributed.scheduler.pickle")

try:

def _decode_default(obj):
offset = obj.get("__Serialized__", 0)
if offset > 0:
sub_header = msgpack.loads(
frames[offset],
object_hook=msgpack_decode_default,
use_list=False,
**msgpack_opts,
)
offset += 1
sub_frames = frames[offset : offset + sub_header["num-sub-frames"]]
if deserialize:
if "compression" in sub_header:
sub_frames = decompress(sub_header, sub_frames)
return merge_and_deserialize(
sub_header, sub_frames, deserializers=deserializers
)
else:
return Serialized(sub_header, sub_frames)

offset = obj.get("__Pickled__", 0)
if offset > 0:
sub_header = msgpack.loads(frames[offset])
offset += 1
sub_frames = frames[offset : offset + sub_header["num-sub-frames"]]
# Patched code
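# (decompresses pickled sub-frames before unpickling; upstream's
# `loads` omits this step, which is the bug this monkey-patch works
# around, see dask/distributed#8216)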
if "compression" in sub_header:
sub_frames = decompress(sub_header, sub_frames)
# end patched code
if allow_pickle:
return pickle.loads(
sub_header["pickled-obj"], buffers=sub_frames
)
else:
raise ValueError(
"Unpickle on the Scheduler isn't allowed, "
"set `distributed.scheduler.pickle=true`"
)

return msgpack_decode_default(obj)

return msgpack.loads(
frames[0], object_hook=_decode_default, use_list=False, **msgpack_opts
)

except Exception:
logger.critical("Failed to deserialize", exc_info=True)
raise

distributed.protocol.loads = loads
distributed.protocol.core.loads = loads
distributed.comm.utils.from_frames = from_frames
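The patch above takes effect as an import side effect: `dask_cuda/__init__.py` now runs `from . import compat` (see the diff above). A quick, illustrative check that the patch was installed (not part of the commit; `compat.loads` only exists on distributed >= 2023.8.1):

import distributed.protocol

import dask_cuda  # importing dask_cuda applies the compat patches
from dask_cuda import compat

if hasattr(compat, "loads"):  # distributed >= 2023.8.1
    assert distributed.protocol.loads is compat.loads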
8 changes: 6 additions & 2 deletions dask_cuda/device_host_file.py
@@ -4,7 +4,7 @@
import time

import numpy
-from zict import Buffer, File, Func
+from zict import Buffer, Func
from zict.common import ZictBase

import dask
@@ -17,6 +17,7 @@
serialize_bytelist,
)
from distributed.sizeof import safe_sizeof
+from distributed.spill import CustomFile as KeyAsStringFile
from distributed.utils import nbytes

from .is_device_object import is_device_object
@@ -201,7 +202,10 @@ def __init__(
self.disk_func = Func(
_serialize_bytelist,
deserialize_bytes,
-File(self.disk_func_path),
+# Task keys are not strings, so this takes care of
+# converting arbitrary tuple keys into a string before
+# handing off to zict.File
+KeyAsStringFile(self.disk_func_path),
)

host_buffer_kwargs = {}
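Why the wrapper is needed: `zict.File` uses each mapping key as a file name, so keys must be strings, while distributed task keys may now be tuples. An illustrative sketch (hypothetical key and path, not from the commit) of the conversion that `KeyAsStringFile` performs automatically:

from zict import File

spill = File("/tmp/spill-demo")   # keys become file names on disk
key = ("shuffle-split", 3)        # task keys may be arbitrary tuples
spill[str(key)] = b"some bytes"   # so they are stringified before hitting File
assert bytes(spill[str(key)]) == b"some bytes"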
4 changes: 1 addition & 3 deletions dask_cuda/explicit_comms/comms.py
@@ -6,7 +6,6 @@
from typing import Any, Dict, Hashable, Iterable, List, Optional

import distributed.comm
-from dask.utils import stringify
from distributed import Client, Worker, default_client, get_worker
from distributed.comm.addressing import parse_address, parse_host_port, unparse_address

@@ -305,8 +304,7 @@ def stage_keys(self, name: str, keys: Iterable[Hashable]) -> Dict[int, set]:
dict
dict that maps each worker-rank to the workers set of staged keys
"""
-key_set = {stringify(k) for k in keys}
-return dict(self.run(_stage_keys, name, key_set))
+return dict(self.run(_stage_keys, name, set(keys)))


def pop_staging_area(session_state: dict, name: str) -> Dict[str, Any]:
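(With distributed task keys no longer guaranteed to be strings, `stringify` would produce staged keys that no longer match the scheduler's; passing the keys through unchanged keeps explicit-comms staging consistent. Compare "Adapt to non-string task keys in distributed" (#1225) and "Explicit-comms: preserve partition IDs" (#1240) in the changelog.)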