Avoid decoding long runs in a single thread #16304

gerashegalov · 2024-07-18T08:03:55Z

Description

Split long runs among threads
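
As a rough illustration of the idea, and not the kernel code from this PR, one long run of repeated values can be cut into near-equal slices so that each thread writes only its own slice. The names `run_slice` and `slice_run` are hypothetical, and the even-split policy is an assumption made for the sketch:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not from the PR): the half-open range
// [begin, end) of positions inside a run that one thread should write.
struct run_slice {
  std::int64_t begin;
  std::int64_t end;
};

// Split a run of `run_len` repeated values as evenly as possible among
// `num_threads` threads. Threads past the end of a short run get an
// empty slice (begin == end).
inline run_slice slice_run(std::int64_t run_len, int num_threads, int thread_id)
{
  assert(run_len >= 0 && num_threads > 0);
  assert(thread_id >= 0 && thread_id < num_threads);
  // Ceiling division: the largest slice any thread receives.
  std::int64_t const per_thread = (run_len + num_threads - 1) / num_threads;
  std::int64_t const begin = std::min<std::int64_t>(run_len, per_thread * thread_id);
  std::int64_t const end   = std::min<std::int64_t>(run_len, begin + per_thread);
  return {begin, end};
}
```

With a run of 33,554,423 values and 128 threads, every thread but the last gets 262,144 values, instead of one thread writing the whole run.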

Lead Co-authored-by: @abellina
Co-authored-by: @gerashegalov

Benchmarks

Generated Parquet files, each with a single integer column of logically ~1 billion rows, every row having the value 1. The files differ only in page layout:

4 pages, with 250 million rows per page
32 pages, with 33 million rows per page
1024 pages, with 1 million rows per page
4475 pages, with 240 thousand rows per page

The benchmark Spark app iterates over these files and executes:

spark.read.parquet(path).selectExpr("SUM(ones)")

nsys profile of gpuDecodePageDataGeneric, PR branch vs. branch-24.10:

branch	time	registers per thread	shared mem executed	theoretical occupancy	latency
24.10	1.762 s	72	32,768	87.5 %	9.632 μs
PR	1.911 s	64	65,536	100 %	10.028 μs
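
The occupancy column is consistent with a purely register-limited estimate. The sketch below is only that: it assumes an SM with 65,536 registers, at most 32 resident warps, and per-warp register allocation in units of 256. None of these limits are stated in the PR, and the estimate ignores shared memory and block-size limits. The function name `register_limited_occupancy` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Assumed hardware limits. These are NOT given in the PR; they are chosen
// for illustration and depend on the GPU architecture.
constexpr int regs_per_sm      = 65536;  // 32-bit registers per SM
constexpr int max_warps_per_sm = 32;     // resident warp limit per SM
constexpr int warp_size        = 32;     // threads per warp
constexpr int reg_alloc_unit   = 256;    // per-warp register allocation granularity

// Fraction of the resident-warp limit that fits when registers are the
// only constraint considered.
inline double register_limited_occupancy(int regs_per_thread)
{
  assert(regs_per_thread > 0);
  // Round the per-warp register demand up to the allocation unit.
  int const raw           = regs_per_thread * warp_size;
  int const regs_per_warp = ((raw + reg_alloc_unit - 1) / reg_alloc_unit) * reg_alloc_unit;
  int const warps         = std::min(max_warps_per_sm, regs_per_sm / regs_per_warp);
  return static_cast<double>(warps) / max_warps_per_sm;
}
```

Under these assumptions, 72 registers per thread allows 28 of 32 warps (87.5 %), and 64 registers per thread allows all 32 (100 %), matching the two rows above.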

ncu:

branch	compute throughput	memory throughput
24.10	0.21	0.21
PR	0.24	0.24

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2024-07-18T08:03:59Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

gerashegalov · 2024-09-24T02:46:08Z

Investigating a bug in this PR where batch_len goes negative on a 33M-row run:

DEBUG thread=96 warp_id=3 warp_lane=0 => batch_len=-8 negative=1 (size=33554423 remaining=96 max_count=526848, last_run_pos=526856)
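
One reading that reproduces the logged value is batch_len = min(remaining, max_count - last_run_pos) = min(96, 526848 - 526856) = -8. This formula is reconstructed from the log line only and is an assumption, not a quote from the kernel. The sketch below shows that arithmetic next to a generic defensive clamp; the clamp is for illustration and is not necessarily how the PR resolves the issue. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Assumed formula, reconstructed from the DEBUG line (not from the source):
// a batch is limited both by what is left in the run and by the distance
// from the current position to max_count.
inline int unclamped_batch_len(int remaining, int max_count, int last_run_pos)
{
  assert(remaining >= 0);
  return std::min(remaining, max_count - last_run_pos);
}

// Generic guard: if the position has already passed max_count, the batch
// is empty rather than negative.
inline int clamped_batch_len(int remaining, int max_count, int last_run_pos)
{
  return std::max(0, unclamped_batch_len(remaining, max_count, last_run_pos));
}
```

With the logged inputs (remaining=96, max_count=526848, last_run_pos=526856), the unclamped form gives -8 and the clamped form gives 0.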

gerashegalov · 2024-09-27T07:37:20Z

Negative batch fixed with 09dd99e

gerashegalov · 2024-09-27T20:00:16Z

If the benchmark is scaled 100x by replicating the 4-pages-by-250-million-rows file, the PR branch performance drops significantly, to about 20% slower than the base:

Base wall clock for the query: 360 seconds
PR wall clock: 433 seconds

abellina and others added 16 commits May 14, 2024 11:30

rle_stream with dictionary support + micro kernels for fixed and fixed

50f8ab8

dictionary Signed-off-by: Alessandro Bellina <[email protected]>

load balancing experiment

990a849

parquet process exampel

46e8294

rebase 24.06

03f202c

Signed-off-by: Gera Shegalov <[email protected]>

suggestions 1

b11b74f

Signed-off-by: Gera Shegalov <[email protected]>

suggestions

35b00ee

Signed-off-by: Gera Shegalov <[email protected]>

Merge remote-tracking branch 'origin/branch-24.06' into gerashegalov/…

1a655fd

…fixed_ukernel_rlestream_24.06_rebase_load_balancing

Merge remote-tracking branch 'origin/branch-24.08' into gerashegalov/…

6b0f067

…fixed_ukernel_rlestream_24.08_rebase_load_balancing

Merge remote-tracking branch 'gerashegalov/gerashegalov/fixed_ukernel…

d6cd603

…_rlestream_24.06_rebase_load_balancing' into gerashegalov/fixed_ukernel_rlestream_24.08_rebase_load_balancing

Merge remote-tracking branch 'origin/branch-24.08' into gerashegalov/…

0072f73

…fixed_ukernel_rlestream_24.08_rebase_load_balancing

Merge remote-tracking branch 'upstream/branch-24.08' into gerashegalo…

06b85aa

…v/fixed_ukernel_rlestream_24.08_rebase_load_balancing

Merge remote-tracking branch 'upstream/branch-24.08' into gerashegalo…

00a38f6

…v/fixed_ukernel_rlestream_24.08_rebase_load_balancing

Robert Maynard's patch

4c9a1ca

Signed-off-by: Gera Shegalov <[email protected]>

Merge remote-tracking branch 'origin/branch-24.08' into gerashegalov/…

bcdf401

…fixed_ukernel_rlestream_24.08_rebase_load_balancing

Merge remote-tracking branch 'origin/branch-24.08' into gerashegalov/…

9b9e37b

…fixed_ukernel_rlestream_24.08_rebase_load_balancing

Delete process-parquet

2689e3f

Signed-off-by: Gera Shegalov <[email protected]>

gerashegalov added the labels "4 - Needs Review", "cuIO", and "Performance" on Jul 18, 2024

gerashegalov self-assigned this Jul 18, 2024

github-actions bot added the "libcudf" label on Jul 18, 2024

gerashegalov changed the title from "Gerashegalov/fixed ukernel rlestream 24.08 rebase load balancing" to "Avoid decoding long runs in a single thread" on Jul 18, 2024

gerashegalov added the labels "Spark", "non-breaking", and "feature request" on Jul 18, 2024

JayjeetAtGithub reviewed Jul 18, 2024

cpp/src/io/parquet/decode_fixed.hpp (outdated)

gerashegalov added 3 commits August 2, 2024 19:22

Merge remote-tracking branch 'origin/branch-24.08' into gerashegalov/…

ed14133

…fixed_ukernel_rlestream_24.08_rebase_load_balancing

review

861f9f7

Signed-off-by: Gera Shegalov <[email protected]>

Merge remote-tracking branch 'origin/branch-24.08' into gerashegalov/…

0691b6e

…fixed_ukernel_rlestream_24.08_rebase_load_balancing

gerashegalov added 3 commits August 10, 2024 05:45

Merge remote-tracking branch 'origin/branch-24.10' into gerashegalov/…

b331766

…fixed_ukernel_rlestream_24.10_rebase_load_balancing

Merge remote-tracking branch 'origin/branch-24.10' into gerashegalov/…

772d652

…fixed_ukernel_rlestream_24.10_rebase_load_balancing

Merge commit '478406740a500ce74d8cd4b4bea07fd163256796' into gerasheg…

ce5b9e3

…alov/fixed_ukernel_rlestream_24.10_rebase_load_balancing

gerashegalov changed the base branch from branch-24.08 to branch-24.10 September 12, 2024 18:44

gerashegalov added 4 commits September 16, 2024 11:51

Merge remote-tracking branch 'origin/branch-24.10' into gerashegalov/…

f1aa5ec

…fixed_ukernel_rlestream_24.08_rebase_load_balancing

Merge remote-tracking branch 'origin/branch-24.10' into gerashegalov/…

e5441ee

…fixed_ukernel_rlestream_24.08_rebase_load_balancing

GERA_DEBUG log

90fa87b

don't forget to undo

DEBUG LOG warp thread batch

edfed2e

Signed-off-by: Gera Shegalov <[email protected]>

Acommodate for the output offset

09dd99e

Signed-off-by: Gera Shegalov <[email protected]>
