Support Simple protocol plans using scratch buffer #371

yzygitzh · 2024-10-22T13:50:24Z

Separate the option for using packets from the option for using a double scratch buffer.
Fix the reduce primitive implementation in the executor.
Add validation for the number of operations, channels, and channels per operation.

src/executor/executor.cc

Binyang2014 · 2024-10-24T09:38:45Z

+      plan.impl_->lightLoadExecutionPlan(inputMessageSize, outputMessageSize, contsSrcOffset, constDstOffset, rank,
+                                         sendBufferSize, recvBufferSize, flag);
      this->setupDeviceExecutionPlan(this->contexts[key], rank, plan);
      this->contexts[key].deviceExecutionPlansBuffer =
          allocExtSharedCuda<char>(this->contexts[key].deviceExecutionPlans.size() * sizeof(DeviceExecutionPlan));


If the key is found, it seems the new plan will overwrite the previous one. This may cause problems when using CUDA graphs, because only the last plan will be copied to GPU memory. Maybe we need to distinguish plans by inputMessageSize, outputMessageSize, and flag, and let the kernel load the plan with the correct device_ptr.
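To make the suggestion concrete, here is a minimal host-side sketch of a cache keyed by message sizes and flag, so that loading a second plan does not overwrite the first. Everything in it is hypothetical and for illustration only: `PlanKey`, `PlanKeyHash`, `CachedPlan`, and `PlanCache` are not mscclpp types, and a plain `std::vector<char>` stands in for the device buffer that the real executor allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical key (not the actual mscclpp context key): one cache entry
// per (inputMessageSize, outputMessageSize, flag) combination.
struct PlanKey {
  std::size_t inputMessageSize;
  std::size_t outputMessageSize;
  std::uint32_t flag;

  bool operator==(const PlanKey& other) const {
    return inputMessageSize == other.inputMessageSize &&
           outputMessageSize == other.outputMessageSize && flag == other.flag;
  }
};

// Combines the three field hashes into one value.
struct PlanKeyHash {
  std::size_t operator()(const PlanKey& k) const {
    std::size_t h = std::hash<std::size_t>{}(k.inputMessageSize);
    h ^= std::hash<std::size_t>{}(k.outputMessageSize) + 0x9e3779b9 + (h << 6) + (h >> 2);
    h ^= std::hash<std::uint32_t>{}(k.flag) + 0x9e3779b9 + (h << 6) + (h >> 2);
    return h;
  }
};

// Stand-in for a device-side plan buffer (assumption: the real executor
// would hold GPU memory here instead of host bytes).
struct CachedPlan {
  std::vector<char> bytes;
};

class PlanCache {
 public:
  // Returns the plan stored under `key`, creating it only on first use.
  // Existing entries are never overwritten, and references to elements of
  // std::unordered_map stay valid across later insertions.
  CachedPlan& getOrCreate(const PlanKey& key) {
    auto it = plans_.find(key);
    if (it == plans_.end()) {
      it = plans_.emplace(key, CachedPlan{std::vector<char>(key.inputMessageSize)}).first;
    }
    return it->second;
  }

  std::size_t size() const { return plans_.size(); }

 private:
  std::unordered_map<PlanKey, CachedPlan, PlanKeyHash> plans_;
};
```

With a key like this, two launches captured in the same CUDA graph with different message sizes would each refer to their own buffer, at the cost of keeping one buffer alive per distinct key.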

yzygitzh added 5 commits October 22, 2024 13:47

Add separate option for double scratch buffer

96a6d56

fix bug

59d0917

fix bug

ab77c07

fix instruction naming

0d43a21

fix reduce primitive and plan validation

aaff57f

yzygitzh changed the title ~~Add separate option for double scratch buffer~~ Support Simple protocol plans using scratch buffer Oct 24, 2024

chhwang approved these changes Oct 24, 2024

fix scratch offset calculation for packet

0683623

Binyang2014 reviewed Oct 24, 2024

Merge branch 'microsoft:main' into ziyue/fix-double-buffer

32777df
