Add V0 tensor layout generation #518
Conversation
layout.withMemorySpace(op->getContext(), MemorySpace::DeviceL1);

// Block Sharded
for (auto width = 2; width <= analysisInput.maxGrid.getShape()[0]; ++width) {
Maybe start with max_width/2. Same for height. Are you sure all of these have a chance to be valid? Do you need to check/factor in the tensor shape first?
Will implement some basic tensor shape constraint checking on our end.
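For illustration, a minimal sketch of the kind of tensor shape constraint check being discussed, assuming the last two tensor dims map onto the grid dims; the helper name and the exact rule are assumptions, not the final implementation:

#include "llvm/ADT/ArrayRef.h"
#include <cstdint>

// Sketch: a width x height grid is only a valid block-sharding candidate if
// the tensor's last two dims divide evenly across the grid dims. The
// dim-to-grid mapping here is an assumption.
static bool fitsBlockShardedGrid(llvm::ArrayRef<int64_t> tensorShape,
                                 int64_t gridWidth, int64_t gridHeight) {
  if (tensorShape.size() < 2 || gridWidth <= 0 || gridHeight <= 0)
    return false;
  int64_t rows = tensorShape[tensorShape.size() - 2];
  int64_t cols = tensorShape[tensorShape.size() - 1];
  return rows % gridWidth == 0 && cols % gridHeight == 0;
}

Inside the sweep above, a candidate would then only be pushed when something like fitsBlockShardedGrid(tensorType.getShape(), width, height) holds.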
// L1 Interleaved (same as above)
LayoutAttr l1Interleaved =
    layout.withMemorySpace(op->getContext(), MemorySpace::DeviceL1);
analysisResult.push_back(l1Interleaved);
Is there any attribute that makes this one interleaved compared to sharded L1s below?
No, the only difference is that interleaved does not have a grid set. Should we make this explicit in LayoutAttr?
@nsmithtt Any plans here, what was your intention for representing interleaved modes?
@jnie-TT is adding flatbuffer support. Once that's in, I think we can just add a new layout attribute to capture this. Or maybe we should specialize a TTNN layout class? So far it's just one enum, so specializing might be overkill; the metal backend can just assert that this enum is always set to None or something.
Great, thanks! Yeah, let's go with a new layout attribute.
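For context, a hedged sketch of the direction agreed on here: an explicit memory layout enum carried by the layout attribute, so interleaved vs. sharded is no longer inferred from whether a grid is set. The enum name matches the TensorMemoryLayout mentioned later in this thread; the exact values are assumptions and may differ from what lands with #541.

// Sketch only; the value set is an assumption.
enum class TensorMemoryLayout {
  None,          // metal backend could simply assert this case
  Interleaved,
  HeightSharded,
  WidthSharded,
  BlockSharded,
};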
for (auto width = 2; width <= analysisInput.maxGrid.getShape()[0]; ++width) {
  for (auto height = 2; height <= analysisInput.maxGrid.getShape()[1];
       ++height) {
    analysisResult.push_back(shardedBase.withGrid(
There is a shardSpec inside LayoutAttr; is it also automatically updated?
I cannot seem to find it. Can you point me to what you are referring to?
Take a look at llvm::SmallVector<int64_t> LayoutAttr::getShardShape()
AttrParameter<"MemRefType", "A memref that describes the physical footprint allocation of the shard. It must also have a shape with rank equal to grid.">:$memref);
Yes, it's automatically updated.
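To illustrate why it updates automatically (a sketch, not LayoutAttr::getShardShape itself): if the shard shape is derived from the tensor shape divided by the grid shape, as the memref parameter's description suggests, then rebuilding the layout with a new grid recomputes the memref footprint.

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include <cassert>
#include <cstdint>

// Sketch: shard shape = tensor shape / grid shape, assuming the trailing
// tensor dims map onto the grid dims and divide evenly.
static llvm::SmallVector<int64_t>
computeShardShape(llvm::ArrayRef<int64_t> tensorShape,
                  llvm::ArrayRef<int64_t> gridShape) {
  assert(gridShape.size() <= tensorShape.size());
  llvm::SmallVector<int64_t> shard(tensorShape.begin(), tensorShape.end());
  size_t offset = shard.size() - gridShape.size();
  for (size_t i = 0; i < gridShape.size(); ++i)
    shard[offset + i] /= gridShape[i];
  return shard;
}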
    }),
    analysisResult.end());

// TODO: Potentially filter out tensors that don't fit into L1 at all.
It would be nice to have this for sharded ones.
Ok, will treat this as a nice-to-have.
Actually, don't do this. :) We can still decide later to split the tensor for a pipeline op, so this will be up to the sharding policy to decide.
// Height Sharded
// TODO: Missing affine mapping to actual grid.
// TODO: Can we have every shape of 1d grid? Probably not, need to check what
// is divisible by grid sides.
I recommended above going down to 50% of grid usage.
Actually it's 25% if you only go up to half of both dimensions on a two-dimensional grid (e.g. on an 8x8 grid, stopping at 4x4 uses 16 of 64 cores). :)
Waiting on #541 to land in order to set the appropriate TensorMemoryLayout.
bool LegalGridAnalysis::applyOverrides() {
  // Lookup grid size overrides based on location information for current
  // operation.
  //

// TODO(odjuricic): We may need to infer shard type.
Yeah, the override needs to be extended.
Do we want all LayoutAttr params to be overridable?
Because of linking with TT-explorer, most likely yes, but let's find a good enough starting point.
Ok, will add this as a separate issue and tackle it after this.
Branch updated from eb0c300 to afe37ff.
if (not llvm::isa<TTIROp>(op)) {
  return;
}
// Skip operations that don't have output tensors.
Can this be used instead:
if (op->getNumResults() == 0)
Yes, except for ToLayoutOp, which has an output.
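A small sketch combining both checks (assumes the TTIR dialect headers and mlir/IR/Operation.h are in scope; this is illustrative, not the exact filtering code in the PR):

// Process only TTIR ops that produce an output tensor; ToLayoutOp has an
// output but should not receive candidate layouts.
static bool shouldGenerateLayouts(mlir::Operation *op) {
  if (not llvm::isa<TTIROp>(op)) {
    return false;
  }
  if (op->getNumResults() == 0 || llvm::isa<ToLayoutOp>(op)) {
    return false;
  }
  return true;
}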
// This implementation is a placeholder and is meant to just enable testing of
// other components.

// Skip mlir ops.
Nit: Update comment to: Process only TTIR ops.
@@ -3,7 +3,7 @@
 module attributes {} {
   func.func @forward(%arg0: tensor<64x128xf32>, %arg1: tensor<64x128xf32>) -> tensor<64x128xf32> {
     %0 = tensor.empty() : tensor<64x128xf32>
-    // CHECK: #[[LAYOUT_1:.*]] = #tt.layout<(d0, d1) -> (d0, d1), undef, <8x8>, memref<8x16xf32, #dram>, interleaved>
+    // CHECK: #[[LAYOUT_1:.*]] = #tt.layout<(d0, d1) -> (d0, d1), undef, <1x1>, memref<64x128xf32, #dram>, interleaved>
Why is this changed to 1x1?
It defaults to 1x1 if GridAttr is not set. My original thinking was that a tensor layout grid does not make sense when the tensor is in DRAM. When you set 8x8 with DRAM you get an incorrect and confusing memref as well.
ATM I think that this value is not used at all (when DRAM), but it might be in the coming changes to runtime.
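For reference, a sketch of how the DRAM candidate ends up at <1x1> (variable names mirror the hunks above; the DRAM memory space enum value is an assumption):

// DRAM interleaved candidate: only the memory space changes and withGrid is
// never called, so the LayoutAttr keeps its default 1x1 grid and the memref
// covers the whole tensor, e.g. memref<64x128xf32, #dram> in the test above.
LayoutAttr dramInterleaved =
    layout.withMemorySpace(op->getContext(), MemorySpace::DeviceDRAM);
analysisResult.push_back(dramInterleaved);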
@@ -15,6 +15,7 @@ struct LegalGridAnalysisInput {
   ChipDescAttr chipDesc;
   GridAttr maxGrid;
   RankedTensorType tensorType;
+  int64_t maxShardedGrids = 64;
constexpr
I was intending for this to be a passable param. Will set it to constexpr for now and change it when needed.
Indeed, it should be passed in as a param; make it part of the override issue in a follow-up change.
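Two hedged variants of what was discussed, using the field name from the hunk above; which form lands is left to the follow-up override change:

// (a) Fixed for now, as agreed (note a non-static data member cannot be
// constexpr, so it becomes a static constant):
struct LegalGridAnalysisInput {
  // ... existing fields ...
  static constexpr int64_t kMaxShardedGrids = 64;
};

// (b) Later, passed in as a param so overrides / TT-explorer can tune it:
// int64_t maxShardedGrids;
// explicit LegalGridAnalysisInput(int64_t maxShardedGrids = 64)
//     : maxShardedGrids(maxShardedGrids) {}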
bool LegalGridAnalysis::applyOverrides() {
  // Lookup grid size overrides based on location information for current
  // operation.
  //

// TODO(odjuricic): Need to override all params, not just grid size.
Is this for a follow-up change? File an issue under Optimizer if so.
Yes, there is an issue filed.
    GridAttr::get(op->getContext(), analysisInput.maxGrid.getShape())));
// DRAM
// No grid is set since the tensor is not sharded.
// TODO(odjuricic): We need to set grid here since it will be used as the
File an issue; mention the special case where the grid matches the number of DRAM banks.
Related to the 1x1 comment above, this needs to be updated once we know how the runtime and TTNN treat this param.
Filed an issue.
Branch updated from f2efe2d to 9b91a42.
A first implementation of generating possible op configurations, starting with a few sharded tensor layouts for now. Many of the details here are incomplete; this is meant to unblock further optimizer and runtime development. A condensed sketch of the generated candidates follows below.
Waiting on #541 in order to add the appropriate TensorMemoryLayout before merge.
Closes #572
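Finally, a condensed, hedged sketch of the candidate generation pieced together from the review hunks above. Container types, the withGrid argument list, and the sweep bounds are assumptions, and the real code also depends on #541 for TensorMemoryLayout:

void generateCandidateLayouts(mlir::Operation *op, LayoutAttr layout,
                              const LegalGridAnalysisInput &analysisInput,
                              std::vector<LayoutAttr> &analysisResult) {
  // DRAM interleaved: only the memory space changes; the grid stays at the
  // default 1x1 (see the CHECK line discussion above).
  analysisResult.push_back(
      layout.withMemorySpace(op->getContext(), MemorySpace::DeviceDRAM));

  // L1 interleaved.
  LayoutAttr l1Interleaved =
      layout.withMemorySpace(op->getContext(), MemorySpace::DeviceL1);
  analysisResult.push_back(l1Interleaved);

  // Block sharded: sweep 2D grids up to the chip's max grid, subject to the
  // tensor shape constraint checks discussed above.
  LayoutAttr shardedBase = l1Interleaved;
  for (int64_t width = 2; width <= analysisInput.maxGrid.getShape()[0];
       ++width) {
    for (int64_t height = 2; height <= analysisInput.maxGrid.getShape()[1];
         ++height) {
      llvm::SmallVector<int64_t> gridShape{width, height};
      // withGrid argument list assumed here.
      analysisResult.push_back(shardedBase.withGrid(
          op->getContext(), GridAttr::get(op->getContext(), gridShape)));
    }
  }

  // Height / width sharded candidates follow the same pattern on 1D grids,
  // capped by maxShardedGrids.
}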