
Add V0 tensor layout generation #518

Merged
merged 10 commits into main from odjuricic/grid-analysis on Sep 6, 2024

Conversation

@odjuricicTT (Contributor) commented Aug 28, 2024

A first implementation of generating possible op configurations, starting with a few sharded tensor layouts for now. Many of the details here are incomplete; this is meant to unblock further optimizer and runtime development.

Waiting on #541 in order to add appropriate TensorMemoryLayout before merge.

Closes #572
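For orientation, the rough shape of what V0 generates is sketched below as a standalone C++ toy; Candidate, MemSpace, and generateV0Candidates are illustrative names, not the PR's LayoutAttr-based API. The idea: one DRAM interleaved layout, one L1 interleaved layout, and a set of L1 block-sharded layouts over candidate grids.

// Standalone sketch only; the real implementation builds LayoutAttr values.
#include <cstdint>
#include <vector>

enum class MemSpace { DRAM, L1 };

struct Candidate {
  MemSpace memSpace;
  int64_t gridH; // 0 means "no grid set", i.e. interleaved
  int64_t gridW;
};

std::vector<Candidate> generateV0Candidates(int64_t maxGridH, int64_t maxGridW) {
  std::vector<Candidate> out;
  out.push_back({MemSpace::DRAM, 0, 0}); // DRAM interleaved
  out.push_back({MemSpace::L1, 0, 0});   // L1 interleaved
  // L1 block-sharded candidates over a 2D grid.
  for (int64_t h = 2; h <= maxGridH; ++h)
    for (int64_t w = 2; w <= maxGridW; ++w)
      out.push_back({MemSpace::L1, h, w});
  return out;
}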

layout.withMemorySpace(op->getContext(), MemorySpace::DeviceL1);

// Block Sharded
for (auto width = 2; width <= analysisInput.maxGrid.getShape()[0]; ++width) {
Contributor

Maybe start with max_width/2. Same for height. Are you sure all of these have a chance to be valid, or do you need to check/factor in the tensor shape first?

Contributor Author

Will implement some basic tensor shape constraint checking on our end.
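One way such a check could look, sketched as plain C++ under the assumption of 32x32 tiles; kTileDim and gridFitsTensor are hypothetical names, not the PR's actual helpers.

#include <cstdint>

constexpr int64_t kTileDim = 32; // assumed tile size, for illustration only

// Keep a candidate grid only if the tile-aligned tensor dims split evenly
// across it; the real constraints may be looser or stricter than this.
bool gridFitsTensor(int64_t rows, int64_t cols, int64_t gridH, int64_t gridW) {
  const int64_t tileRows = (rows + kTileDim - 1) / kTileDim;
  const int64_t tileCols = (cols + kTileDim - 1) / kTileDim;
  return tileRows % gridH == 0 && tileCols % gridW == 0;
}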

// L1 Interleaved (same as above)
LayoutAttr l1Interleaved =
    layout.withMemorySpace(op->getContext(), MemorySpace::DeviceL1);
analysisResult.push_back(l1Interleaved);
Contributor

Is there any attribute that makes this one interleaved compared to sharded L1s below?

Contributor Author

No, the only difference is that interleaved does not have a grid set. Should we make this explicit in LayoutAttr?

Contributor

@nsmithtt Any plans here, what was your intention for representing interleaved modes?

Contributor

@jnie-TT is adding flatbuffer support. Once that's in, I think we can just add a new layout attribute to capture this. Or maybe we should specialize a TTNN layout class? So far it's just one enum, so specializing might be overkill; the metal backend can just assert that this enum is always set to None or something.

Contributor

Great, thanks! Yeah, let's go with a new layout attribute.
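For illustration, the attribute could end up looking roughly like the enum below; the member names are assumptions based on the memory layouts discussed in this thread, not a definitive list, and the real definition lands with #541.

// Hypothetical sketch of a TensorMemoryLayout-style enum; may differ from #541.
enum class TensorMemoryLayout {
  None,          // the metal backend could assert this stays unset
  Interleaved,   // no grid set, pages spread across banks
  HeightSharded,
  WidthSharded,
  BlockSharded,
};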

for (auto width = 2; width <= analysisInput.maxGrid.getShape()[0]; ++width) {
  for (auto height = 2; height <= analysisInput.maxGrid.getShape()[1];
       ++height) {
    analysisResult.push_back(shardedBase.withGrid(
Contributor

There is a shardSpec inside LayoutAttr; is it also automatically updated?

Contributor Author

I cannot seem to find it. Can you point me to what you are referring to?

Contributor

Take a look at llvm::SmallVector<int64_t> LayoutAttr::getShardShape()

Contributor

AttrParameter<"MemRefType", "A memref that describes the physical footprint allocation of the shard. It must also have a shape with rank equal to grid.">:$memref);

Contributor Author

Yes, it's automatically updated.
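For reference, the behaviour being confirmed can be modeled like this; a standalone toy only, while the real logic lives in LayoutAttr (e.g. getShardShape()).

#include <cstdint>
#include <vector>

// Per-core shard shape as a function of tensor shape and grid: setting a new
// grid (as withGrid does) implies a new shard memref. Toy model only.
std::vector<int64_t> shardShape(const std::vector<int64_t> &tensorShape,
                                const std::vector<int64_t> &grid) {
  std::vector<int64_t> shape;
  for (size_t i = 0; i < tensorShape.size(); ++i) {
    const int64_t g = i < grid.size() ? grid[i] : 1;
    shape.push_back((tensorShape[i] + g - 1) / g); // ceil-divide across cores
  }
  return shape;
}
// e.g. shardShape({64, 128}, {8, 8}) == {8, 16}, matching the
// memref<8x16xf32, ...> that appears for an <8x8> grid in the tests below.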

}),
analysisResult.end());

// TODO: Potentially filter out tensors that don't fit into L1 at all.
Contributor

It would be nice to have this for sharded ones.

Contributor Author

Ok, will treat this as a nice to have.

Contributor

Actually, don't do this. :) We can still decide later to split the tensor for a pipeline op, so this will be up to the sharding policy to decide.

// Height Sharded
// TODO: Missing affine mapping to actual grid.
// TODO: Can we have every shape of 1d grid? Probably not, need to check what
// is divisible by grid sides.
Contributor

I recommended above going down to 50% of grid usage.

Contributor

Actually it's 25% if you only go down to half of both dimensions on a two-dimensional grid. :)
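A quick arithmetic check of that 25% figure; the 8x8 worker grid here is just an example value.

#include <cassert>
#include <cstdint>

int main() {
  const int64_t maxH = 8, maxW = 8;                 // example worker grid
  const int64_t minCores = (maxH / 2) * (maxW / 2); // smallest allowed grid
  assert(minCores * 4 == maxH * maxW);              // 16 of 64 cores = 25%
  return 0;
}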

@odjuricicTT (Contributor Author)

Waiting on #541 to land in order to set appropriate TensorMemoryLayout.

bool LegalGridAnalysis::applyOverrides() {
  // Lookup grid size overrides based on location information for current
  // operation.
  //

  // TODO(odjuricic): We may need to infer shard type.
Contributor

Yeah, the override needs to be extended.

Contributor Author

Do we want all LayoutAttr params to be overridable?

Contributor

Because of linking with TT-explorer, most likely yes, but let's find a good enough starting point.

Contributor Author

OK, will add this as a separate issue and tackle it after this.
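As a rough illustration of where that follow-up could go, an extended override might carry something like the fields below; every name and field here is hypothetical, not the PR's actual override struct.

#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical extended override: today only the grid shape is overridable.
struct LayoutOverrideParams {
  std::optional<std::vector<int64_t>> gridShape;
  std::optional<std::string> memorySpace;        // e.g. "dram" or "l1"
  std::optional<std::string> tensorMemoryLayout; // e.g. "interleaved"
  std::optional<std::string> dataType;           // e.g. "f32", "bf16"
};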

@odjuricicTT odjuricicTT force-pushed the odjuricic/grid-analysis branch from eb0c300 to afe37ff on August 30, 2024 13:41
@odjuricicTT odjuricicTT marked this pull request as ready for review August 30, 2024 13:43
@odjuricicTT odjuricicTT changed the title [WIP] Add V0 tensor layout generation Add V0 tensor layout generation Aug 30, 2024
if (not llvm::isa<TTIROp>(op)) {
  return;
}
// Skip operations that don't have output tensors.
Contributor

Can this be used instead:
if (op->getNumResults() == 0)

Contributor Author

Yes, except for ToLayoutOp, which has an output.
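Putting the thread together, the filter could look roughly like this; includes are omitted and TTIROp, ToLayoutOp, and getNumResults are taken from the surrounding discussion, so treat it as a sketch rather than the PR's final code.

// Sketch only; assumes the TTIR dialect ops are in scope.
static bool shouldSkip(mlir::Operation *op) {
  if (!llvm::isa<TTIROp>(op))
    return true;                   // process only TTIR ops
  if (llvm::isa<ToLayoutOp>(op))
    return true;                   // has a result, but is a layout change
  return op->getNumResults() == 0; // nothing to assign a layout to
}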

// This implementation is a placeholder and is meant to just enable testing of
// other components.

// Skip mlir ops.
Contributor

Nit: Update comment to: Process only TTIR ops.

@@ -3,7 +3,7 @@
 module attributes {} {
 func.func @forward(%arg0: tensor<64x128xf32>, %arg1: tensor<64x128xf32>) -> tensor<64x128xf32> {
 %0 = tensor.empty() : tensor<64x128xf32>
-// CHECK: #[[LAYOUT_1:.*]] = #tt.layout<(d0, d1) -> (d0, d1), undef, <8x8>, memref<8x16xf32, #dram>, interleaved>
+// CHECK: #[[LAYOUT_1:.*]] = #tt.layout<(d0, d1) -> (d0, d1), undef, <1x1>, memref<64x128xf32, #dram>, interleaved>
Contributor

Why is this changed to 1x1?

Contributor Author

It defaults to 1x1 if GridAttr is not set. My original thinking was that a tensor layout grid does not make sense when the tensor is in DRAM. When you set 8x8 with DRAM you get an incorrect and confusing memref as well.

ATM I think that this value is not used at all (when in DRAM), but it might be in the coming changes to runtime.

Contributor

@nsmithtt @jnie-TT Do we have any plans on how to infer the computation grid for ops which operate on DRAM data?

Contributor Author

This seems to differ on an op-by-op basis; it is discussed in #450 and #559 and is being addressed in #605.

@@ -15,6 +15,7 @@ struct LegalGridAnalysisInput {
   ChipDescAttr chipDesc;
   GridAttr maxGrid;
   RankedTensorType tensorType;
+  int64_t maxShardedGrids = 64;
Contributor

constexpr

Contributor Author

I was intending for this to be a passable param. Will set it to constexpr for now and change it when needed.

Contributor

Indeed, it should be passed in as a param; make it part of the override issue in a follow-up change.
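The two options discussed read roughly as follows; a sketch only, and the second variant's constructor is hypothetical rather than the PR's code.

#include <cstdint>

// (a) V0: a fixed compile-time cap, as suggested above.
struct LegalGridAnalysisInputA {
  static constexpr int64_t maxShardedGrids = 64;
};

// (b) Follow-up: pass the cap in, so the override mechanism can tune it.
struct LegalGridAnalysisInputB {
  int64_t maxShardedGrids;
  explicit LegalGridAnalysisInputB(int64_t maxGrids = 64)
      : maxShardedGrids(maxGrids) {}
};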

bool LegalGridAnalysis::applyOverrides() {
  // Lookup grid size overrides based on location information for current
  // operation.
  //

  // TODO(odjuricic): Need to override all params, not just grid size.
Contributor

Is this for a follow-up change? File an issue under Optimizer if so.

Contributor Author

Yes, there is an issue filed.

GridAttr::get(op->getContext(), analysisInput.maxGrid.getShape())));
// DRAM
// No grid is set since the tensor is not sharded.
// TODO(odjuricic): We need to set grid here since it will be used as the
Contributor

File an issue; mention the special case where the grid matches the number of DRAM banks.

Contributor Author

Related to the 1x1 comment above, this needs to be updated once we know how runtime and TTNN treat this param.

Contributor Author

Filed an issue.
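For the record, the special case mentioned above could be modeled along these lines once the runtime behaviour is pinned down; the struct, helper name, and bank count are placeholders, not anything defined in this PR.

#include <cstdint>

// Hypothetical: a DRAM-interleaved tensor gets a 1 x numDramBanks "virtual"
// grid instead of the worker grid, one slice per bank.
struct VirtualDramGrid {
  int64_t height;
  int64_t width;
};

VirtualDramGrid dramGridFor(int64_t numDramBanks) {
  // e.g. 12 banks -> a 1x12 virtual grid
  return {1, numDramBanks};
}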

@odjuricicTT odjuricicTT force-pushed the odjuricic/grid-analysis branch from f2efe2d to 9b91a42 on September 5, 2024 10:53
@odjuricicTT odjuricicTT merged commit 6eb09be into main Sep 6, 2024
13 checks passed
Successfully merging this pull request may close these issues.

Implement legal Op configuration generation