Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bootstrap metal runtime #263

Merged
merged 6 commits into from
Aug 2, 2024
Merged

Bootstrap metal runtime #263

merged 6 commits into from
Aug 2, 2024

Conversation

nsmithtt
Copy link
Contributor

No description provided.

@nsmithtt nsmithtt force-pushed the nsmith/metal-runtime2 branch from 4bbf261 to 3f6b21b Compare August 1, 2024 17:56
@kmabeeTT
Copy link
Contributor

kmabeeTT commented Aug 1, 2024

Thanks, Nick for putting this together! I am taking a look at this...

Copy link
Contributor

@kmabeeTT kmabeeTT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, mostly questions for my own benefit. The most interesting stuff was in new files runtime.cpp and command_queue.cpp, but I guess we don't support some important stuff like EnqueueReadBuffer/WriteBuffer yet.

@@ -1,4 +1,4 @@
// RUN: ttmlir-opt --ttir-layout --ttnn-open-device --convert-ttir-to-ttnn %s > %t.mlir
// RUN: ttmlir-opt --ttir-implicit-device --ttir-layout --ttnn-open-device --convert-ttir-to-ttnn %s > %t.mlir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, what changed to require this here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This binds a default ttir device to the enclosing module. The translate command below relies on a device being bound during flatbuffer serialization.

@@ -175,6 +177,8 @@ Event submit(Device deviceHandle, Binary executableHandle,
return Event(nullptr);
}

void wait(Event) { throw std::runtime_error("Not implemented"); }
void wait(Event) {
// Not implemented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to get rid of the runtime error that was here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes because otherwise a FE runtime cannot generically use this interface. It's perfectly legal for a backend runtime to implement submit as blocking and wait is a nop. Which is how ttnn runtime works today, but we should fix this and use real ttnn events and nonblocking APIs

}

root_type TTMetalBinary;
file_identifier "TTB0";
file_extension "ttb";
file_identifier "TTM0";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the 0 suffix here? Don't see it on TTNN/binary.fbs sibling file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be a 4 character code, I figured it might be useful one day to put the major version here or something? No good reason, it's arbitrary.

@@ -3,39 +3,64 @@ include "program.fbs";
namespace tt.target.metal;


table DispatchCommand {
table EnqueueProgramCommand {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get far enough to see how these are used or understand if this is initial/bringup limitation, but all these new commands seem rather...bare... (?)... it makes me wonder if they need to more closely match host_api functions in terms of number of arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They appear somewhat bare, but most of them hold some sort of Desc or Ref, with an underlying Desc that fans out to quite a bit of metadata.

@@ -29,6 +31,28 @@ inline ::tt::target::OOBVal toFlatbuffer(FlatbufferObjectCache &,
}
}

inline std::uint64_t getElementSizeBytes(DataType dtype) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, looks like we have same function in runtime/utils.h called dataTypeElementSize(), perhaps they should be commonized?

Copy link
Contributor Author

@nsmithtt nsmithtt Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear your point, and I might be swayed one way or the other, but currently isn't any shared code between compiler and runtime except for FBS. It is somewhat deliberate in that it might be good to leave the door open for one day where the flatbuffer runtime grows out of scope of just tt-mlir and becomes its own repo.

@@ -102,6 +104,74 @@ std::vector<TensorDesc> getProgramOutputs(Flatbuffer binary,

} // namespace ttnn

namespace metal {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this stuff (from eyeball check) is quite common with ttnn versions above? Do we keep them separate because they will soon diverge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They look the same, but the flatbuffer generated functions they call are different and flatbuffer controls the binary layout so they really cannot be mixed. This can be a subtle gotcha.

return ::tt::target::metal::GetSizePrefixedTTMetalBinary(binary.handle.get());
}

std::pair<SystemDesc, DeviceIds> getCurrentSystemDesc() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, should we put TODO/comment with words for temporary hacks/missing functionality? Not seeing much of that in our tree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but they should link to an issue. getCurrentSystemDesc I think is being cleaned up by Jackson

::tt::target::Dim2d deviceGrid = toFlatbuffer(device->logical_grid_size());
std::vector<::flatbuffers::Offset<tt::target::ChipDesc>> chipDescs = {
::tt::target::CreateChipDesc(
fbb, toFlatbuffer(device->arch()), &deviceGrid, (1 << 20), 12,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like ttnn is the same, but will these hardcoded (wormhole?) values (1<<20), 12, etc come from some place that is less...hidden... one day?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it should come from the device l1_size_per_core I think. But we should be sure this doesn't include the fast dispatch reserved space.

}

void wait(Event event) {
Events events = event.as<Events>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something here, but is this intentional (converting from what looks like singular Event, to vector of metal Events)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, notice above that line 156 creates a vector of metal events. This is because there might be many underlying events created and tracked from this program submit (i.e. executing on multi-device / multi-command queue) that all need to be wrapped up in a single event. To the FE runtime it should only care about managing barriers at program submit granularity.

::tt::target::metal::EventQueryCommand const *command) {
auto event = events.at(command->ref()->global_id());
(void)::tt::tt_metal::EventQuery(
event); // todo, we need flatbuffer support for tracking and doing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't quite understand how this all works yet, but in my head having hard time visualizing how EventQuery (returns bool in metal, true if event is completed, false if not completed) can be used here or what makes sense... does the bool value need to be propogated up, and does that even make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment it makes no sense, in order to support this we need to augment flatbuffer:

table BoolRef {
  global_id: uint32;
}

table EventQueryCommand {
  ref: EventRef;
  out: BoolRef;
}

And then this code would change to be:

boolVars[command->out()->global_id()] = ::tt::tt_metal::EventQuery(event);

And then somewhere later in the flatbuffer program it could reference boolVars[command->out()->global_id()] and conditionally do something. But I actually don't have a use case at the moment to leverage this.

In a separate context I could envision a frontend API:

bool eventQuery(Event event);

For which a frontend runtime might want to query if work has finished.

@nsmithtt nsmithtt merged commit 8f2bcbb into main Aug 2, 2024
6 checks passed
@nsmithtt nsmithtt deleted the nsmith/metal-runtime2 branch August 2, 2024 02:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants