-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bootstrap metal runtime #263
Conversation
76075c5
to
77ebc82
Compare
4bbf261
to
3f6b21b
Compare
Thanks, Nick for putting this together! I am taking a look at this... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, mostly questions for my own benefit. The most interesting stuff was in new files runtime.cpp and command_queue.cpp, but I guess we don't support some important stuff like EnqueueReadBuffer/WriteBuffer yet.
@@ -1,4 +1,4 @@ | |||
// RUN: ttmlir-opt --ttir-layout --ttnn-open-device --convert-ttir-to-ttnn %s > %t.mlir | |||
// RUN: ttmlir-opt --ttir-implicit-device --ttir-layout --ttnn-open-device --convert-ttir-to-ttnn %s > %t.mlir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, what changed to require this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This binds a default ttir device to the enclosing module. The translate command below relies on a device being bound during flatbuffer serialization.
@@ -175,6 +177,8 @@ Event submit(Device deviceHandle, Binary executableHandle, | |||
return Event(nullptr); | |||
} | |||
|
|||
void wait(Event) { throw std::runtime_error("Not implemented"); } | |||
void wait(Event) { | |||
// Not implemented |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to get rid of the runtime error that was here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes because otherwise a FE runtime cannot generically use this interface. It's perfectly legal for a backend runtime to implement submit
as blocking and wait
is a nop. Which is how ttnn runtime works today, but we should fix this and use real ttnn events and nonblocking APIs
} | ||
|
||
root_type TTMetalBinary; | ||
file_identifier "TTB0"; | ||
file_extension "ttb"; | ||
file_identifier "TTM0"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the 0 suffix here? Don't see it on TTNN/binary.fbs sibling file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has to be a 4 character code, I figured it might be useful one day to put the major version here or something? No good reason, it's arbitrary.
@@ -3,39 +3,64 @@ include "program.fbs"; | |||
namespace tt.target.metal; | |||
|
|||
|
|||
table DispatchCommand { | |||
table EnqueueProgramCommand { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get far enough to see how these are used or understand if this is initial/bringup limitation, but all these new commands seem rather...bare... (?)... it makes me wonder if they need to more closely match host_api functions in terms of number of arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They appear somewhat bare, but most of them hold some sort of Desc
or Ref
, with an underlying Desc
that fans out to quite a bit of metadata.
@@ -29,6 +31,28 @@ inline ::tt::target::OOBVal toFlatbuffer(FlatbufferObjectCache &, | |||
} | |||
} | |||
|
|||
inline std::uint64_t getElementSizeBytes(DataType dtype) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, looks like we have same function in runtime/utils.h called dataTypeElementSize(), perhaps they should be commonized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear your point, and I might be swayed one way or the other, but currently isn't any shared code between compiler and runtime except for FBS. It is somewhat deliberate in that it might be good to leave the door open for one day where the flatbuffer runtime grows out of scope of just tt-mlir and becomes its own repo.
@@ -102,6 +104,74 @@ std::vector<TensorDesc> getProgramOutputs(Flatbuffer binary, | |||
|
|||
} // namespace ttnn | |||
|
|||
namespace metal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this stuff (from eyeball check) is quite common with ttnn versions above? Do we keep them separate because they will soon diverge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They look the same, but the flatbuffer generated functions they call are different and flatbuffer controls the binary layout so they really cannot be mixed. This can be a subtle gotcha.
return ::tt::target::metal::GetSizePrefixedTTMetalBinary(binary.handle.get()); | ||
} | ||
|
||
std::pair<SystemDesc, DeviceIds> getCurrentSystemDesc() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, should we put TODO/comment with words for temporary hacks/missing functionality? Not seeing much of that in our tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but they should link to an issue. getCurrentSystemDesc
I think is being cleaned up by Jackson
::tt::target::Dim2d deviceGrid = toFlatbuffer(device->logical_grid_size()); | ||
std::vector<::flatbuffers::Offset<tt::target::ChipDesc>> chipDescs = { | ||
::tt::target::CreateChipDesc( | ||
fbb, toFlatbuffer(device->arch()), &deviceGrid, (1 << 20), 12, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like ttnn is the same, but will these hardcoded (wormhole?) values (1<<20), 12, etc come from some place that is less...hidden... one day?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it should come from the device l1_size_per_core
I think. But we should be sure this doesn't include the fast dispatch reserved space.
} | ||
|
||
void wait(Event event) { | ||
Events events = event.as<Events>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing something here, but is this intentional (converting from what looks like singular Event, to vector of metal Events)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, notice above that line 156 creates a vector of metal events. This is because there might be many underlying events created and tracked from this program submit
(i.e. executing on multi-device / multi-command queue) that all need to be wrapped up in a single event. To the FE runtime it should only care about managing barriers at program submit granularity.
::tt::target::metal::EventQueryCommand const *command) { | ||
auto event = events.at(command->ref()->global_id()); | ||
(void)::tt::tt_metal::EventQuery( | ||
event); // todo, we need flatbuffer support for tracking and doing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't quite understand how this all works yet, but in my head having hard time visualizing how EventQuery (returns bool in metal, true if event is completed, false if not completed) can be used here or what makes sense... does the bool value need to be propogated up, and does that even make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment it makes no sense, in order to support this we need to augment flatbuffer:
table BoolRef {
global_id: uint32;
}
table EventQueryCommand {
ref: EventRef;
out: BoolRef;
}
And then this code would change to be:
boolVars[command->out()->global_id()] = ::tt::tt_metal::EventQuery(event);
And then somewhere later in the flatbuffer program it could reference boolVars[command->out()->global_id()]
and conditionally do something. But I actually don't have a use case at the moment to leverage this.
In a separate context I could envision a frontend API:
bool eventQuery(Event event);
For which a frontend runtime might want to query if work has finished.
No description provided.