-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional fixes to metal runtime bootstrap #290 #291
Conversation
nsmithtt
commented
Aug 4, 2024
- Make inputs per-device (multi-device inputs still need to be worked out)
- Create common functions for creating metal Buffers and CoreRange from flatbuffer types
- Implement wait API + events, update ttrt to use it
- Bug fix logical_grid_size -> compute_with_storage_grid_size, the latter accurately gives post-harvested shape
- Automatically move inputs / outputs during submit
d48c38e
to
81b81b9
Compare
- Make inputs per-device (multi-device inputs still need to be worked out) - Create common functions for creating metal Buffers and CoreRange from flatbuffer types - Implement wait API + events, update ttrt to use it - Bug fix logical_grid_size -> compute_with_storage_grid_size, the latter accurately gives post-harvested shape - Automatically move inputs / outputs during submit
81b81b9
to
f2fddea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! One request in general (here and elsewhere) is it would help if we could have comments per function.
buffer->set_address(command->ref()->address()); | ||
buffers[command->ref()->global_id()] = buffer; | ||
buffers[command->ref()->global_id()] = | ||
createBufferFromTensorRef(device, command->ref()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome.
executeCommandQueue(device, cq, cq_id, inputs, outputs)); | ||
++cq_id; | ||
} | ||
|
||
Events copyEvents = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to double check, idea here is that if device outputs need to be copied to host, we will ignore earlier events generated from the execution of programs, by just returning the newer events that represent the copying (reads) of outputs from device to host... seems to make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, though we don't ignore the earlier events, they are waited on inside of maybeCopyHostOutputs
.
ttrt code lgtm! |
b91c0a0
to
13f40c5
Compare