Fix bug when calling CreateDevice in a loop on TG #16260

Merged (1 commit) on Jan 7, 2025
13 changes: 12 additions & 1 deletion tt_metal/impl/device/device_pool.cpp
@@ -206,20 +206,31 @@ void DevicePool::initialize(

// Never skip for TG Cluster
bool skip = not tt::Cluster::instance().is_galaxy_cluster();
std::vector<chip_id_t> target_mmio_ids;
Contributor @tt-aho commented on Dec 23, 2024:

We could reserve by device_ids.size() ahead of time? It may slightly over-allocate, but this vector is discarded afterwards, right?
Edit: Reserving by tt::Cluster::instance().number_of_pci_devices() might be better.
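A minimal sketch of the reservation suggested above, assuming the same surrounding DevicePool::initialize context as the diff (device_ids and tt::Cluster); every id pushed into the vector is an MMIO chip, so number_of_pci_devices() bounds its final size:

```cpp
// Sketch only: reserve target_mmio_ids up front as suggested in the comment.
std::vector<chip_id_t> target_mmio_ids;
// Each pushed id is an MMIO (PCIe-attached) chip, so the PCI device count
// bounds how many unique ids this vector can ever hold.
target_mmio_ids.reserve(tt::Cluster::instance().number_of_pci_devices());
for (const auto& device_id : device_ids) {
    const auto& mmio_device_id = tt::Cluster::instance().get_associated_mmio_device(device_id);
    if (std::find(target_mmio_ids.begin(), target_mmio_ids.end(), mmio_device_id) == target_mmio_ids.end()) {
        target_mmio_ids.push_back(mmio_device_id);
    }
}
```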

for (const auto& device_id : device_ids) {
TT_FATAL(
device_id < tt::Cluster::instance().number_of_devices(),
"Device index {} out of range. There are {} devices available.",
device_id,
tt::Cluster::instance().number_of_devices());
const auto& mmio_device_id = tt::Cluster::instance().get_associated_mmio_device(device_id);
if (std::find(target_mmio_ids.begin(), target_mmio_ids.end(), mmio_device_id) == target_mmio_ids.end()) {
target_mmio_ids.push_back(mmio_device_id);
}
skip &= (device_id == mmio_device_id);
}
if (target_mmio_ids.size() != tt::Cluster::instance().number_of_pci_devices()) {
log_warning(
tt::LogMetal,
"Opening subset of mmio devices slows down UMD read/write to remote chips. If opening more devices, "
"consider using CreateDevices API.");
}

_inst->skip_remote_devices = skip;

_inst->add_devices_to_pool(device_ids);
_inst->init_firmware_on_active_devices();
tt::Cluster::instance().set_internal_routing_info_for_ethernet_cores(true);
tt::Cluster::instance().set_internal_routing_info_for_ethernet_cores(true, target_mmio_ids);
_inst->init_profiler_devices();
}
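For context, a minimal sketch of the usage pattern the PR title and the new log_warning refer to; the host-API spellings (tt::tt_metal::CreateDevice / CloseDevice) are assumptions that may differ between tt-metal versions, and the note on the old behaviour is a reading of this diff rather than an official description:

```cpp
// Sketch only, not part of the diff: the pattern the PR title describes.
// On a TG (galaxy) cluster each CreateDevice call runs DevicePool::initialize
// for a single device id; before this fix, ethernet routing info was written
// for every MMIO device in the cluster instead of only the MMIO devices
// associated with the ids actually being opened.
std::vector<chip_id_t> ids = {0, 1, 2, 3};
for (chip_id_t id : ids) {
    auto* device = tt::tt_metal::CreateDevice(id);  // assumed host-API spelling
    // ... run work on `device` ...
    tt::tt_metal::CloseDevice(device);
}
// The new log_warning instead suggests opening the whole set in one call
// (the CreateDevices API) so UMD reads/writes to remote chips stay fast.
```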

15 changes: 9 additions & 6 deletions tt_metal/llrt/tt_cluster.cpp
@@ -973,18 +973,21 @@ std::tuple<tt_cxy_pair, tt_cxy_pair> Cluster::get_eth_tunnel_core(
}

// TODO: ALLAN Can change to write one bit
void Cluster::set_internal_routing_info_for_ethernet_cores(bool enable_internal_routing) const {
void Cluster::set_internal_routing_info_for_ethernet_cores(bool enable_internal_routing, const std::vector<chip_id_t> &target_mmio_devices) const {
Collaborator commented:
We call this function in a number of different spots for both init and close. Do we need to survey whether those callsites need updates?
Is it simpler to just read back the enable bit before writing?
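A minimal sketch of the read-back idea raised above; read_routing_enable() and chips_to_update are hypothetical placeholders (the real read would go through whatever L1 read path Cluster already uses for ethernet cores), and only the routing_info_addr constant is taken from this function:

```cpp
// Sketch of the "read back the enable bit before writing" alternative.
// read_routing_enable() and chips_to_update are hypothetical stand-ins.
const uint32_t routing_info_addr = eth_l1_mem::address_map::ERISC_APP_ROUTING_INFO_BASE;
for (const auto& chip_id : chips_to_update) {
    uint32_t currently_enabled = read_routing_enable(chip_id, routing_info_addr);  // hypothetical helper
    if (currently_enabled == static_cast<uint32_t>(enable_internal_routing)) {
        continue;  // routing flag already in the requested state; skip the write
    }
    // ... fall through to the existing per-ethernet-core write path ...
}
```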

log_debug(tt::LogDevice, "Set internal routing bit {}", enable_internal_routing);
const uint32_t routing_info_addr = eth_l1_mem::address_map::ERISC_APP_ROUTING_INFO_BASE;
// TODO: initialize devices if user does not
// Must initialize remote chips first, then mmio chips since once mmio chips are doing fd routing
// we do not always context switch to base FW
std::vector<chip_id_t> mmio_devices;
mmio_devices.reserve(this->devices_grouped_by_assoc_mmio_device_.size());
std::vector<chip_id_t> non_mmio_devices;
for (const auto &[assoc_mmio_device, devices] : this->devices_grouped_by_assoc_mmio_device_) {
mmio_devices.emplace_back(assoc_mmio_device);
for (const auto &chip_id : devices) {
std::vector<chip_id_t> mmio_devices = target_mmio_devices;
if (mmio_devices.size() == 0) {
for (const auto &[assoc_mmio_device, devices] : this->devices_grouped_by_assoc_mmio_device_) {
mmio_devices.emplace_back(assoc_mmio_device);
}
}
for (const auto &mmio_chip_id : mmio_devices) {
for (const auto &chip_id : this->devices_grouped_by_assoc_mmio_device_.at(mmio_chip_id)) {
Comment on lines +983 to +990
Contributor commented:

Can we reserve sizes ahead of time for the non_mmio_devices vector, and also for the mmio_devices vector when target_mmio_devices is empty?
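A minimal sketch of that reservation, using only the members already referenced in this function; the mmio count is exact in the default case, and the remote count is summed from the grouped-device map:

```cpp
// Sketch of the reserve() suggestion above, mirroring the surrounding code.
std::vector<chip_id_t> mmio_devices = target_mmio_devices;
if (mmio_devices.empty()) {
    mmio_devices.reserve(this->devices_grouped_by_assoc_mmio_device_.size());
    for (const auto& [assoc_mmio_device, devices] : this->devices_grouped_by_assoc_mmio_device_) {
        mmio_devices.emplace_back(assoc_mmio_device);
    }
}
std::vector<chip_id_t> non_mmio_devices;
std::size_t remote_count = 0;
for (const auto& mmio_chip_id : mmio_devices) {
    remote_count += this->devices_grouped_by_assoc_mmio_device_.at(mmio_chip_id).size();
}
non_mmio_devices.reserve(remote_count);
```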

non_mmio_devices.emplace_back(chip_id);
}
}
2 changes: 1 addition & 1 deletion tt_metal/llrt/tt_cluster.hpp
@@ -182,7 +182,7 @@ class Cluster {
// set_internal_routing_info_for_ethernet_cores(false);
// CloseDevice(0)
// CloseDevice(1)
void set_internal_routing_info_for_ethernet_cores(bool enable_internal_routing) const;
void set_internal_routing_info_for_ethernet_cores(bool enable_internal_routing, const std::vector<chip_id_t>& target_mmio_devices = {}) const;

// Returns MMIO device ID (logical) that controls given `device_id`. If `device_id` is MMIO device it is returned.
chip_id_t get_associated_mmio_device(chip_id_t device_id) const {