Skip to content

Commit

Permalink
#15290: Fix UAF issue with Singleton destruction order (#15689)
Browse files Browse the repository at this point in the history
### Ticket
[Link to Github
Issue](#15290)

### Problem description
There are a number of Singletons being used in tt-metal: `SystemMesh`,
`DevicePool`, `Cluster`. There's a corner case that popped up that
caused a UAF error where `SystemMesh` was already destructed before
MeshDevice.

### What's changed
- Decouple MeshDevice destruction from SystemMesh destruction.
- Refactor SystemMesh to encapsulate internals in pImpl class.
- Added a new test case for SystemMesh tear down with static mesh in the
distributed tests.

### Checklist
- [x] Post commit CI passes
- [ ] Blackhole Post commit (if applicable)
- [x] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [x] New/Existing tests provide coverage for changes

https://github.com/tenstorrent/tt-metal/actions/runs/12198356189
https://github.com/tenstorrent/tt-metal/actions/runs/12194392381
  • Loading branch information
cfjchu authored Dec 6, 2024
1 parent 12b822c commit 0e5fb14
Show file tree
Hide file tree
Showing 4 changed files with 232 additions and 206 deletions.
6 changes: 5 additions & 1 deletion tests/ttnn/distributed/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,4 +1,8 @@
add_executable(test_distributed test_distributed.cpp)
add_executable(
test_distributed
test_distributed.cpp
test_distributed_atexit.cpp
)

# Set up properties for the target
setup_ttnn_test_target(test_distributed)
Expand Down
26 changes: 26 additions & 0 deletions tests/ttnn/distributed/test_distributed_atexit.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
// SPDX-FileCopyrightText: © 2024 Tenstorrent Inc.
//
// SPDX-License-Identifier: Apache-2.0

#include <gtest/gtest.h>

#include <cstddef>
#include <ttnn/core.hpp>
#include <ttnn/distributed/api.hpp>
#include "distributed/mesh_device.hpp"

namespace ttnn::distributed::test {

// Simplified test without fixture, and mesh variable moved inside test
TEST(DistributedTestStandalone, TestSystemMeshTearDownWithoutClose) {
static std::shared_ptr<ttnn::MeshDevice> mesh;
auto& sys = tt::tt_metal::distributed::SystemMesh::instance();
mesh = ttnn::distributed::open_mesh_device(
{2, 4}, DEFAULT_L1_SMALL_SIZE, DEFAULT_TRACE_REGION_SIZE, 1, tt::tt_metal::DispatchCoreType::WORKER);

auto [rows, cols] = sys.get_shape();
EXPECT_GT(rows, 0);
EXPECT_GT(cols, 0);
}

} // namespace ttnn::distributed::test
Loading

0 comments on commit 0e5fb14

Please sign in to comment.