IOMMU support episode II #393
Nice! Are there going to be further PRs, or will sysmem just work now in tt_metal on a Blackhole machine without hugepages?
This should just work provided the IOMMU is enabled. It will fail if the IOMMU is disabled or in passthrough mode and no hugepages are available. One caveat: this hasn't been extensively tested.
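The rule stated in the reply above can be sketched as a small decision function. This is only an illustration of the described behavior, not UMD code: `IommuMode`, `parse_free_hugepages`, and `sysmem_available` are hypothetical names, and the only real interface assumed is the `HugePages_Free:` line that Linux reports in `/proc/meminfo`.

```python
from enum import Enum


class IommuMode(Enum):
    """Hypothetical labels for the three host configurations discussed above."""
    ENABLED = "enabled"          # IOMMU translates device DMA addresses
    PASSTHROUGH = "passthrough"  # IOMMU present but not translating
    DISABLED = "disabled"        # no IOMMU in the DMA path


def parse_free_hugepages(meminfo_text: str) -> int:
    """Return the HugePages_Free count from /proc/meminfo-style text (0 if absent)."""
    for line in meminfo_text.splitlines():
        if line.startswith("HugePages_Free:"):
            return int(line.split(":", 1)[1].strip())
    return 0


def sysmem_available(mode: IommuMode, free_hugepages: int) -> bool:
    """Mirror the rule from the comment: an enabled IOMMU is sufficient on its
    own; otherwise sysmem needs at least one hugepage to be available."""
    if mode is IommuMode.ENABLED:
        return True
    return free_hugepages > 0
```

On a real host the text would come from reading `/proc/meminfo`; passing it in as a string keeps the sketch easy to exercise without touching the system.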
Reverts commit 3210bd9 from #393.

### Issue
Metal CI failure tracked in tenstorrent/tt-metal#15675

### Description
The reverted commit removed logic that allowed applications to request more hugepages than available to UMD. Previously, UMD would issue a warning in such cases. However, this created a potential safety issue since applications had no visibility into partial hugepage allocation (e.g., requesting 4 pages but receiving only 2). This situation could lead to:
- Host software segfaults when accessing unmapped pages
- More critically, device software could potentially corrupt host physical address space by writing to nonexistent pages

While the original change (making excessive hugepage requests a fatal error) improved safety, particularly in conjunction with IOMMU enablement, it caused failures in Metal CI tests that (possibly unintentionally?) request more hugepages than available. This revert is a temporary measure until the Metal CI tests can be updated.

### List of the changes
* Revert 3210bd9
* Update comment documentation

### Testing
CI

### API Changes
There are no API changes in this PR.
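The two behaviors contrasted in the revert description (warn and continue with fewer pages, versus treat the shortfall as fatal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the UMD implementation: `resolve_hugepage_request`, `HugepageRequestError`, and the `strict` flag are hypothetical names introduced here.

```python
import warnings


class HugepageRequestError(RuntimeError):
    """Hypothetical error type for a request that exceeds available hugepages."""


def resolve_hugepage_request(requested: int, available: int, strict: bool) -> int:
    """Return the number of hugepages the caller will actually get.

    strict=True models the original change in #393: a shortfall is fatal.
    strict=False models the reverted (warn-only) behavior: the caller gets
    fewer pages than requested and must check the return value itself.
    """
    if requested < 0 or available < 0:
        raise ValueError("page counts must be non-negative")
    if requested <= available:
        return requested
    message = f"requested {requested} hugepages but only {available} are available"
    if strict:
        raise HugepageRequestError(message)
    warnings.warn(message, RuntimeWarning)
    return available
```

In the warn-only path the shortfall is visible only through the return value and the warning, which is the visibility gap the description calls out: a caller that ignores both may go on to touch pages that were never mapped.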
Issue
#370
Description
Adds IOMMU support for Blackhole in a way that should be transparent to the application.
List of the changes
Testing
Manual testing was performed for both IOMMU on and IOMMU off cases using the newly-added sysmem tests for Blackhole.
With IOMMU on:
With IOMMU in passthrough:
API Changes
There are no API changes in this PR.