#104: Add ttnn flatbuffer tests into CI #274
Conversation
run: |
  source env/activate
  ttrt query --save-artifacts
  export SYSTEM_DESC_PATH=${{ github.workspace }}/ttrt-artifacts/system_desc.ttsys
Any ideas on how to scrape the system descs from the current runner pool?
Do we want all the possible system descs? If so, it can be a simple Ansible script that goes into every machine listed in an inventory file and queries it. There's a GitHub API to get the list of CI runners registered in mlir.
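For illustration, a minimal sketch of the Ansible approach mentioned above, assuming an inventory group named ci_runners and a tt-mlir checkout at a fixed path on each runner (both names are assumptions, not the actual CI setup):

# collect_system_descs.yml - hypothetical playbook, not part of the repo.
# Runs `ttrt query --save-artifacts` on every host in the ci_runners group
# and pulls each resulting system desc back to the control node.
- hosts: ci_runners
  gather_facts: false
  vars:
    tt_mlir_dir: /opt/tt-mlir           # assumed checkout location on each runner
  tasks:
    - name: Query the system descriptor on this runner
      ansible.builtin.shell: |
        source env/activate
        ttrt query --save-artifacts
      args:
        chdir: "{{ tt_mlir_dir }}"
        executable: /bin/bash

    - name: Fetch the generated system desc, named per host
      ansible.builtin.fetch:
        src: "{{ tt_mlir_dir }}/ttrt-artifacts/system_desc.ttsys"
        dest: "collected/{{ inventory_hostname }}.ttsys"
        flat: true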
Yeah, this is a good question. Ideally we want all of them, but then use some policy for divvying them up amongst the tests, i.e. we want to distribute the work evenly across the pool.
I think it might make sense to save the system descs with their product name or tag them in some way, e.g. n300.ttsys / n150.ttsys / tgg.ttsys, but we probably need to further specialize the name because of row harvesting. It gets even more complicated if we have an LLM box; this is where system desc slicing could be used so we could partition the single host into 8 devices and go parallel.
Let's think through how to capture this information thoroughly because it's going to need to be rock solid to avoid CI device management hell.
Another challenge is that the system desc must be collected with the current build. Maybe this works like:
- Build on a GitHub runner, producing a build artifact
- Run a super fast job on the silicon runners that collects all the system descs into an artifact
- Go back to the GitHub runner, which runs the compiler tests with the collected system descs
- Go back to the silicon runner, which runs the flatbuffers
Not sure if that's too crazy going back and forth. I'm just not sure how else it could work.
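To visualize the back-and-forth, here is a rough GitHub Actions sketch of those four stages; job names, runner labels, and the build/run commands are placeholders rather than the real workflow:

# Hypothetical four-stage pipeline, chained with `needs:`.
jobs:
  build:
    runs-on: ubuntu-latest              # compute-only GitHub runner
    steps:
      - uses: actions/checkout@v4
      - run: ./build.sh                 # placeholder for the real build
      - uses: actions/upload-artifact@v4
        with:
          name: build
          path: build/
  collect-system-descs:
    needs: build
    runs-on: [self-hosted, silicon]     # a matrix would be needed to cover the whole pool
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: build
      - run: |
          source env/activate
          ttrt query --save-artifacts
      - uses: actions/upload-artifact@v4
        with:
          name: system-descs
          path: ttrt-artifacts/
  compile-tests:
    needs: collect-system-descs
    runs-on: ubuntu-latest              # back on the compute-only runner
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: system-descs
      - run: llvm-lit test/             # compiles tests, emitting flatbuffers
  run-flatbuffers:
    needs: compile-tests
    runs-on: [self-hosted, silicon]
    steps:
      - run: echo "run the generated flatbuffers with ttrt here"  # placeholder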
fyi @vmilosevic
Yeah, I'm not sure either how we can get the system desc without building first. Although, I was thinking we could do this:
- Run on a GitHub runner, build ttrt
- Transfer all test files onto all types of CI runners
- Use the system desc to generate valid test cases for that machine
- Run all valid tests on each machine
We could probably use GitHub's load balancing mechanism instead of trying to distribute the tests across CI runners ourselves, which I think might cause more overhead. I think this will be a larger discussion offline - let me know what you think.
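A sketch of how that could look in workflow terms, assuming the silicon runners carry per-family labels (n150, n300, tgg here, all assumed names), so GitHub's own queueing spreads jobs across idle runners with the matching label:

# Hypothetical per-family job: each machine queries its own system desc,
# generates only the test cases that are valid for it, then runs them.
# Assumes a prior build job uploaded a `build` artifact.
jobs:
  silicon-tests:
    strategy:
      matrix:
        family: [n150, n300, tgg]       # assumed runner labels
    runs-on: [self-hosted, "${{ matrix.family }}"]
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: build
      - run: |
          source env/activate
          ttrt query --save-artifacts
          export SYSTEM_DESC_PATH=$PWD/ttrt-artifacts/system_desc.ttsys
          llvm-lit test/                # generates only the cases valid for this machine
          # exact command to run the resulting flatbuffers omitted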
So two things that I think we should work out:
- Ideally llvm-lit tests are run on the compute-only GitHub runners, but maybe it's not a big deal to just run them on the silicon runners. Either way, they need to run in order to generate the flatbuffers.
- Let's say we have a pool of 10 n300s and we have 100 silicon tests. We don't want to run all 100 tests on all n300s; we want to group the tests into 10 chunks and have each runner run only 10 tests.
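On the second point, a rough sharding sketch (the artifact layout, .ttnn extension, and n300 runner label are assumptions): ten matrix entries, each picking every 10th generated flatbuffer, so the pool splits the 100 tests rather than every machine repeating all of them.

# Hypothetical sharding of generated flatbuffers across the n300 pool.
jobs:
  n300-shard:
    strategy:
      matrix:
        shard: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    runs-on: [self-hosted, n300]        # assumed label for the n300 pool
    steps:
      - run: |
          # keep every 10th flatbuffer, offset by this shard's index
          ls ttrt-artifacts/*.ttnn | awk -v i=${{ matrix.shard }} 'NR % 10 == i' > shard.txt
          # run only the tests listed in shard.txt (exact runner invocation omitted)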
I think we should establish some terminology (feel free to suggest other names if these seem confusing or don't make sense, but I think we need some terms to refer to because our test matrix is quite complicated):
- Silicon Runner: A 1-to-1 mapping to a unique SystemDesc. Because of row harvesting, even different n300s might have unique system descs.
- Target Family: A category of system type; we should organize these by product name, like n150, n300, TG, TGG, LLM Box, etc. We will have a pool of n300s, for example (all the same target system), but each might have a unique system desc.
- Target Capabilities: High-level attributes that describe testable traits of a Target Family.
- Test Capabilities: A set of Target Capabilities defined in the test. Effectively this describes which system the test should be compiled and run against.
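To make those terms concrete, here is one hypothetical way they could be written down (file name, fields, and values are illustrative only):

# target_families.yaml - illustrative only, not an existing file.
target_families:
  n300:
    capabilities: [silicon, multi_chip]     # Target Capabilities the family exports
    silicon_runners:                        # one entry per unique system desc
      - n300-runner-01.ttsys                # row-harvested variant A
      - n300-runner-02.ttsys                # row-harvested variant B
  tg:
    capabilities: [silicon, Galaxy, HugePage]
    silicon_runners:
      - tg-runner-01.ttsys
  tgg:
    capabilities: [silicon, Galaxy, HugePage]
    silicon_runners:
      - tgg-runner-01.ttsys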
To run through an example of how this could work, tracing it from the perspective of a single test:
Please reference the llvm-lit directives REQUIRES and UNSUPPORTED.
- GitHub CI builds tt-mlir and ttrt
- All Silicon Runners pick up the build artifact and generate the system descs for all runners in our pool. We need to organize all of these somehow and bucketize them by Target Family. Each Target Family bucket forms a pool of available test targets for silicon tests to be compiled against. Each Target Family bucket has an associated set of Target Capabilities, which can be driven by the data in the system desc itself (more on this later).
- The system desc artifact (a zip file containing all system descs in the entire pool) comes back to the GitHub compute runner. Here we do some kind of partitioning of the Silicon Runners within each Target Family category. For example, n300 is a Target Family; there might be 10 Silicon Runners of this type, and ideally all of the tests that opt into being compiled against this Target Family get evenly distributed amongst the 10 available runners.
- Zooming in on a particular test, it might have the llvm-lit directive REQUIRES: Galaxy HugePage. Here Galaxy and HugePage form this test's Test Capabilities (a set of two). The Galaxy tag itself is a Target Capability associated with TG and TGG because both Target Families export this Target Capability. This would opt the test into being compiled against a single system desc that is part of this Target Family pool.
A note on Target Capabilities: I think some of these tags should be opt-in and some opt-out. For example, most silicon tests, without any Test Capabilities listed, should implicitly opt into running on most silicon runners; by default everything should run on n300s. Whereas multi-device tests should explicitly opt in (via a REQUIRES directive).
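One hypothetical way to encode that default/opt-in split (names made up for illustration):

# capability policy sketch - illustrative only.
capability_policy:
  default_family: n300                         # tests with no Test Capabilities land here
  opt_in_only: [Galaxy, HugePage, MultiDevice] # must be named in a REQUIRES: directive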
@vmilosevic has taken this work over. Closing; we can reopen if it makes sense.