-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable docker image feature with xpk #28
Conversation
dataset_name=metric_config.DatasetOption.XLML_DATASET, | ||
) | ||
|
||
run_model_cmds = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this test is actually going to be used, should it be in the example
directory? If it is just an example, I would go even simpler and just print jax.device_count()
to show the configuration is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. Yes, currently I just want to show an example. Since we are testing models, I was thinking probably it's better to have a hello world E2E example. Do you feel this is a complex one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine if the script prints the topology or number of devices. IMO the main part we want to confirm with the example is that we created the right machine and any metadata needed for initialization is plumbed through correctly to the runtime.
* Add unit tests to checks * Update test for auth
* Add auto push after commit merge * Address comment
* Build JSonnet files as part of the postsubmit Change-Id: I811493a530d8eed91c975aebe15d714903283678 * remove extra quotes Change-Id: Iba35dacab442d783304f738a2a42bb0bd350fb63
* Pull the latest submodule commit and add configs Change-Id: Ia43089a8224e7f258eb1a9b44876ebd8f0c2b20f * Apply config changes by regenerating Change-Id: I9418e17389e421695d4511a4979f28f318c224df * Remove Jsonnet configs and update test name Change-Id: I4ff436944ed3b5986f7b50f8caa24c673ec15d3b * Remove configs Change-Id: I774bbe8d2d03835bef0cbaededff1f5575a45e83 * Add pjrt in the test name Change-Id: I4847ac74d883def7b51363f5a60f0f99e9665b67
Change-Id: I58ee7d2e9237b686004e0a7454d54cf63f2e5e70
…form#37) Change-Id: I58947fff8617ce27740b566d2aef90bbeda21152
…rm#38) Change-Id: I575c2afcdbba6883ccc1569bd38d0a60e6f24f74
Change-Id: I39d23e223b6c0c2c2633b4cee3a3d65909dbf63f
New PR.
Description
Enable docker image feature with xpk:
TpuGkeTask
to execute run_model and post_process task groupsconfigs/cluster
folder, so Airflow has the access to the created cluster with TPUs. It seemscertificate-authority-data
is safe to share based on this stackoverflow thread.TpuTask
toTpuGceTask
to distinguishTpuGkeTask
, and similar for test configA few TODO once b/311073979 is approved:
Tests
Please describe the tests that you ran on Cloud VM to verify changes.
Instruction and/or command lines to reproduce your tests: ...
List links for your tests (use go/shortn-gen for any internal link): ...
Checklist
Before submitting this PR, please make sure (put X in square brackets):