Introduce the NativeLink Operator
#1088
+@adam-singer +@allada +@blakehatch
cc @MarcusSorealheis @bclark8923 @kubevalet
New docpages at:
- https://df0124ed.nativelink.pages.dev/guides/kubernetes/
- https://df0124ed.nativelink.pages.dev/guides/chromium/
Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable (waiting on @adam-singer, @allada, and @blakehatch)
11c1359 to cfad74d
This PR is massive and there is no simple opportunity for breaking it up. just annoying.
The docs look good, though. Nice.
Reviewed 3 of 61 files at r1.
Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)
deploy/chromium-example/kustomization.yaml
line 18 at r1 (raw file):
- op: replace
  path: /spec/url
  value: https://github.com/aaronmondal/nativelink
why isn't this pointing to the TraceMachina repo?
deploy/dev/kustomization.yaml
line 26 at r1 (raw file):
- op: replace
  path: /spec/url
  value: https://github.com/aaronmondal/nativelink
again
deploy/kubernetes-example/kustomization.yaml
line 18 at r1 (raw file):
- op: replace
  path: /spec/url
  value: https://github.com/aaronmondal/nativelink
again
What i don't like is that the Chromium example received a Pulumi dependency. Is that required?
Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)
This PR is massive and there is no simple opportunity for breaking it up. just annoying.
After looking into reducing the size of this PR I think there are some parts that I can break out. I'll send PRs for those parts which hopefully also makes it a bit clearer why/how I'm making these changes.
What i don't like is that the Chromium example received a Pulumi dependency. Is that required?
We always had the pulumi dependency for the Chromium example. It's just more apparent now. However, this PR paves the way to reduce that dependency.
The way the examples generally work is:
- Start a K8s cluster and prepare some dependencies in it. This is done via Pulumi.
- Build or fetch NativeLink container images and toolchains. This was previously done via the `01_operations` shell scripts. Now it happens automatically inside the cluster. This fixes "Tag evaluation for K8s images shouldn't run on the host" (#1012), which is a blocker for macOS.
- Deploy the actual NativeLink deployments. This was previously done via the `02_application` scripts and now also happens inside the cluster, so it's no longer necessary to invoke any shell scripts manually.
What the new `deploy` and `kubernetes` directories do is essentially create "building blocks" for creating NativeLink K8s deployments. For instance, if you had an existing Helm chart (wink) you could now use these building blocks to deploy that chart as well. This also turns non-production parts of the examples, like the insecure example certs, into Components. This way they're more easily swappable with e.g. "real" CAs. Same for the HttpRoutes, which would require functional Gateway API Gateways in the cluster. Now it's possible to omit those routes and configure your own ingress logic instead.
The next step here is to also migrate the Tekton Pipelines out of the `native-cli` and into the `kubernetes` directory so that they're deployed via Flux instead of Pulumi. After that we've clearly separated concerns between Pulumi and K8s, and `kubectl apply -k https://github.com/TraceMachina/nativelink//deploy/<somestack>` should be self-contained enough that users can start running it against arbitrary K8s clusters.
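To make the "building blocks" idea a bit more concrete, here is a minimal sketch of a custom stack that pulls in a base and opts into only some Components. It assumes a local checkout of the repository. Every path and component name below is hypothetical and chosen for illustration, not taken from this PR.

```yaml
# kustomization.yaml for a custom stack.
# All paths below are hypothetical placeholders, not the actual layout.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Shared NativeLink resources (hypothetical base directory).
  - ../../kubernetes/base

components:
  # Non-production example certs. Swap this for a Component that wires
  # in a real CA when moving beyond a dev cluster.
  - ../../kubernetes/components/insecure-certs
  # The HTTPRoute Component is intentionally left out here so that the
  # cluster's own ingress logic can be used instead:
  # - ../../kubernetes/components/gateway-routes
```

Since Components are opt-in, a stack without working Gateway API Gateways can simply leave the routes entry out.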
Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)
deploy/chromium-example/kustomization.yaml
line 18 at r1 (raw file):
Previously, MarcusSorealheis (Marcus Eagan) wrote…
why isn't this pointing to the TraceMachina repo?
This is so that the LRE-remote job https://github.com/TraceMachina/nativelink/actions/runs/9824654274/job/27123866843?pr=1088 passes. Since the deployments in this PR aren't on main yet we'll have to bring them in before we can change this. However, we can set them to the TraceMachina repo and disable the LRE job in CI and then immediately reenable it after these changes have landed.
I just noticed that it's also possible to create a dedicated `CI` overlay that sets this to the actual PR branch dynamically. This is probably what we want so I'll look into this more.
Keeping this comment (and the other two) open so that I don't forget to change these overrides (and the `flux` branch override).
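A rough sketch of what such a CI overlay could look like, assuming the base already sets `spec.url` and `spec.ref.branch` on its GitRepository. The overlay location and the `PR_REPOSITORY` / `PR_BRANCH` placeholders are hypothetical. CI would have to substitute them before running `kubectl apply -k`.

```yaml
# deploy/ci/kustomization.yaml (hypothetical location)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Reuse the example stack unchanged and only override the source.
  - ../kubernetes-example

patches:
  - target:
      # Applies to every GitRepository in the stack.
      kind: GitRepository
    patch: |-
      # Placeholder values: CI is assumed to replace these with the
      # PR's head repository and branch.
      - op: replace
        path: /spec/url
        value: https://github.com/PR_REPOSITORY
      - op: replace
        path: /spec/ref/branch
        value: PR_BRANCH
```

With something like this, the checked-in examples could point at the TraceMachina repo while the LRE job still tests the PR's own branch.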
sgtm
Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)
I broke out various parts of this PR in:
- Allow Tekton pipelines to be triggered by Flux Alerts #1094
- Update Go dependencies #1095
- Add Flux to development cluster #1096
- Allow WebSocket upgrades in devcluster Loadbalancer #1098
- Write Tekton image tag outputs to a ConfigMap #1100
I'll rebase this PR after these have been merged.
Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)
@aaronmondal happy to sync up offline and test out these changes locally to get a better sense what this means for using these locally
Reviewed 42 of 61 files at r1, 14 of 14 files at r2, all commit messages.
Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @allada and @blakehatch)
docs/src/content/docs/guides/chromium.mdx
line 25 at r2 (raw file):
```bash # TODO(aaronmondal): Point to the main repo before merging.
nit: leaving this here so we don't forget it
docs/src/content/docs/guides/kubernetes.mdx
line 26 at r2 (raw file):
```bash # TODO(aaronmondal): Point to the main repo before merging.
nit: before landing
2c7b100 to 4bc267a
Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)
f63cf71 to 75c943a
I get this error when I try to run native up with Docker running on Ubuntu 24:
Diagnostics:
docker:index:Container (kind-registry):
error: Docker native provider returned an unexpected error from Configure: failed to connect to any docker daemon
Resources:
2 unchanged
Duration: 1s
2024/08/20 17:44:51 component error: pulumi error: decoding YAML: rpc error: code = Canceled desc = context canceled
For clarity on my last comment:
Ready to merge, modulo the references to my repo after reviews.
Reviewed 1 of 3 files at r5, 10 of 17 files at r6, 1 of 1 files at r8, all commit messages.
Reviewable status: 0 of 3 LGTMs obtained, and 55 of 67 files reviewed, and pending CI: Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, and 11 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)
README.md
line 62 at r8 (raw file):
The setups below are **production-grade** installations. See the [contribution docs](https://nativelink.com/docs/contribute/nix/) for instructions on how to build from source with [Bazel](https://nativelink.com/docs/contribute/bazel/), [Cargo](https://nativelink.com/docs/contribute/cargo/), and [Nix](https://nativelink.com/docs/contribute/nix/). You can find a few example deployments in the [Docs](https://docs.nativelink.com/guides/kubernetes).
The docs path should be relative "/docs/guides/kubernetes", if we want to have it self contained.
deploy/chromium-example/kustomization.yaml
line 18 at r1 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
This is so that the LRE-remote job https://github.com/TraceMachina/nativelink/actions/runs/9824654274/job/27123866843?pr=1088 passes. Since the deployments in this PR aren't on main yet we'll have to bring them in before we can change this. However, we can set them to the TraceMachina repo and disable the LRE job in CI and then immediately reenable it after these changes have landed.
I just noticed that it's also possible to create a dedicated `CI` overlay that sets this to the actual PR branch dynamically. This is probably what we want so I'll look into this more.
Keeping this comment (and the other two) open so that I don't forget to change these overrides (and the `flux` branch override).
nit: Don't forget to change the link
deploy/kubernetes-example/kustomization.yaml
line 18 at r8 (raw file):
- op: replace
  path: /spec/url
  value: https://github.com/aaronmondal/nativelink
nit: point to nativelink repository
deploy/kubernetes-example/kustomization.yaml
line 32 at r8 (raw file):
- op: replace
  path: /spec/eventMetadata/flakeOutput
  value: github:aaronmondal/nativelink/flux#nativelink-worker-lre-cc
nit: point to nativelink repository
deploy/kubernetes-example/kustomization.yaml
line 46 at r8 (raw file):
- op: replace
  path: /spec/eventMetadata/flakeOutput
  value: github:aaronmondal/nativelink/flux#nativelink-worker-init
nit: point to nativelink repository
web/platform/src/content/docs/docs/deployment-examples/chromium.mdx
line 26 at r8 (raw file):
```bash # TODO(aaronmondal): Point to the main repo before merging. git clone https://github.com/aaronmondal/nativelink && \
nit: point to nativelink repository
web/platform/src/content/docs/docs/deployment-examples/chromium.mdx
line 51 at r8 (raw file):
```bash kubectl apply -k \ https://github.com/aaronmondal/nativelink//deploy/chromium-example?ref=flux
nit: point to nativelink repository
web/platform/src/content/docs/docs/deployment-examples/kubernetes.mdx
line 27 at r8 (raw file):
```bash # TODO(aaronmondal): Point to the main repo before merging. git clone https://github.com/aaronmondal/nativelink && \
nit: point to nativelink repository
web/platform/src/content/docs/docs/deployment-examples/kubernetes.mdx
line 52 at r8 (raw file):
```bash kubectl apply -k \ https://github.com/aaronmondal/nativelink//deploy/kubernetes-example?ref=flux
nit: point to nativelink repository
NativeLink Operator
-@adam-singer -@blakehatch -@allada
Reviewable status: 2 of 1 LGTMs obtained, and 55 of 67 files reviewed, and 2 discussions need to be resolved
README.md
line 62 at r8 (raw file):
Previously, SchahinRohani (Schahin) wrote…
The docs path should be relative "/docs/guides/kubernetes", if we want to have it self contained.
Done.
deploy/chromium-example/kustomization.yaml
line 18 at r1 (raw file):
Previously, SchahinRohani (Schahin) wrote…
nit: Don't forget to change the link
Done.
A single `kubectl apply -k` now deploys NativeLink in a self-configuring, self-healing and self-updating fashion.
To achieve this we implement a two-stage deployment to asynchronously reconcile various parts of NativeLink Kustomizations.
First, we deploy Flux Alerts that trigger Tekton Pipelines on GitRepository updates to bring required images into the cluster.
Second, and technically at the same time, we start a Flux Kustomization to deploy a NativeLink Kustomization.
This is similar to the previous 01_operations and 02_application scripts, but now happens fully automated in the cluster and no longer requires a local Nix installation as all tag evaluations have become implementation details of the Tekton Pipelines.
This commit also changes the K8s resource layout to a "best-practice" Kustomize directory layout. This further reduces code duplication and gives third parties greater flexibility and more useful reference points to build custom NativeLink setups.
Includes an overhaul of the Kubernetes documentation.
Reviewed 31 of 61 files at r1, 10 of 14 files at r2, 6 of 6 files at r3, 13 of 17 files at r6, 1 of 1 files at r8, 6 of 6 files at r9, all commit messages.
Dismissed @MarcusSorealheis and @SchahinRohani from 2 discussions.
Reviewable status: 2 of 1 LGTMs obtained, and all files reviewed, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / macos-13, Bazel Dev / macos-14, Bazel Dev / ubuntu-24.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Coverage, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, NativeLink.com Cloud / Remote Cache / macos-14, NativeLink.com Cloud / Remote Cache / ubuntu-24.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, Web Platform Deployment / macos-14, Web Platform Deployment / ubuntu-24.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (22.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable
A single `kubectl apply -k` now deploys NativeLink in a self-configuring, self-healing and self-updating fashion.
To achieve this we implement a two-stage deployment to asynchronously reconcile various parts of NativeLink Kustomizations.
First, we deploy Flux Alerts that trigger Tekton Pipelines on GitRepository updates to bring required images into the cluster.
Second, and technically at the same time, we start a Flux Kustomization to deploy a NativeLink Kustomization.
This is similar to the previous 01_operations and 02_application scripts, but now happens fully automated in the cluster and no longer requires a local Nix installation as all tag evaluations have become implementation details of the Tekton Pipelines.
This commit also changes the K8s resource layout to a "best-practice" Kustomize directory layout. This further reduces code duplication and gives third parties greater flexibility and more useful reference points to build custom NativeLink setups.
Includes an overhaul of the Kubernetes documentation.
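For readers unfamiliar with the two stages, a minimal sketch of the kind of Flux resources involved might look as follows. It assumes the Flux `v1beta3` Alert and `v1` Kustomization APIs are installed. All resource names, the Provider, the `path`, and the `flakeOutput` value are illustrative assumptions, not the actual manifests from this change.

```yaml
# Stage 1: an Alert forwards GitRepository events to a Provider, which is
# assumed to point at a Tekton EventListener that starts the image build.
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: nativelink-worker-init-alert   # hypothetical name
  namespace: flux-system
spec:
  eventSeverity: info
  eventSources:
    - kind: GitRepository
      name: nativelink                 # hypothetical name
  providerRef:
    name: nativelink-webhook           # hypothetical Provider
  eventMetadata:
    # Same key that the kustomization patches in this PR override.
    # The value is an assumed flake reference to the main repo.
    flakeOutput: github:TraceMachina/nativelink#nativelink-worker-init
---
# Stage 2: a Flux Kustomization reconciles the NativeLink Kustomization
# from the same GitRepository.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: nativelink                     # hypothetical name
  namespace: flux-system
spec:
  interval: 2m
  prune: true
  path: ./kubernetes/nativelink        # hypothetical path
  sourceRef:
    kind: GitRepository
    name: nativelink
```

Both resources watch the same GitRepository, which is why the two stages can be applied at the same time and then reconcile asynchronously.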