Introduce the NativeLink Operator #1088

Merged 1 commit on Oct 30, 2024

Conversation

aaronmondal
Member

@aaronmondal aaronmondal commented Jul 6, 2024

A single kubectl apply -k now deploys NativeLink in a self-configuring, self-healing and self-updating fashion.

To achieve this we implement a two-stage deployment to asynchronously reconcile various parts of NativeLink Kustomizations.

First, we deploy Flux Alerts that trigger Tekton Pipelines on GitRepository updates to bring required images into the cluster.

Second, and technically at the same time, we start a Flux Kustomization to deploy a NativeLink Kustomization.
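
As a rough illustration of the two stages, the resources involved could look like the sketch below. This is not the manifest set from this PR. The resource names, the `default` namespace, the intervals, the EventListener address, the `path`, and the `flakeOutput` value are placeholders chosen for illustration. Only the Flux kinds (GitRepository, Alert, Kustomization) and the `eventMetadata.flakeOutput` key come from the description and the patches discussed in this PR; the generic webhook Provider is an assumption about how an Alert would reach a Tekton EventListener.

```yaml
# Sketch only: names, namespace, address, intervals, and paths are assumptions.
---
# Source that Flux polls for new commits.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: nativelink
  namespace: default
spec:
  interval: 2m
  url: https://github.com/TraceMachina/nativelink
  ref:
    branch: main
---
# Stage 1a: where Flux sends events. Assumed here to be the in-cluster
# service of a Tekton EventListener (hypothetical name and address).
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: nativelink-webhook
  namespace: default
spec:
  type: generic
  address: http://el-nativelink-rebuild.default.svc.cluster.local:8080
---
# Stage 1b: forward GitRepository updates to the provider so that a
# pipeline can build the image named by the flake output.
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: nativelink-worker-init-alert
  namespace: default
spec:
  eventSeverity: info
  eventSources:
    - kind: GitRepository
      name: nativelink
  providerRef:
    name: nativelink-webhook
  eventMetadata:
    flakeOutput: github:TraceMachina/nativelink#nativelink-worker-init
---
# Stage 2: reconcile the NativeLink manifests from the same source.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: nativelink
  namespace: default
spec:
  interval: 2m
  path: ./kubernetes/nativelink  # hypothetical path inside the repository
  prune: true
  sourceRef:
    kind: GitRepository
    name: nativelink
```

Both stages watch the same GitRepository, which is why they can start at the same time and converge independently.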

This is similar to the previous 01_operations and 02_application scripts, but now runs fully automated in the cluster and no longer requires a local Nix installation, as all tag evaluations have become implementation details of the Tekton Pipelines.

This commit also changes the K8s resource layout to a "best-practice" Kustomize directory layout. This further reduces code duplication and gives third parties greater flexibility and more useful reference points to build custom NativeLink setups.

Includes an overhaul of the Kubernetes documentation.



Member Author

@aaronmondal aaronmondal left a comment

+@adam-singer +@allada +@blakehatch

cc @MarcusSorealheis @bclark8923 @kubevalet

New docpages at:

Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable (waiting on @adam-singer, @allada, and @blakehatch)

@aaronmondal aaronmondal force-pushed the flux branch 5 times, most recently from 11c1359 to cfad74d Compare July 7, 2024 04:39
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

This PR is massive and there is no simple opportunity for breaking it up. Just annoying.

The docs look good, though. Nice.

Reviewed 3 of 61 files at r1.
Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)


deploy/chromium-example/kustomization.yaml line 18 at r1 (raw file):

    - op: replace
      path: /spec/url
      value: https://github.com/aaronmondal/nativelink

why isn't this pointing to the TraceMachina repo?


deploy/dev/kustomization.yaml line 26 at r1 (raw file):

    - op: replace
      path: /spec/url
      value: https://github.com/aaronmondal/nativelink

again


deploy/kubernetes-example/kustomization.yaml line 18 at r1 (raw file):

    - op: replace
      path: /spec/url
      value: https://github.com/aaronmondal/nativelink

again

Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

What I don't like is that the Chromium example received a Pulumi dependency. Is that required?

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

Member Author

@aaronmondal aaronmondal left a comment

This PR is massive and there is no simple opportunity for breaking it up. Just annoying.

After looking into reducing the size of this PR I think there are some parts that I can break out. I'll send PRs for those parts which hopefully also makes it a bit clearer why/how I'm making these changes.

What I don't like is that the Chromium example received a Pulumi dependency. Is that required?

We always had the pulumi dependency for the Chromium example. It's just more apparent now. However, this PR paves the way to reduce that dependency.

The way the examples generally work is:

  1. Start a K8s cluster and prepare some dependencies in it. This is done via Pulumi.
  2. Build or fetch NativeLink container images and toolchains. This was previously done via the 01_operations shell scripts. Now it happens automatically inside the cluster. This fixes #1012 ("Tag evaluation for K8s images shouldn't run on the host"), which is a blocker for macOS.
  3. Deploy the actual NativeLink deployments. This was previously done via the 02_application scripts and now also happens inside the cluster, so now it's no longer necessary to invoke any shell scripts manually.

What the new deploy and kubernetes directories do is essentially create "building blocks" for creating NativeLink K8s deployments. For instance, if you had an existing Helm chart (wink) you could now use these building blocks to deploy that chart as well. This also turns non-production parts of the examples, like the insecure example certs, into Components. This way they're more easily swappable with e.g. "real" CAs. The same goes for the HttpRoutes, which would require functional Gateway API Gateways in the cluster. Now it's possible to omit those routes and configure your own ingress logic instead.
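
To make the Components idea concrete, here is a minimal sketch of how a swappable part could be declared and then opted into. The directory names and file names are hypothetical and not taken from this PR; only the Kustomize `Component` kind and the `components:` field are standard Kustomize. The block shows two separate files.

```yaml
# File 1 (hypothetical path): kubernetes/components/insecure-certs/kustomization.yaml
# A Component bundles optional resources that an overlay can opt into.
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
resources:
  - example-certs-secret.yaml  # hypothetical file holding the example certs
---
# File 2 (hypothetical path): deploy/my-stack/kustomization.yaml
# An overlay picks a base and only the components it wants.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../kubernetes/nativelink  # hypothetical base directory
components:
  - ../../kubernetes/components/insecure-certs
  # Leave a component out to bring your own equivalent, for example
  # omit an HTTPRoute component and configure your own ingress instead.
```

A production overlay would simply not list the insecure certs component and would supply a Secret issued by a real CA in its own `resources`.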

The next step here is to also migrate the Tekton Pipelines out of the native-cli and into the kubernetes directory so that they're deployed via Flux instead of Pulumi. After that we'll have clearly separated concerns between Pulumi and K8s, and kubectl apply -k https://github.com/TraceMachina/nativelink//deploy/<somestack> should be self-contained enough that users can start running it against arbitrary K8s clusters.
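
For third parties, the same remote reference could also serve as a base in their own repository. The following is a sketch under assumptions: `ref=main` and the local file name are placeholders, and `kubernetes-example` stands in for `<somestack>`.

```yaml
# kustomization.yaml in a third-party repository (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Remote base using the same URL form as the kubectl apply -k invocation.
  # The ref is an assumption; pin it to a tag or commit for reproducibility.
  - https://github.com/TraceMachina/nativelink//deploy/kubernetes-example?ref=main
  # Hypothetical local manifest with custom ingress configuration.
  - my-ingress.yaml
```

Applying this directory with kubectl apply -k would pull the upstream stack and add the local resources on top of it.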

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)


deploy/chromium-example/kustomization.yaml line 18 at r1 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

why isn't this pointing to the TraceMachina repo?

This is so that the LRE-remote job https://github.com/TraceMachina/nativelink/actions/runs/9824654274/job/27123866843?pr=1088 passes. Since the deployments in this PR aren't on main yet, we'll have to bring them in before we can change this. However, we can set them to the TraceMachina repo and disable the LRE job in CI and then immediately reenable it after these changes have landed.

I just noticed that it's also possible to create a dedicated CI overlay that sets this to the actual PR branch dynamically. This is probably what we want so I'll look into this more.
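
A dedicated CI overlay of that kind might look roughly like this. It is a sketch, not something in this PR: the `deploy/ci` path and the placeholder values are hypothetical, and the `/spec/ref/branch` patch assumes the base GitRepository already sets `spec.ref.branch`, since a JSON patch `replace` fails on a missing path.

```yaml
# deploy/ci/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../kubernetes-example
patches:
  # A target with only a kind applies the patch to every GitRepository.
  - target:
      kind: GitRepository
    patch: |-
      - op: replace
        path: /spec/url
        value: https://github.com/PLACEHOLDER_OWNER/nativelink
      - op: replace
        path: /spec/ref/branch
        value: PLACEHOLDER_BRANCH
```

A CI step would substitute the two placeholders with the fork and branch of the pull request before applying the overlay, so the committed examples could keep pointing at the TraceMachina repo.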

Keeping this comment (and the other two) open so that I don't forget to change these overrides (and the flux branch override).

Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

sgtm

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

Member Author

@aaronmondal aaronmondal left a comment

I broke out various parts of this PR in:

I'll rebase this PR after these have been merged.

Reviewable status: 0 of 3 LGTMs obtained, and 3 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

Contributor

@adam-singer adam-singer left a comment

@aaronmondal happy to sync up offline and test out these changes locally to get a better sense of what this means for local use

Reviewed 42 of 61 files at r1, 14 of 14 files at r2, all commit messages.
Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @allada and @blakehatch)


docs/src/content/docs/guides/chromium.mdx line 25 at r2 (raw file):

    ```bash
    # TODO(aaronmondal): Point to the main repo before merging.

nit: leaving this here so we don't forget it


docs/src/content/docs/guides/kubernetes.mdx line 26 at r2 (raw file):

    ```bash
    # TODO(aaronmondal): Point to the main repo before merging.

nit: before landing

@aaronmondal aaronmondal force-pushed the flux branch 2 times, most recently from 2c7b100 to 4bc267a Compare July 9, 2024 03:20
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

:lgtm:

Reviewable status: 0 of 3 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)

@aaronmondal aaronmondal force-pushed the flux branch 4 times, most recently from f63cf71 to 75c943a Compare July 9, 2024 10:42
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

I get this error when I try to run native up with Docker running on Ubuntu 24:

Diagnostics:
  docker:index:Container (kind-registry):
    error: Docker native provider returned an unexpected error from Configure: failed to connect to any docker daemon

Resources:
    2 unchanged

Duration: 1s

2024/08/20 17:44:51 component error: pulumi error: decoding YAML: rpc error: code = Canceled desc = context canceled

@MarcusSorealheis
Collaborator

For clarity on my last comment:

  1. this is on your branch based on the guide
  2. all my dependencies are up to date.

@aaronmondal
Member Author

Ready to merge, modulo the references to my repo after reviews

Contributor

@SchahinRohani SchahinRohani left a comment

:lgtm:

Reviewed 1 of 3 files at r5, 10 of 17 files at r6, 1 of 1 files at r8, all commit messages.
Reviewable status: 0 of 3 LGTMs obtained, and 55 of 67 files reviewed, and pending CI: Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, and 11 discussions need to be resolved (waiting on @adam-singer, @allada, and @blakehatch)


README.md line 62 at r8 (raw file):

The setups below are **production-grade** installations. See the [contribution docs](https://nativelink.com/docs/contribute/nix/) for instructions on how to build from source with [Bazel](https://nativelink.com/docs/contribute/bazel/), [Cargo](https://nativelink.com/docs/contribute/cargo/), and [Nix](https://nativelink.com/docs/contribute/nix/).

You can find a few example deployments in the [Docs](https://docs.nativelink.com/guides/kubernetes).

The docs path should be relative ("/docs/guides/kubernetes") if we want it to be self-contained.


deploy/chromium-example/kustomization.yaml line 18 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

This is so that the LRE-remote job https://github.com/TraceMachina/nativelink/actions/runs/9824654274/job/27123866843?pr=1088 passes. Since the deployments in this PR aren't on main yet, we'll have to bring them in before we can change this. However, we can set them to the TraceMachina repo and disable the LRE job in CI and then immediately reenable it after these changes have landed.

I just noticed that it's also possible to create a dedicated CI overlay that sets this to the actual PR branch dynamically. This is probably what we want so I'll look into this more.

Keeping this comment (and the other two) open so that I don't forget to change these overrides (and the flux branch override).

nit: Don't forget to change the link


deploy/kubernetes-example/kustomization.yaml line 18 at r8 (raw file):

    - op: replace
      path: /spec/url
      value: https://github.com/aaronmondal/nativelink

nit: point to nativelink repository


deploy/kubernetes-example/kustomization.yaml line 32 at r8 (raw file):

    - op: replace
      path: /spec/eventMetadata/flakeOutput
      value: github:aaronmondal/nativelink/flux#nativelink-worker-lre-cc

nit: point to nativelink repository


deploy/kubernetes-example/kustomization.yaml line 46 at r8 (raw file):

    - op: replace
      path: /spec/eventMetadata/flakeOutput
      value: github:aaronmondal/nativelink/flux#nativelink-worker-init

nit: point to nativelink repository


web/platform/src/content/docs/docs/deployment-examples/chromium.mdx line 26 at r8 (raw file):

    ```bash
    # TODO(aaronmondal): Point to the main repo before merging.
    git clone https://github.com/aaronmondal/nativelink && \

nit: point to nativelink repository


web/platform/src/content/docs/docs/deployment-examples/chromium.mdx line 51 at r8 (raw file):

    ```bash
    kubectl apply -k \
        https://github.com/aaronmondal/nativelink//deploy/chromium-example?ref=flux

nit: point to nativelink repository


web/platform/src/content/docs/docs/deployment-examples/kubernetes.mdx line 27 at r8 (raw file):

    ```bash
    # TODO(aaronmondal): Point to the main repo before merging.
    git clone https://github.com/aaronmondal/nativelink && \

nit: point to nativelink repository


web/platform/src/content/docs/docs/deployment-examples/kubernetes.mdx line 52 at r8 (raw file):

    ```bash
    kubectl apply -k \
        https://github.com/aaronmondal/nativelink//deploy/kubernetes-example?ref=flux

nit: point to nativelink repository

@SchahinRohani SchahinRohani changed the title from "Introduce the NativeLink Kubernetes operator" to "Introduce the NativeLink Operator" Oct 29, 2024
Member Author

@aaronmondal aaronmondal left a comment

-@adam-singer -@blakehatch -@allada

Reviewable status: 2 of 1 LGTMs obtained, and 55 of 67 files reviewed, and 2 discussions need to be resolved


README.md line 62 at r8 (raw file):

Previously, SchahinRohani (Schahin) wrote…

The docs path should be relative ("/docs/guides/kubernetes") if we want it to be self-contained.

Done.


deploy/chromium-example/kustomization.yaml line 18 at r1 (raw file):

Previously, SchahinRohani (Schahin) wrote…

nit: Don't forget to change the link

Done.

A single `kubectl apply -k` now deploys NativeLink in a
self-configuring, self-healing and self-updating fashion.

To achieve this we implement a two-stage deployment to asynchronously
reconcile various parts of NativeLink Kustomizations.

First, we deploy Flux Alerts that trigger Tekton Pipelines on
GitRepository updates to bring required images into the cluster.

Second, and technically at the same time, we start a Flux Kustomization
to deploy a NativeLink Kustomization.

This is similar to the previous 01_operations and 02_application scripts,
but now happens fully automated in the cluster and no longer requires a
local Nix installation as all tag evaluations have become implementation
details of the Tekton Pipelines.

This commit also changes the K8s resource layout to a "best-practice"
Kustomize directory layout. This further reduces code duplication and
gives third parties greater flexibility and more useful reference points
to build custom NativeLink setups.

Includes an overhaul of the Kubernetes documentation.
Member Author

@aaronmondal aaronmondal left a comment

Reviewed 31 of 61 files at r1, 10 of 14 files at r2, 6 of 6 files at r3, 13 of 17 files at r6, 1 of 1 files at r8, 6 of 6 files at r9, all commit messages.
Dismissed @MarcusSorealheis and @SchahinRohani from 2 discussions.
Reviewable status: 2 of 1 LGTMs obtained, and all files reviewed, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / macos-13, Bazel Dev / macos-14, Bazel Dev / ubuntu-24.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Coverage, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, NativeLink.com Cloud / Remote Cache / macos-14, NativeLink.com Cloud / Remote Cache / ubuntu-24.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, Web Platform Deployment / macos-14, Web Platform Deployment / ubuntu-24.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (22.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, vale, windows-2022 / stable
