feat: Spark Operator Blueprint update #359

Merged
merged 19 commits into awslabs:main from spark-op-examples on Dec 14, 2023

Conversation

@alanty (Contributor) commented Oct 27, 2023

What does this PR do?

This PR includes updates/bumps and changes that I've made while working with the Spark Operator Blueprint:

  • Update Kubernetes version to 1.28 (current EKS latest)
  • Update providers.tf to use exec for authentication tokens
  • Fix typo in the aws-cloudwatch-metrics-valyes.yaml filename
  • Add AmazonSSMManagedInstanceCore IAM policy to the Karpenter node role
  • Remove the Kubecost scrape config, add a Karpenter scrape config
  • Decrease scrape_interval for Yunikorn metrics from 1m to 15s
  • Set Yunikorn nodesortpolicy to binpacking
  • Remove unused aws_eks_cluster_auth data resource
  • Update Karpenter Provisioners to leverage local-disks in bootstrap.sh
  • Remove hostPath mounts from NVMe examples
  • Update the TPC-DS benchmark, increase the Yunikorn prod queue max to accommodate the job
  • Add queue labels to PySpark examples

Motivation

I've been working with the example and have updated and fixed a few things that may help someone else using the blueprint.

More

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

  • I removed the scrape config for Kubecost metrics because there were duplicate metrics for things like kube-state-metrics that were breaking queries in Grafana dashboards.
  • I adjusted the scrape_interval for Yunikorn because 1m wasn't frequent enough for the dashboards to calculate data.
  • Since we're using the local-disks script from the EKS AMI, the pods write to an NVMe RAID by default, so we don't need the hostPath mounts to leverage the fast disks (see the sketch below).
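
For context, a minimal sketch of the mechanism this refers to, assuming the setup-local-disks helper shipped with the EKS-optimized AL2 AMI (the same behavior the bootstrap.sh local-disks option points at); the keys and values shown are illustrative, not copied from this blueprint:

# Illustrative pre-bootstrap userData fragment for a Karpenter node template.
# setup-local-disks raid0 stripes the NVMe instance-store disks into a RAID0
# array and mounts it for kubelet/containerd, so pod emptyDir volumes (and
# Spark scratch space) land on the fast local disks without hostPath mounts.
spec:
  userData: |
    #!/bin/bash
    /bin/setup-local-disks raid0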

@vara-bonthu (Collaborator) left a comment

@alanty, I appreciate your PR! I've added a few minor suggestions for your review. Additionally, could you please ensure that any necessary updates are reflected in the Website documentation? You can refer to the TPCDS Benchmark test example at this link: https://awslabs.github.io/data-on-eks/docs/blueprints/data-analytics/spark-operator-yunikorn, particularly in the example section. It would also be beneficial to provide an explanation on the utilization of NVMe SSD by the pods in this configuration.

Comment on lines -68 to 73
volumes:
  - name: spark-local-dir-1
    hostPath:
      path: /local1
driver:
  volumeMounts:
    - name: spark-local-dir-1
      mountPath: /ossdata1
      readOnly: false
  initContainers:
    - name: volume-permission
      image: public.ecr.aws/y4g4v0z7/busybox
      command: ['sh', '-c', 'mkdir /ossdata1; chown -R 1000:1000 /ossdata1']
      volumeMounts:
        - name: spark-local-dir-1
          mountPath: /ossdata1
  cores: 4
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().
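
For reference, a minimal sketch of that emptyDir() approach; the mount path is illustrative, and the spark-local-dir- name prefix is what makes Spark use the volume for its local scratch directories:

volumes:
  - name: spark-local-dir-1
    emptyDir: {}        # backed by the node's NVMe RAID prepared at bootstrap
driver:
  volumeMounts:
    - name: spark-local-dir-1
      mountPath: /data1 # illustrative path for shuffle/spill data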

Contributor Author

Added a comment block in the Karpenter Provisioner and Userdata in main.tf around NVMe and the volumes.

Then added a callout on each of the pods around the node selectors.

labels:
  app: "tpcds-benchmark"
  applicationId: "tpcds-benchmark-3t"
  queue: root.prod
Collaborator

add a comment here saying "YuniKorn Queue"
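
For example, the suggested comment could sit inline on the label (placement and wording are only a suggestion):

labels:
  app: "tpcds-benchmark"
  applicationId: "tpcds-benchmark-3t"
  queue: root.prod   # YuniKorn Queue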

labels:
  app: "tpcds-data-generation"
  applicationId: "tpcds-data-generation-3t"
  queue: root.prod
Collaborator

same as above

Comment on lines -67 to -82
volumes:
  - name: spark-local-dir-1
    hostPath:
      path: /local1
driver:
  volumeMounts:
    - name: spark-local-dir-1
      mountPath: /ossdata1
      readOnly: false
  initContainers:
    - name: volume-permission
      image: public.ecr.aws/y4g4v0z7/busybox
      command: ['sh', '-c', 'mkdir /ossdata1; chown -R 1000:1000 /ossdata1']
      volumeMounts:
        - name: spark-local-dir-1
          mountPath: /ossdata1
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().

Comment on lines -82 to -99
volumes: # using NVMe instance storage mounted on /local1
  - name: spark-local-dir-1
    hostPath:
      path: /local1
      type: Directory

driver:
  volumeMounts: # Points to InstanceStore 150GB NVMe SSD for shuffle spill over from memory
    - name: spark-local-dir-1
      mountPath: /data1
      readOnly: false
  initContainers:
    - name: volume-permissions
      image: public.ecr.aws/y4g4v0z7/busybox
      command: [ 'sh', '-c', 'chown -R 185 /local1' ]
      volumeMounts:
        - mountPath: "/local1"
          name: "spark-local-dir-1"
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().

Comment on lines -82 to -99
volumes: # using NVMe instance storage mounted on /local1
  - name: spark-local-dir-1
    hostPath:
      path: /local1
      type: Directory

driver:
  volumeMounts: # Points to InstanceStore 150GB NVMe SSD for shuffle spill over from memory
    - name: spark-local-dir-1
      mountPath: /data1
      readOnly: false
  initContainers:
    - name: volume-permissions
      image: public.ecr.aws/y4g4v0z7/busybox
      command: [ 'sh', '-c', 'chown -R 185 /local1' ]
      volumeMounts:
        - mountPath: "/local1"
          name: "spark-local-dir-1"
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter/CA, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().

Comment on lines -82 to -99
volumes: # using NVMe instance storage mounted on /mnt/k8s-disks
  - name: spark-local-dir-1
    hostPath:
      path: /mnt/k8s-disks
      type: Directory

driver:
  volumeMounts: # Points to InstanceStore 150GB NVMe SSD for shuffle spill over from memory
    - name: spark-local-dir-1
      mountPath: /data1
      readOnly: false
  initContainers:
    - name: volume-permissions
      image: public.ecr.aws/y4g4v0z7/busybox
      command: [ 'sh', '-c', 'chown -R 185 /mnt/k8s-disks' ]
      volumeMounts:
        - mountPath: "/mnt/k8s-disks"
          name: "spark-local-dir-1"
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().

Comment on lines -82 to -99
volumes: # using NVMe instance storage mounted on /mnt/k8s-disks
  - name: spark-local-dir-1
    hostPath:
      path: /mnt/k8s-disks
      type: Directory

driver:
  volumeMounts: # Points to InstanceStore 150GB NVMe SSD for shuffle spill over from memory
    - name: spark-local-dir-1
      mountPath: /data1
      readOnly: false
  initContainers:
    - name: volume-permissions
      image: public.ecr.aws/y4g4v0z7/busybox
      command: [ 'sh', '-c', 'chown -R 185 /mnt/k8s-disks' ]
      volumeMounts:
        - mountPath: "/mnt/k8s-disks"
          name: "spark-local-dir-1"
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().

Comment on lines -33 to -43
- job_name: kubecost
  honor_labels: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
    - names:
        - kubecost-cost-analyzer.kubecost.svc
      type: 'A'
      port: 9003
Collaborator

Are you not using KubeCost?

Contributor Author

Kubecost is still running, but scraping these metrics was causing some problems with Grafana dashboards. Queries were returning multiple metrics and breaking "group by" statements.

Kubecost should still be able to collect metrics and make determinations on cost; it just isn't scraped in the central Prometheus config.
Happy to revert these changes as needed. I didn't investigate the issue very far and don't want to break other stuff.

Comment on lines +110 to +111
nodesortpolicy:
  type: binpacking
Collaborator

This is a very important feature: add some details to explain it.

Contributor Author

Added a quick callout in the values file and a link to the policy docs.

I didn't find a great place on the site to add a callout; maybe we should add a section on Yunikorn and some of the benefits/config we have?
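
For reference, a sketch of the kind of callout being described, assuming YuniKorn's partition-level queues.yaml layout; the comment wording and partition name are illustrative, not taken from this PR:

partitions:
  - name: default
    # binpacking sorts nodes by utilization and schedules pods onto the
    # most-used nodes first instead of spreading them evenly; tightly packed
    # nodes leave others empty, which lets Karpenter consolidate and scale in.
    nodesortpolicy:
      type: binpacking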

@alanty changed the title from "Spark Operator Blueprint update" to "feat: Spark Operator Blueprint update" on Dec 1, 2023
@vara-bonthu (Collaborator) left a comment

LGTM 👍🏼 Thanks for the PR @alanty 🔥

@askulkarni2 (Collaborator) left a comment

@alanty LGTM! Please merge latest main so CI checks can pass.

Comment on lines +44 to +53
- job_name: karpenter
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - karpenter
  relabel_configs:
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: http-metrics
      action: keep
Collaborator

Nice!

@@ -12,7 +12,7 @@ variable "region" {

variable "eks_cluster_version" {
description = "EKS Cluster version"
default = "1.26"
default = "1.28"
Collaborator

🙌🏽

@askulkarni2 merged commit 2015f24 into awslabs:main Dec 14, 2023
50 checks passed
@alanty deleted the spark-op-examples branch December 14, 2023 20:03