feat: Spark Operator Blueprint update #359

Merged
merged 19 commits into awslabs:main from spark-op-examples on Dec 14, 2023

Conversation

@alanty (Contributor) commented Oct 27, 2023

What does this PR do?

This PR includes updates/bumps and changes that I've made while working with the Spark Operator Blueprint:

  • Update Kubernetes version to 1.28 (current EKS latest)
  • Update providers.tf to use exec for authentication tokens
  • Fix typo in the aws-cloudwatch-metrics-valyes.yaml filename
  • Add AmazonSSMManagedInstanceCore IAM policy to the Karpenter node role
  • Remove the Kubecost scrape config, add a Karpenter scrape config
  • Decrease scrape_interval for Yunikorn metrics from 1m to 15s
  • Set Yunikorn nodesortpolicy to binpacking
  • Remove unused aws_eks_cluster_auth data resource
  • Update Karpenter Provisioners to leverage local-disks in bootstrap.sh
  • Remove hostPath mounts from NVMe examples
  • Update the TPC-DS benchmark, increase the Yunikorn prod queue max to accommodate the job
  • Add queue labels to PySpark examples

Motivation

I've been working with the example and have updated and fixed a few things that may help someone else using the blueprint.

More

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

  • I removed the scrape config for Kubecost metrics because there were duplicate metrics for things like kube-state-metrics that were breaking queries in Grafana dashboards.
  • I adjusted the scrape_interval for Yunikorn because 1m wasn't frequent enough for the dashboards to calculate data.
  • Since we're using the local-disks script from the EKS AMI, the pods write to an NVMe RAID by default, so we don't need the hostPath mounts to leverage the fast disks (see the sketch below).
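
For context, a minimal sketch of the mechanism this refers to, assuming the setup-local-disks helper shipped with the EKS-optimized AL2 AMI (the same behavior the bootstrap.sh local-disks option points at); the keys and values shown are illustrative, not copied from this blueprint:

# Illustrative pre-bootstrap userData fragment for a Karpenter node template.
# setup-local-disks raid0 stripes the NVMe instance-store disks into a RAID0
# array and mounts it for kubelet/containerd, so pod emptyDir volumes (and
# Spark scratch space) land on the fast local disks without hostPath mounts.
spec:
  userData: |
    #!/bin/bash
    /bin/setup-local-disks raid0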

@vara-bonthu (Collaborator) left a comment

@alanty, I appreciate your PR! I've added a few minor suggestions for your review. Additionally, could you please ensure that any necessary updates are reflected in the Website documentation? You can refer to the TPCDS Benchmark test example at this link: https://awslabs.github.io/data-on-eks/docs/blueprints/data-analytics/spark-operator-yunikorn, particularly in the example section. It would also be beneficial to provide an explanation on the utilization of NVMe SSD by the pods in this configuration.

Comment on lines -68 to 73
volumes:
  - name: spark-local-dir-1
    hostPath:
      path: /local1
driver:
  volumeMounts:
    - name: spark-local-dir-1
      mountPath: /ossdata1
      readOnly: false
  initContainers:
    - name: volume-permission
      image: public.ecr.aws/y4g4v0z7/busybox
      command: ['sh', '-c', 'mkdir /ossdata1; chown -R 1000:1000 /ossdata1']
      volumeMounts:
        - name: spark-local-dir-1
          mountPath: /ossdata1
  cores: 4
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().
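
For reference, a minimal sketch of that emptyDir() approach; the mount path is illustrative, and the spark-local-dir- name prefix is what makes Spark use the volume for its local scratch directories:

volumes:
  - name: spark-local-dir-1
    emptyDir: {}        # backed by the node's NVMe RAID prepared at bootstrap
driver:
  volumeMounts:
    - name: spark-local-dir-1
      mountPath: /data1 # illustrative path for shuffle/spill data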

Contributor Author

Added a comment block in the Karpenter Provisioner and Userdata in main.tf around NVMe and the volumes.

Then added a callout on each of the pods around the node selectors.

labels:
  app: "tpcds-benchmark"
  applicationId: "tpcds-benchmark-3t"
  queue: root.prod
Collaborator

add a comment here saying "YuniKorn Queue"
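
For example, the suggested comment could sit inline on the label (placement and wording are only a suggestion):

labels:
  app: "tpcds-benchmark"
  applicationId: "tpcds-benchmark-3t"
  queue: root.prod   # YuniKorn Queue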

labels:
  app: "tpcds-data-generation"
  applicationId: "tpcds-data-generation-3t"
  queue: root.prod
Collaborator

same as above

Comment on lines -67 to -82
volumes:
  - name: spark-local-dir-1
    hostPath:
      path: /local1
driver:
  volumeMounts:
    - name: spark-local-dir-1
      mountPath: /ossdata1
      readOnly: false
  initContainers:
    - name: volume-permission
      image: public.ecr.aws/y4g4v0z7/busybox
      command: ['sh', '-c', 'mkdir /ossdata1; chown -R 1000:1000 /ossdata1']
      volumeMounts:
        - name: spark-local-dir-1
          mountPath: /ossdata1
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().

Comment on lines -82 to -99
volumes: # using NVMe instance storage mounted on /local1
  - name: spark-local-dir-1
    hostPath:
      path: /local1
      type: Directory

driver:
  volumeMounts: # Points to InstanceStore 150GB NVMe SSD for shuffle spill over from memory
    - name: spark-local-dir-1
      mountPath: /data1
      readOnly: false
  initContainers:
    - name: volume-permissions
      image: public.ecr.aws/y4g4v0z7/busybox
      command: [ 'sh', '-c', 'chown -R 185 /local1' ]
      volumeMounts:
        - mountPath: "/local1"
          name: "spark-local-dir-1"
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().

Comment on lines -82 to -99
volumes: # using NVMe instance storage mounted on /local1
  - name: spark-local-dir-1
    hostPath:
      path: /local1
      type: Directory

driver:
  volumeMounts: # Points to InstanceStore 150GB NVMe SSD for shuffle spill over from memory
    - name: spark-local-dir-1
      mountPath: /data1
      readOnly: false
  initContainers:
    - name: volume-permissions
      image: public.ecr.aws/y4g4v0z7/busybox
      command: [ 'sh', '-c', 'chown -R 185 /local1' ]
      volumeMounts:
        - mountPath: "/local1"
          name: "spark-local-dir-1"
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter/CA, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().

Comment on lines -82 to -99
volumes: # using NVMe instance storage mounted on /mnt/k8s-disks
  - name: spark-local-dir-1
    hostPath:
      path: /mnt/k8s-disks
      type: Directory

driver:
  volumeMounts: # Points to InstanceStore 150GB NVMe SSD for shuffle spill over from memory
    - name: spark-local-dir-1
      mountPath: /data1
      readOnly: false
  initContainers:
    - name: volume-permissions
      image: public.ecr.aws/y4g4v0z7/busybox
      command: [ 'sh', '-c', 'chown -R 185 /mnt/k8s-disks' ]
      volumeMounts:
        - mountPath: "/mnt/k8s-disks"
          name: "spark-local-dir-1"
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().

Comment on lines -82 to -99
volumes: # using NVMe instance storage mounted on /mnt/k8s-disks
  - name: spark-local-dir-1
    hostPath:
      path: /mnt/k8s-disks
      type: Directory

driver:
  volumeMounts: # Points to InstanceStore 150GB NVMe SSD for shuffle spill over from memory
    - name: spark-local-dir-1
      mountPath: /data1
      readOnly: false
  initContainers:
    - name: volume-permissions
      image: public.ecr.aws/y4g4v0z7/busybox
      command: [ 'sh', '-c', 'chown -R 185 /mnt/k8s-disks' ]
      volumeMounts:
        - mountPath: "/mnt/k8s-disks"
          name: "spark-local-dir-1"
Collaborator

We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter, and subsequently integrated into the node as the primary storage volume. Therefore, it is unnecessary to employ hostPath in Spark jobs for mounting this volume to pods. Instead, Spark pods can effortlessly utilize the local NVMe SSD through the use of emptyDir().

Comment on lines -33 to -43
- job_name: kubecost
  honor_labels: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  dns_sd_configs:
    - names:
        - kubecost-cost-analyzer.kubecost.svc
      type: 'A'
      port: 9003
Collaborator

Are you not using KubeCost?

Contributor Author

Kubecost is still running, but scraping these metrics was causing some problems with Grafana dashboards. Queries were returning multiple metrics and breaking "group by" statements.

Kubecost should still be able to collect metrics and make determinations on cost; it just isn't scraped in the central Prometheus config.
Happy to revert these changes as needed. I didn't investigate the issue very far and don't want to break other stuff.

Comment on lines +110 to +111
nodesortpolicy:
  type: binpacking
Collaborator

This is a very important feature: add some details to explain it.

Contributor Author

Added a quick callout in the values file and a link to the policy docs.

I didn't find a great place on the site to add a callout; maybe we should add a section on Yunikorn and some of the benefits/config we have?
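
For reference, a sketch of the kind of callout being described, assuming YuniKorn's partition-level queues.yaml layout; the comment wording and partition name are illustrative, not taken from this PR:

partitions:
  - name: default
    # binpacking sorts nodes by utilization and schedules pods onto the
    # most-used nodes first instead of spreading them evenly; tightly packed
    # nodes leave others empty, which lets Karpenter consolidate and scale in.
    nodesortpolicy:
      type: binpacking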

@alanty changed the title from "Spark Operator Blueprint update" to "feat: Spark Operator Blueprint update" on Dec 1, 2023
@vara-bonthu (Collaborator) left a comment

LGTM 👍🏼 Thanks for the PR @alanty 🔥

@askulkarni2 (Collaborator) left a comment

@alanty LGTM! Please merge latest main so CI checks can pass.

Comment on lines +44 to +53
- job_name: karpenter
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - karpenter
  relabel_configs:
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: http-metrics
      action: keep
Collaborator

Nice!

@@ -12,7 +12,7 @@ variable "region" {

variable "eks_cluster_version" {
description = "EKS Cluster version"
default = "1.26"
default = "1.28"
Collaborator

🙌🏽

@askulkarni2 merged commit 2015f24 into awslabs:main Dec 14, 2023
50 checks passed
@alanty deleted the spark-op-examples branch December 14, 2023 20:03