feat: Spark Operator Blueprint update #359

Merged: 19 commits, Dec 14, 2023
3 changes: 1 addition & 2 deletions analytics/terraform/spark-k8s-operator/README.md
@@ -58,7 +58,6 @@ Checkout the [documentation website](https://awslabs.github.io/data-on-eks/docs/
| [aws_availability_zones.available](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zones) | data source |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_ecrpublic_authorization_token.token](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ecrpublic_authorization_token) | data source |
- | [aws_eks_cluster_auth.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster_auth) | data source |
| [aws_iam_policy_document.grafana](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.spark_operator](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source |
@@ -70,7 +69,7 @@ Checkout the [documentation website](https://awslabs.github.io/data-on-eks/docs/

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_eks_cluster_version"></a> [eks\_cluster\_version](#input\_eks\_cluster\_version) | EKS Cluster version | `string` | `"1.26"` | no |
| <a name="input_eks_cluster_version"></a> [eks\_cluster\_version](#input\_eks\_cluster\_version) | EKS Cluster version | `string` | `"1.28"` | no |
| <a name="input_eks_data_plane_subnet_secondary_cidr"></a> [eks\_data\_plane\_subnet\_secondary\_cidr](#input\_eks\_data\_plane\_subnet\_secondary\_cidr) | Secondary CIDR blocks. 32766 IPs per Subnet per Subnet/AZ for EKS Node and Pods | `list(string)` | <pre>[<br> "100.64.0.0/17",<br> "100.64.128.0/17"<br>]</pre> | no |
| <a name="input_enable_amazon_prometheus"></a> [enable\_amazon\_prometheus](#input\_enable\_amazon\_prometheus) | Enable AWS Managed Prometheus service | `bool` | `true` | no |
| <a name="input_enable_vpc_endpoints"></a> [enable\_vpc\_endpoints](#input\_enable\_vpc\_endpoints) | Enable VPC Endpoints | `bool` | `false` | no |
8 changes: 6 additions & 2 deletions analytics/terraform/spark-k8s-operator/addons.tf
@@ -88,13 +88,17 @@ module "eks_blueprints_addons" {
repository_username = data.aws_ecrpublic_authorization_token.token.user_name
repository_password = data.aws_ecrpublic_authorization_token.token.password
}

karpenter_node = {
iam_role_additional_policies = {
AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
}
#---------------------------------------
# CloudWatch metrics for EKS
#---------------------------------------
enable_aws_cloudwatch_metrics = true
aws_cloudwatch_metrics = {
-    values = [templatefile("${path.module}/helm-values/aws-cloudwatch-metrics-valyes.yaml", {})]
+    values = [templatefile("${path.module}/helm-values/aws-cloudwatch-metrics-values.yaml", {})]
}

#---------------------------------------
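The typo fix above (aws-cloudwatch-metrics-valyes.yaml → aws-cloudwatch-metrics-values.yaml) points the addon at the intended Helm values file. A hedged sketch of what such a values file might contain — the keys are common for the aws-cloudwatch-metrics chart, but the blueprint's real file may differ:

```yaml
# Hypothetical helm-values/aws-cloudwatch-metrics-values.yaml; illustrative only,
# not copied from this PR.
resources:
  requests:
    cpu: 200m
    memory: 200Mi
  limits:
    cpu: 500m
    memory: 500Mi
# Tolerate everything so the CloudWatch agent DaemonSet also lands on tainted Spark nodes
tolerations:
  - operator: Exists
```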
4 changes: 0 additions & 4 deletions analytics/terraform/spark-k8s-operator/data.tf
@@ -1,7 +1,3 @@
data "aws_eks_cluster_auth" "this" {
name = module.eks.cluster_name
}

data "aws_ecrpublic_authorization_token" "token" {
provider = aws.ecr
}
@@ -9,6 +9,11 @@ kind: SparkApplication
metadata:
name: tpcds-benchmark-3tb
namespace: spark-team-a
labels:
app: "tpcds-benchmark"
applicationId: "tpcds-benchmark-3t"
# Assign the job to a YuniKorn queue via label.
queue: root.prod
Collaborator: add a comment here saying "YuniKorn Queue"
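For context, `root.prod` here is a YuniKorn queue path. A hedged sketch of the kind of queue configuration such a label maps to (illustrative only — the blueprint's actual YuniKorn Helm values, ACLs, and limits may differ):

```yaml
# Illustrative YuniKorn queue layout; queue names mirror the labels used in
# these examples, but the ACL is an assumption, not taken from this PR.
partitions:
  - name: default
    queues:
      - name: root
        submitacl: "*"
        queues:
          - name: prod   # targeted by "queue: root.prod"
          - name: test   # targeted by "queue: root.test"
```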

spec:
type: Scala
mode: cluster
@@ -65,52 +70,32 @@ spec:
spark.kubernetes.driver.requestTimeout: "120000"
# spark.kubernetes.allocation.batch.size: "20" # default 5 but adjust according to your cluster size
# -----------------------------------------------------
volumes:
- name: spark-local-dir-1
hostPath:
path: /local1
driver:
volumeMounts:
- name: spark-local-dir-1
mountPath: /ossdata1
readOnly: false
initContainers:
- name: volume-permission
image: public.ecr.aws/y4g4v0z7/busybox
command: ['sh', '-c', 'mkdir /ossdata1; chown -R 1000:1000 /ossdata1']
volumeMounts:
- name: spark-local-dir-1
mountPath: /ossdata1
cores: 4
Comment on lines -68 to 74
Collaborator: We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter and integrated into the node as the primary storage volume. It is therefore unnecessary to use hostPath in Spark jobs to mount this volume into pods; Spark pods can simply use the local NVMe SSD through emptyDir().

Contributor (author): Added a comment block in the Karpenter provisioner and userdata in main.tf around NVMe and the volumes.

Then added a callout on each of the pods around the node selectors.
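For readers following this thread, a minimal sketch of the emptyDir-based approach described above, as it would appear in a SparkApplication driver or executor spec (the mount path is hypothetical and not part of this PR):

```yaml
# Illustrative only: node-local scratch space via emptyDir instead of a hostPath mount.
# Spark treats volumes whose names start with "spark-local-dir-" as local/shuffle directories.
volumes:
  - name: spark-local-dir-1
    emptyDir: {}
driver:                                  # repeat the mount for the executor spec
  volumeMounts:
    - name: spark-local-dir-1
      mountPath: /tmp/spark-local-dir    # hypothetical path
```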

coreLimit: "4.1"
memory: "5g"
memoryOverhead: "1000"
serviceAccount: spark-team-a
# the c5d instances that Karpenter will launch will have the NVMe storage preformatted and available to the pod
# we do not need a hostPath mount or volume to use that storage.
# ephemeral-storage requests and limits can be used to manage the storage utilization
nodeSelector:
provisioner: spark-compute-optimized
tolerations:
- key: "spark-compute-optimized"
operator: "Exists"
effect: "NoSchedule"
executor:
volumeMounts:
- name: spark-local-dir-1
mountPath: /ossdata1
readOnly: false
initContainers:
- name: volume-permission
image: public.ecr.aws/y4g4v0z7/busybox
command: ['sh', '-c', 'mkdir /ossdata1; chown -R 1000:1000 /ossdata1']
volumeMounts:
- name: spark-local-dir-1
mountPath: /ossdata1
cores: 4
coreLimit: "4.3"
memory: "6g"
memoryOverhead: "2g"
# 8 executors per node
instances: 47 # changed from 47 to 20 for demo
serviceAccount: spark-team-a
# the c5d instances that Karpenter will launch will have the NVMe storage preformatted and available to the pod
# we do not need a hostPath mount or volume to use that storage.
# ephemeral-storage requests and limits can be used to manage the storage utilization
nodeSelector:
provisioner: spark-compute-optimized
tolerations:
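The "ephemeral-storage requests and limits" comments added in the file above refer to standard Kubernetes resource management. A hedged sketch of what such requests could look like in a pod template (for example one referenced via `spark.kubernetes.driver.podTemplateFile`); the sizes, container name, and the pod-template approach are assumptions, not part of this PR:

```yaml
# Illustrative pod-template fragment; the resource field names are standard
# Kubernetes, everything else here is an assumption.
spec:
  containers:
    - name: spark-kubernetes-driver
      resources:
        requests:
          ephemeral-storage: 100Gi
        limits:
          ephemeral-storage: 150Gi
```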
@@ -8,6 +8,11 @@ kind: SparkApplication
metadata:
name: tpcds-data-generation-3t
namespace: spark-team-a
labels:
app: "tpcds-data-generation"
applicationId: "tpcds-data-generation-3t"
# Assign the job to a YuniKorn queue via label.
queue: root.prod
Collaborator: same as above

spec:
type: Scala
mode: cluster
@@ -64,50 +69,30 @@ spec:

restartPolicy:
type: Never
volumes:
- name: spark-local-dir-1
hostPath:
path: /local1
driver:
volumeMounts:
- name: spark-local-dir-1
mountPath: /ossdata1
readOnly: false
initContainers:
- name: volume-permission
image: public.ecr.aws/y4g4v0z7/busybox
command: ['sh', '-c', 'mkdir /ossdata1; chown -R 1000:1000 /ossdata1']
volumeMounts:
- name: spark-local-dir-1
mountPath: /ossdata1
Comment on lines -67 to -82
Collaborator: We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter and integrated into the node as the primary storage volume. It is therefore unnecessary to use hostPath in Spark jobs to mount this volume into pods; Spark pods can simply use the local NVMe SSD through emptyDir().

cores: 10
coreLimit: "10.1"
memory: "10g"
serviceAccount: spark-team-a
# the c5d instances that Karpenter will launch will have the NVMe storage preformatted and available to the pod
# we do not need a hostPath mount or volume to use that storage.
# ephemeral-storage requests and limits can be used to manage the storage utilization
nodeSelector:
provisioner: spark-compute-optimized
tolerations:
- key: "spark-compute-optimized"
operator: "Exists"
effect: "NoSchedule"
executor:
volumeMounts:
- name: spark-local-dir-1
mountPath: /ossdata1
readOnly: false
initContainers:
- name: volume-permission
image: public.ecr.aws/y4g4v0z7/busybox
command: ['sh', '-c', 'mkdir /ossdata1; chown -R 1000:1000 /ossdata1']
volumeMounts:
- name: spark-local-dir-1
mountPath: /ossdata1
cores: 11
coreLimit: "11.1"
memory: "15g"
# 3 executors per node, 9 nodes
instances: 26
serviceAccount: spark-team-a
# the c5d instances that Karpenter will launch will have the NVMe storage preformatted and available to the pod
# we do not need a hostPath mount or volume to use that storage.
# the data generation can utilize a large amount of storage
nodeSelector:
provisioner: spark-compute-optimized
tolerations:
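The `provisioner: spark-compute-optimized` nodeSelector and matching toleration used throughout these manifests target a Karpenter provisioner defined in the blueprint's Terraform (per the author's note above, with userdata that formats and mounts the NVMe instance store). A hedged sketch of an equivalent Provisioner resource — the label, taint, and instance family are inferred from the examples, not copied from this PR:

```yaml
# Illustrative Karpenter Provisioner (v1alpha5 API); the blueprint templates its
# own provisioner in Terraform, which may differ from this sketch.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spark-compute-optimized
spec:
  labels:
    provisioner: spark-compute-optimized   # matched by the nodeSelector above
  taints:
    - key: spark-compute-optimized         # matched by the toleration above
      value: "true"
      effect: NoSchedule
  requirements:
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["c5d"]                      # instance families with NVMe instance store
```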
@@ -79,51 +79,24 @@ spec:
onFailureRetryInterval: 10
onSubmissionFailureRetries: 5
onSubmissionFailureRetryInterval: 20
volumes: # using NVMe instance storage mounted on /local1
- name: spark-local-dir-1
hostPath:
path: /local1
type: Directory

driver:
volumeMounts: # Points to InstanceStore 150GB NVMe SSD for shuffle spill over from memory
- name: spark-local-dir-1
mountPath: /data1
readOnly: false
initContainers:
- name: volume-permissions
image: public.ecr.aws/y4g4v0z7/busybox
command: [ 'sh', '-c', 'chown -R 185 /local1' ]
volumeMounts:
- mountPath: "/local1"
name: "spark-local-dir-1"
Comment on lines -82 to -99
Collaborator: We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter and integrated into the node as the primary storage volume. It is therefore unnecessary to use hostPath in Spark jobs to mount this volume into pods; Spark pods can simply use the local NVMe SSD through emptyDir().

cores: 1
coreLimit: "1200m"
memory: "4g"
memoryOverhead: "4g"
serviceAccount: spark-team-a
labels:
version: 3.2.1
# the r5d instances that Karpenter will launch will have the NVMe storage preformatted and available to the pod
# we do not need a hostPath mount or volume to use that storage.
# ephemeral-storage requests and limits can be used to manage the storage utilization
nodeSelector:
NodeGroupType: "spark-on-demand-ca"
tolerations:
- key: "spark-on-demand-ca"
operator: "Exists"
effect: "NoSchedule"
executor:
podSecurityContext:
fsGroup: 185
volumeMounts:
- name: spark-local-dir-1
mountPath: /data1
readOnly: false
initContainers:
- name: volume-permissions
image: public.ecr.aws/y4g4v0z7/busybox
command: [ 'sh', '-c', 'chown -R 185 /local1' ]
volumeMounts:
- mountPath: "/local1"
name: "spark-local-dir-1"
cores: 1
coreLimit: "1200m"
instances: 4
@@ -132,6 +105,9 @@
serviceAccount: spark-team-a
labels:
version: 3.2.1
# the r5d instances that Karpenter will launch will have the NVMe storage preformatted and available to the pod
# we do not need a hostPath mount or volume to use that storage.
# ephemeral-storage requests and limits can be used to manage the storage utilization
nodeSelector:
NodeGroupType: "spark-spot-ca"
tolerations:
@@ -12,6 +12,7 @@ metadata:
labels:
app: "taxi-trip"
applicationId: "taxi-trip-yunikorn"
# Assign the job to a YuniKorn queue via label.
queue: root.test
spec:
# To create Ingress object for Spark driver.
@@ -79,24 +80,7 @@ spec:
onFailureRetryInterval: 10
onSubmissionFailureRetries: 5
onSubmissionFailureRetryInterval: 20
volumes: # using NVMe instance storage mounted on /local1
- name: spark-local-dir-1
hostPath:
path: /local1
type: Directory

driver:
volumeMounts: # Points to InstanceStore 150GB NVMe SSD for shuffle spill over from memory
- name: spark-local-dir-1
mountPath: /data1
readOnly: false
initContainers:
- name: volume-permissions
image: public.ecr.aws/y4g4v0z7/busybox
command: [ 'sh', '-c', 'chown -R 185 /local1' ]
volumeMounts:
- mountPath: "/local1"
name: "spark-local-dir-1"
Comment on lines -82 to -99
Collaborator: We can add a comment stating that the NVMe SSD disks provided with c5d instances are automatically formatted and mounted by Karpenter/CA and integrated into the node as the primary storage volume. It is therefore unnecessary to use hostPath in Spark jobs to mount this volume into pods; Spark pods can simply use the local NVMe SSD through emptyDir().

cores: 1
coreLimit: "1200m"
memory: "4g"
Expand Down Expand Up @@ -134,26 +118,16 @@ spec:
},
"tolerations": [{"key": "spark-spot-ca", "operator": "Exists", "effect": "NoSchedule"}]
}]
# the r5d instances that Karpenter will launch will have the NVMe storage preformatted and available to the pod
# we do not need a hostPath mount or volume to use that storage.
# ephemeral-storage requests and limits can be used to manage the storage utilization
nodeSelector:
NodeGroupType: "spark-on-demand-ca"
tolerations:
- key: "spark-on-demand-ca"
operator: "Exists"
effect: "NoSchedule"
executor:
podSecurityContext:
fsGroup: 185
volumeMounts:
- name: spark-local-dir-1
mountPath: /data1
readOnly: false
initContainers:
- name: volume-permissions
image: public.ecr.aws/y4g4v0z7/busybox
command: [ 'sh', '-c', 'chown -R 185 /local1' ]
volumeMounts:
- mountPath: "/local1"
name: "spark-local-dir-1"
cores: 1
coreLimit: "1200m"
instances: 4
@@ -164,6 +138,9 @@
version: 3.2.1
annotations:
yunikorn.apache.org/task-group-name: "spark-executor"
# the r5d instances that Karpenter will launch will have the NVMe storage preformatted and available to the pod
# we do not need a hostPath mount or volume to use that storage.
# ephemeral-storage requests and limits can be used to manage the storage utilization
nodeSelector:
NodeGroupType: "spark-spot-ca"
tolerations:
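For reference, the `yunikorn.apache.org/task-group-name` annotation in the file above is part of YuniKorn gang scheduling, which also expects a task-group definition on the driver pod. A hedged sketch of the related annotations; the member counts and resources are illustrative, not taken from this PR:

```yaml
# Illustrative gang-scheduling annotations for the driver pod; the example file
# in the blueprint defines its own task groups, which may differ.
annotations:
  yunikorn.apache.org/task-group-name: "spark-driver"
  yunikorn.apache.org/task-groups: |-
    [{
      "name": "spark-driver",
      "minMember": 1,
      "minResource": {"cpu": "1", "memory": "5Gi"}
    }, {
      "name": "spark-executor",
      "minMember": 4,
      "minResource": {"cpu": "1", "memory": "5Gi"}
    }]
```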
@@ -7,6 +7,11 @@ kind: SparkApplication
metadata:
name: pyspark-pi
namespace: spark-team-a
labels:
app: "pyspark-pi"
applicationId: "pyspark-pi-ca"
# Assign the job to a YuniKorn queue via label.
queue: root.test
spec:
type: Python
pythonVersion: "3"
@@ -28,6 +33,9 @@
labels:
version: 3.1.1
serviceAccount: spark-team-a
# the r5d instances that Karpenter will launch will have the NVMe storage preformatted and available to the pod
# we do not need a hostPath mount or volume to use that storage.
# ephemeral-storage requests and limits can be used to manage the storage utilization
nodeSelector:
NodeGroupType: "spark-on-demand-ca"
tolerations:
@@ -41,6 +49,9 @@
serviceAccount: spark-team-a
labels:
version: 3.1.1
# the r5d instances that Karpenter will launch will have the NVMe storage preformatted and available to the pod
# we do not need a hostPath mount or volume to use that storage.
# ephemeral-storage requests and limits can be used to manage the storage utilization
nodeSelector:
NodeGroupType: "spark-spot-ca"
tolerations: