-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add EMR runtime for flink operator blueprint #485
Conversation
streaming/emr-eks-flink/README.md
Outdated
|------|---------| | ||
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.0.0 | | ||
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 3.72 | | ||
| <a name="requirement_helm"></a> [helm](#requirement\_helm) | >= 2.4.1 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to use Helm >= v3.8.0 where OCI support is enabled by default. https://helm.sh/docs/topics/registries/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the Terraform provider version and the latest version is 2.13.0 https://registry.terraform.io/providers/hashicorp/helm/latest/docs/guides/v2-upgrade-guide and it uses latest Helm version 3.13.2 https://github.com/hashicorp/terraform-provider-helm/blob/8d8f0503c39f9fb2d718805e5a619ef56728271b/go.mod#L13
@mithun008 Upgrade this to 2.13 in versions.tf
file and rerun it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
streaming/emr-eks-flink/addons.tf
Outdated
enable_aws_load_balancer_controller = true | ||
|
||
#--------------------------------------- | ||
# AWS for FluentBit - DaemonSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use FluentBit and DaemonSet. Instead, you may set up monitoring configurations to archive application logs to S3/CW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolutely right @mengfwanaws! We use FluentBit to capture the logs from other k8s add-ons and apps running on the EKS cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
|
||
|
||
flinkVersion: v1_17 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emr-7.0.0
should use v1_18
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
# add service account | ||
# serviceAccount: flink-team-a-sa | ||
|
||
executionRoleArn: arn:aws:iam::681921237057:role/emr-eks-flink-flink-team-a-20240406012025932700000008 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who's this 681921237057
? Shall we dynamically set it up for customer acct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it and added comments as well as docs.
taskmanager.numberOfTaskSlots: "2" | ||
state.savepoints.dir: file:///flink/data/checkpoint/savepoints | ||
state.checkpoints.dir: file:///flink/data/checkpoint/checkpoints | ||
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deprecated config. Use high-availability.type: kubernetes
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use fluentbit. Please refer to monitoring configuration instead.
3. Automatically tunes Autoscaler configurations based on historical trends of observed metrics. | ||
4. Faster Flink Job Restart during scaling or Failure Recovery | ||
5. IRSA (IAM Roles for Service Accounts) Native Integration | ||
6. Apache Airflow Integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an example in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, @mithun008! I've suggested some changes.
Overall, I would like you to align the blueprint more closely with emr-eks-karpenter
. Additionally, we should include more examples, which can be added in subsequent PRs.
streaming/emr-eks-flink/addons.tf
Outdated
enable_aws_load_balancer_controller = true | ||
|
||
#--------------------------------------- | ||
# AWS for FluentBit - DaemonSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolutely right @mengfwanaws! We use FluentBit to capture the logs from other k8s add-ons and apps running on the EKS cluster.
streaming/emr-eks-flink/README.md
Outdated
|------|---------| | ||
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.0.0 | | ||
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 3.72 | | ||
| <a name="requirement_helm"></a> [helm](#requirement\_helm) | >= 2.4.1 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the Terraform provider version and the latest version is 2.13.0 https://registry.terraform.io/providers/hashicorp/helm/latest/docs/guides/v2-upgrade-guide and it uses latest Helm version 3.13.2 https://github.com/hashicorp/terraform-provider-helm/blob/8d8f0503c39f9fb2d718805e5a619ef56728271b/go.mod#L13
@mithun008 Upgrade this to 2.13 in versions.tf
file and rerun it
streaming/emr-eks-flink/addons.tf
Outdated
# For example only - please evaluate for your environment | ||
force_destroy = true | ||
|
||
attach_deny_insecure_transport_policy = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are defaults i think now
streaming/emr-eks-flink/main.tf
Outdated
"karpenter.sh/discovery" = local.name | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove extra lines
streaming/emr-eks-flink/main.tf
Outdated
} | ||
|
||
#import module vpc | ||
module "vpc" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to vpc.tf
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follow the existing emr-eks-karpenter
blueprint as guidance https://github.com/awslabs/data-on-eks/blob/main/analytics/terraform/emr-eks-karpenter/vpc.tf
streaming/emr-eks-flink/main.tf
Outdated
|
||
|
||
# deploy a helm chart for flink-kubernetes-operator | ||
resource "helm_release" "flink_kubernetes_operator" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this and use this add-on from EKS Data addons https://github.com/aws-ia/terraform-aws-eks-data-addons/blob/main/emr-flink-operator.tf. Update the addon in this repo if required
streaming/emr-eks-flink/outputs.tf
Outdated
@@ -0,0 +1,9 @@ | |||
# create output for flink operator role arn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this
################################################################################
# EKS Managed Node Group
################################################################################
output "configure_kubectl" {
description = "Configure kubectl: make sure you're logged in with the correct AWS profile and run the following command to update your kubeconfig"
value = "aws eks --region ${local.region} update-kubeconfig --name ${module.eks.cluster_name}"
}
@@ -0,0 +1,36 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this file and move these providers to main.tf
streaming/emr-eks-flink/addons.tf
Outdated
@@ -143,11 +139,27 @@ module "eks_blueprints_addons" { | |||
# Data on EKS Kubernetes Addons | |||
#--------------------------------------------------------------- | |||
module "eks_data_addons" { | |||
|
|||
depends_on = [module.flink_irsa_jobs, module.flink_irsa_operator] | |||
#source = "[email protected]:mithun008/terraform-aws-eks-data-addons/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
executionRoleArn: arn:aws:iam::xxxxxxxxx:role/emr-eks-flink-flink-team-a-20240406012025932700000008 | ||
# Replace this execution role ARN with your own | ||
executionRoleArn: arn:aws:iam::681921237057:role/emr-eks-flink-flink-team-a-20240406012025932700000008 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should substitute this with variables, or provide instructions on how to generate the role arn (including the permissions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the instructions in the website docs to substitute these values.
state.savepoints.dir: s3://emr-flink-data/savepoints | ||
|
||
# Replace this execution role ARN with your own | ||
executionRoleArn: arn:aws:iam::681921237057:role/emr-eks-flink-flink-team-a-20240406012025932700000008 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
@vara-bonthu I have addressed your comments. Thank you! |
streaming/emr-eks-flink/outputs.tf
Outdated
@@ -0,0 +1,27 @@ | |||
# create output for flink operator role arn | |||
output "flink_jobs_role_arn" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flink_job_execution_role_arn
streaming/emr-eks-flink/outputs.tf
Outdated
output "flink_checkpoint_path" { | ||
value = "s3://${module.s3_bucket.s3_bucket_id}/checkpoints" | ||
description = "S3 path for checkpoint data" | ||
} | ||
output "flink_savepoint_path" { | ||
value = "s3://${module.s3_bucket.s3_bucket_id}/savepoints" | ||
description = "S3 path for savepoint data" | ||
} | ||
output "flink_jobmanager_path" { | ||
value = "s3://${module.s3_bucket.s3_bucket_id}/jobmanager" | ||
description = "S3 path for jobmanager data" | ||
} | ||
|
||
output "flink_logs_path" { | ||
value = "s3://${module.s3_bucket.s3_bucket_id}/logs" | ||
description = "S3 path for logs" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove all of these and add only s3bucet id
|
||
jobmanager.scheduler: adaptive | ||
# Replace with s3 bucket in your own account | ||
state.checkpoints.dir: s3://emr-flink-data/checkpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<<REPLACE_WITH_S3_BUCKET_ID>>
streaming/emr-eks-flink/versions.tf
Outdated
# backend "s3" { | ||
# bucket = "doeks-github-actions-e2e-test-state" | ||
# region = "us-west-2" | ||
# key = "e2e/flink/terraform.tfstate" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emr-eks-flink
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left few minor comments otherwise its good
# EMR Flink operator | ||
#--------------------------------------------------------------- | ||
enable_emr_flink_operator = true | ||
emr_flink_operator_helm_config = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace this bloc with
emr_flink_operator_helm_config = {
repository = "oci://public.ecr.aws/emr-on-eks"
operatorExecutionRoleArn = module.flink_irsa_operator.iam_role_arn
}
You can remove the repository
and operatorExecutionRoleArn
once you update these in Data addons repo
@@ -0,0 +1,64 @@ | |||
# NOTE: Make sure you you replce <ENTER_S3_BUCKET> with your flink_operator_bucket output before running this job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check spelling and repeated words
@@ -0,0 +1,74 @@ | |||
# NOTE: Make sure you you replce <ENTER_S3_BUCKET> and with your S3 Bucket before running this job. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check spelling and repeated words
@@ -0,0 +1,75 @@ | |||
# NOTE: Make sure you you replce <ENTER_S3_BUCKET> with your flink_operator_bucket output before running this job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check spelling and repeated words
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Co-authored-by: Mithun Mallick <[email protected]>
What does this PR do?
🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.
Added blue print to support EMR Flink operator on EKS.
Motivation
EMR Flink operator enhances the open source Flink operator by addition AWS specific capabilities for scaling, logging and monitoring.
More
website/docs
orwebsite/blog
section for this featurepre-commit run -a
with this PR. Link for installing pre-commit locallyFor Moderators
Additional Notes