
docs: Spark operator updates #484

Merged: 11 commits into awslabs:main on Apr 3, 2024

Conversation

raykrueger
Contributor

What does this PR do?

This pull request adds much-needed detail to the Spark Operator page.

  1. It is now clear where to find the S3 bucket that is created.
  2. All `cd ...` commands now use an environment variable, `DOEKS_HOME`, which ensures copy-and-paste works in every example.
  3. Example `sed` commands are provided for replacing the `<S3_BUCKET>` placeholders in scripts and Spark application YAML manifests.
  4. All S3 bucket placeholders in the examples have been unified to use `<S3_BUCKET>` in every file.
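The `DOEKS_HOME` convention from item 2 can be sketched as follows; the clone location is an assumption, so adjust it to wherever you checked out the repo:

```shell
# DOEKS_HOME points at the root of your data-on-eks checkout. The path
# below is an assumption -- change it to match your own clone location.
export DOEKS_HOME="${HOME}/data-on-eks"

# Every `cd` in the examples is then copy-and-paste safe from any
# working directory, e.g. (subdirectory shown is illustrative):
#   cd "${DOEKS_HOME}/analytics/terraform/spark-k8s-operator"
echo "DOEKS_HOME is set to ${DOEKS_HOME}"
```

Putting the export in your shell profile makes every snippet in the docs pasteable in a fresh terminal.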

Some quality-of-life changes as well:
  • Use Docusaurus partials for paragraphs that are repeated throughout the examples.
  • Move large YAML examples out of the single Markdown file into their own partials.

Motivation

The Spark on EKS examples are currently very terse and make many assumptions about what the user already knows. Additionally, most examples were simply incorrect (the taxi script always needs input; the benchmark files didn't exist).

The changes can be previewed here (note that navigation is wonky in this temp deployment).
https://d2gd59uo3ya1kt.cloudfront.net/data-on-eks/docs/blueprints/data-analytics/spark-operator-yunikorn.html

More

  • [x] Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • [x] Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • [x] Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • [x] Yes, I ran `pre-commit run -a` with this PR. Link for installing pre-commit locally

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

Adding a `DOEKS_HOME` variable set to the data-on-eks directory means that each command in the Spark on EKS docs can be copied and pasted directly. Originally the docs simply said `cd analytics/...` repeatedly, and that path was never correct.

I am unifying all the examples to use `<S3_BUCKET>` as the placeholder in files, and I have added documentation showing how to use `sed` to do the variable replacement rather than putting the user through manual edits.
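The `sed` replacement mentioned above can be sketched like this; the bucket name and the manifest file are hypothetical stand-ins, not the blueprint's actual outputs:

```shell
# Replace the <S3_BUCKET> placeholder with the bucket the blueprint
# created. Both the bucket name and file name below are hypothetical.
S3_BUCKET="my-spark-logs-bucket"

# Create a tiny manifest just to demonstrate the substitution.
cat > spark-app.yaml <<'EOF'
sparkConf:
  spark.eventLog.dir: s3a://<S3_BUCKET>/logs/
EOF

# In-place replacement of every occurrence of the placeholder.
# (GNU sed syntax; on macOS/BSD sed, use `sed -i ''` instead.)
sed -i "s|<S3_BUCKET>|${S3_BUCKET}|g" spark-app.yaml
```

The `|` delimiter avoids having to escape any `/` characters in S3 paths.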
The benchmark docs for the Spark Operator point to the wrong file names.

Unify the bucket placeholder to `<S3_BUCKET>`.

Update the benchmark docs with clear instructions. This closes out the Karpenter updates.
@raykrueger raykrueger changed the title Spark operator updates docs: Spark operator updates Apr 3, 2024
Collaborator

@askulkarni2 askulkarni2 left a comment


Thanks for the PR!

@askulkarni2 askulkarni2 merged commit d0ae18b into awslabs:main Apr 3, 2024
56 of 58 checks passed