
docs: Spark operator updates #484

Merged: 11 commits into awslabs:main on Apr 3, 2024

Conversation

raykrueger
Contributor

What does this PR do?

This pull request adds much-needed detail to the Spark Operator page.

  1. It is now clear where to find the S3 bucket that is created.
  2. All `cd ...` commands now use an environment variable, `DOEKS_HOME`, which ensures copy-and-paste works in every example.
  3. Example `sed` commands are provided for replacing the `<S3_BUCKET>` placeholders in scripts and Spark application YAML manifests.
  4. All S3 bucket placeholders in the examples have been unified to use `<S3_BUCKET>` in every file.
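The `DOEKS_HOME` convention from item 2 can be sketched as follows; the clone location is an assumption, so adjust it to wherever you checked out the repo:

```shell
# DOEKS_HOME points at the root of your data-on-eks checkout. The path
# below is an assumption -- change it to match your own clone location.
export DOEKS_HOME="${HOME}/data-on-eks"

# Every `cd` in the examples is then copy-and-paste safe from any
# working directory, e.g. (subdirectory shown is illustrative):
#   cd "${DOEKS_HOME}/analytics/terraform/spark-k8s-operator"
echo "DOEKS_HOME is set to ${DOEKS_HOME}"
```

Putting the export in your shell profile makes every snippet in the docs pasteable in a fresh terminal.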

Some quality-of-life changes as well:
  • Use Docusaurus partials for paragraphs that are repeated throughout the examples.
  • Move large YAML examples out of the single Markdown file into their own partials.

Motivation

The Spark on EKS examples are currently very terse and make many assumptions about what the user already knows. Additionally, most examples were simply incorrect (the taxi script always needs input; the benchmark files didn't exist).

The changes can be previewed here (note that navigation is wonky in this temp deployment).
https://d2gd59uo3ya1kt.cloudfront.net/data-on-eks/docs/blueprints/data-analytics/spark-operator-yunikorn.html

More

  • [x] Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • [x] Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • [x] Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • [x] Yes, I ran `pre-commit run -a` with this PR. Link for installing pre-commit locally

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

Adding a `DOEKS_HOME` variable set to the data-on-eks directory means that each command in the Spark on EKS docs can be copied and pasted directly. Originally the docs simply said `cd analytics/...` repeatedly, and that path was never correct.

I am unifying all the examples to use `<S3_BUCKET>` as the placeholder in files, and I have added documentation showing how to use `sed` to do the variable replacement rather than putting the user through manual edits.
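The `sed` replacement mentioned above can be sketched like this; the bucket name and the manifest file are hypothetical stand-ins, not the blueprint's actual outputs:

```shell
# Replace the <S3_BUCKET> placeholder with the bucket the blueprint
# created. Both the bucket name and file name below are hypothetical.
S3_BUCKET="my-spark-logs-bucket"

# Create a tiny manifest just to demonstrate the substitution.
cat > spark-app.yaml <<'EOF'
sparkConf:
  spark.eventLog.dir: s3a://<S3_BUCKET>/logs/
EOF

# In-place replacement of every occurrence of the placeholder.
# (GNU sed syntax; on macOS/BSD sed, use `sed -i ''` instead.)
sed -i "s|<S3_BUCKET>|${S3_BUCKET}|g" spark-app.yaml
```

The `|` delimiter avoids having to escape any `/` characters in S3 paths.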
The benchmark docs for the Spark Operator point to the wrong file names.

Unify the bucket placeholder to `<S3_BUCKET>`.

Update the benchmark docs with clear instructions. This closes out the Karpenter updates.
@raykrueger raykrueger changed the title Spark operator updates docs: Spark operator updates Apr 3, 2024
Collaborator

@askulkarni2 askulkarni2 left a comment


Thanks for the PR!

@askulkarni2 askulkarni2 merged commit d0ae18b into awslabs:main Apr 3, 2024
56 of 58 checks passed