# Using S3 to upload JAR files for Spark submit

The Amazon Simple Storage Service (S3) can be used to exchange files between your local development machine and your cluster.

However, since your AWS credentials will not be known to everything running on the cluster, the data needs to be publicly readable, and the files you upload must not contain any confidential information.

S3 can be used from the Amazon AWS Web Console, but for repeated use, the command line is far more convenient.

## Creating a (publicly readable) bucket

Buckets are the top level at which S3 stores information. Bucket names are intentionally not scoped to a single account, so each bucket needs a globally unique name of its own.

You can make a new bucket with `aws s3 mb s3://<bucketname>`.

Keep in mind that buckets are assigned to a region; this command creates the bucket in your default region (assumed here to be us-west-1, which is fine).
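As a minimal sketch (the bucket name `my-spark-jars` is a placeholder, pick your own), bucket creation with an explicit region looks like this:

```sh
# Create a new bucket; the name must be globally unique across all of S3.
# "my-spark-jars" is a hypothetical name -- replace it with your own.
aws s3 mb s3://my-spark-jars --region us-west-1
```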

Now make this bucket readable by everyone: `aws s3api put-bucket-acl --bucket <bucketname> --grant-read 'uri="http://acs.amazonaws.com/groups/global/AllUsers"'`

(See http://docs.aws.amazon.com/cli/latest/userguide/using-s3api-commands.html for details.)
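To confirm the grant took effect, you can print the bucket's ACL afterwards (again using the hypothetical bucket name from above):

```sh
# Make the bucket world-readable, then print its ACL to verify
# that the AllUsers group now has READ permission.
aws s3api put-bucket-acl --bucket my-spark-jars \
    --grant-read 'uri="http://acs.amazonaws.com/groups/global/AllUsers"'
aws s3api get-bucket-acl --bucket my-spark-jars
```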

## Uploading a (jar) file

Upload the file from the command line, directly making it readable by everyone: `aws s3 cp build/libs/<your.jar> s3://<bucketname> --grants read=uri=http://acs.amazonaws.com/groups/global/AllUsers`
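For example (assuming a hypothetical JAR at `build/libs/my-app.jar` and the bucket from above), the upload plus a quick listing to confirm it arrived:

```sh
# Upload the JAR and grant read access to everyone in one step.
aws s3 cp build/libs/my-app.jar s3://my-spark-jars \
    --grants read=uri=http://acs.amazonaws.com/groups/global/AllUsers

# List the bucket contents to confirm the upload.
aws s3 ls s3://my-spark-jars
```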

## Accessing the file

The uploaded file is now accessible via a URL like `https://s3-us-west-1.amazonaws.com/<bucketname>/<your.jar>`.
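A quick way to check that the file really is publicly readable (hypothetical names as above):

```sh
# Fetch only the HTTP headers; a "200 OK" response means the JAR is public.
curl -I https://s3-us-west-1.amazonaws.com/my-spark-jars/my-app.jar
```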

This URL can then be used in `dcos spark run --submit-args="https://s3-us-west-1.amazonaws.com/<bucketname>/<your.jar>"`
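A fuller invocation typically also names the main class and passes application arguments; a sketch, where `com.example.MyApp` and the trailing `100` are hypothetical placeholders:

```sh
# Submit the publicly hosted JAR to Spark via DC/OS.
# "com.example.MyApp" and "100" stand in for your main class and app arguments.
dcos spark run --submit-args="--class com.example.MyApp https://s3-us-west-1.amazonaws.com/my-spark-jars/my-app.jar 100"
```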

## Optional: When done, delete the bucket

To remove a bucket, execute `aws s3 rb s3://<bucketname>`. This only works on an empty bucket; you may add `--force` to delete all remaining objects along with the bucket, if you feel confident enough.

(See http://docs.aws.amazon.com/cli/latest/userguide/using-s3-commands.html for details.)
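A sketch of the more cautious two-step variant (hypothetical bucket name again), which empties the bucket explicitly before removing it:

```sh
# Delete every object in the bucket, then remove the (now empty) bucket.
# Equivalent to "aws s3 rb s3://my-spark-jars --force" in a single step.
aws s3 rm s3://my-spark-jars --recursive
aws s3 rb s3://my-spark-jars
```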