If you prefer local access (rather than keeping an EC2 instance running for the Apache Spark history server), you can also use Docker to start the Apache Spark history server and view the Spark UI locally. The provided Dockerfile is a sample that you should modify to meet your requirements.
- Install Docker
- Download the Dockerfile and the pom file from the GitHub repository
- Run the commands shown below
$ docker build -t glue/sparkui:latest .
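To confirm the image was built and tagged as expected, you can list it (assuming the tag used above):
$ docker images glue/sparkui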
Using AWS named profile
- Run the commands shown below
- Set LOG_DIR by replacing s3a://path_to_eventlog with your event log directory
- Set PROFILE_NAME to the name of your AWS named profile
$ LOG_DIR="s3a://path_to_eventlog/"
$ PROFILE_NAME="profile_name"
$ docker run -itd -v ~/.aws:/root/.aws -e AWS_PROFILE=$PROFILE_NAME \
    -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain" \
    -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
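Before opening the UI, you may want to check that the container is running and that the history server can reach your event logs; errors such as missing S3 permissions show up in the container logs. A quick check, assuming the image tag used above:
$ docker ps --filter ancestor=glue/sparkui:latest
$ docker logs $(docker ps -q --filter ancestor=glue/sparkui:latest)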
Using an AWS access key and secret key pair
- Run the commands shown below
- Set LOG_DIR by replacing s3a://path_to_eventlog with your event log directory
- Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to your valid AWS credentials
$ LOG_DIR="s3a://path_to_eventlog/"
$ AWS_ACCESS_KEY_ID="AKIAxxxxxxxxxxxx"
$ AWS_SECRET_ACCESS_KEY="yyyyyyyyyyyyyyy"
$ docker run -itd \
    -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID -Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY" \
    -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
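If you prefer not to leave the secret key in your shell history, you can set that variable interactively instead of assigning it inline (a minimal alternative using bash's read builtin; type or paste the key at the hidden prompt, then run the docker run command as shown):
$ read -s AWS_SECRET_ACCESS_KEY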
Using AWS temporary credentials
- Run the commands shown below
- Set LOG_DIR by replacing s3a://path_to_eventlog with your event log directory
- Set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN to your valid AWS credentials
$ LOG_DIR="s3a://path_to_eventlog/"
$ AWS_ACCESS_KEY_ID="ASIAxxxxxxxxxxxx"
$ AWS_SECRET_ACCESS_KEY="yyyyyyyyyyyyyyy"
$ AWS_SESSION_TOKEN="zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
$ docker run -itd \
    -e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.access.key=$AWS_ACCESS_KEY_ID -Dspark.hadoop.fs.s3a.secret.key=$AWS_SECRET_ACCESS_KEY -Dspark.hadoop.fs.s3a.session.token=$AWS_SESSION_TOKEN -Dspark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider" \
    -p 18080:18080 glue/sparkui:latest "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer"
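How you obtain temporary credentials depends on your setup; if you are working as an IAM user, one option is to request a session token with the AWS CLI and copy the returned access key, secret key, and session token into the variables above:
$ aws sts get-session-token --duration-seconds 3600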
These configuration parameters come from the Hadoop-AWS Module. You may need to add specific configuration based on your use case. For example, users in isolated regions need to configure spark.hadoop.fs.s3a.endpoint.
For the Beijing Region, add the following configuration:
-Dspark.hadoop.fs.s3a.endpoint=s3.cn-north-1.amazonaws.com.cn
For the Ningxia Region, add the following configuration:
-Dspark.hadoop.fs.s3a.endpoint=s3.cn-northwest-1.amazonaws.com.cn
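The endpoint setting is just another -D option inside SPARK_HISTORY_OPTS. For example, in the named-profile command above the -e SPARK_HISTORY_OPTS flag would become (Beijing Region endpoint shown; adjust for your Region):
-e SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Dspark.history.fs.logDirectory=$LOG_DIR -Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain -Dspark.hadoop.fs.s3a.endpoint=s3.cn-north-1.amazonaws.com.cn"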
- Open http://localhost:18080 in your browser
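When you are finished, you can stop and remove the container (assuming the image tag used above):
$ docker stop $(docker ps -q --filter ancestor=glue/sparkui:latest)
$ docker rm $(docker ps -aq --filter ancestor=glue/sparkui:latest)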