Merge pull request #2 from kbase/dev_notebook
add driver_host to spark config
Tianhao-Gu authored May 16, 2024
2 parents bfbc095 + efb94e4 commit b09aa49
Showing 3 changed files with 20 additions and 11 deletions.
18 changes: 8 additions & 10 deletions README.md
@@ -59,29 +59,27 @@ sc.stop()

### Spark Session/Context Configuration

The Spark driver must bind to the Jupyter notebook container's hostname through `spark.driver.host`.
When running Spark in the Jupyter notebook container, `spark.driver.host` defaults to the container's hostname,
which is provided by the `SPARK_DRIVER_HOST` environment variable.
The `SPARK_MASTER_URL` environment variable must also be set.
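
The settings described above come down to two environment lookups. The following is a minimal sketch, assuming a hypothetical helper `spark_settings_from_env` (not part of this repository) that falls back to `socket.gethostname()` when `SPARK_DRIVER_HOST` is unset:

```python
import os
import socket
from typing import Mapping


def spark_settings_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the Spark master URL and driver host from environment variables.

    Hypothetical helper. Assumption: if SPARK_DRIVER_HOST is unset, fall back to
    this machine's hostname, which inside a container is normally the container hostname.
    """
    master_url = env.get("SPARK_MASTER_URL")
    if not master_url:
        # Without a master URL there is no cluster to connect to.
        raise KeyError("SPARK_MASTER_URL must be set")
    driver_host = env.get("SPARK_DRIVER_HOST") or socket.gethostname()
    return {
        "spark.master": master_url,
        "spark.driver.host": driver_host,
    }
```

Each key/value pair can then be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`; `spark.master` is the same property that `builder.master(...)` sets.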

#### Example SparkSession Configuration
```python
import os
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master(os.environ['SPARK_MASTER_URL']) \
    .appName("TestSparkJob") \
    .config("spark.driver.host", os.environ['SPARK_DRIVER_HOST']) \
    .getOrCreate()
```
Or

#### Example SparkContext Configuration
```python
import os
from pyspark import SparkConf, SparkContext

conf = SparkConf() \
    .setMaster(os.environ['SPARK_MASTER_URL']) \
    .setAppName("TestSparkJob") \
    .set("spark.driver.host", os.environ['SPARK_DRIVER_HOST'])
sc = SparkContext(conf=conf)
```

#### Submitting a Job Using Terminal
```bash
/opt/bitnami/spark/bin/spark-submit \
--master $SPARK_MASTER_URL \
--conf spark.driver.host=$SPARK_DRIVER_HOST \
/opt/bitnami/spark/examples/src/main/python/pi.py 10 \
2>/dev/null
```
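
From a notebook cell, the same job can be launched without going through a shell. This is a sketch, assuming hypothetical helpers `spark_submit_argv` and `run_spark_submit` (not part of this commit) and the Bitnami install path used in the command above:

```python
import os
import subprocess
from typing import Mapping, Sequence

# Install path taken from the spark-submit command above; adjust if Spark lives elsewhere.
SPARK_SUBMIT = "/opt/bitnami/spark/bin/spark-submit"


def spark_submit_argv(app: str, app_args: Sequence[str] = (),
                      env: Mapping[str, str] = os.environ) -> list:
    """Build the spark-submit argument list from the same environment variables (hypothetical helper)."""
    argv = [SPARK_SUBMIT, "--master", env["SPARK_MASTER_URL"]]
    driver_host = env.get("SPARK_DRIVER_HOST")
    if driver_host:
        # A --conf flag takes precedence over the value in spark-defaults.conf.
        argv += ["--conf", f"spark.driver.host={driver_host}"]
    argv.append(app)
    argv.extend(app_args)
    return argv


def run_spark_submit(app: str, app_args: Sequence[str] = ()) -> int:
    """Run spark-submit without a shell, discarding stderr like `2>/dev/null`."""
    result = subprocess.run(spark_submit_argv(app, app_args), stderr=subprocess.DEVNULL)
    return result.returncode
```

Passing a list instead of a shell string avoids quoting problems if a hostname or path ever contains spaces.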
3 changes: 2 additions & 1 deletion docker-compose.yaml
@@ -62,4 +62,5 @@ services:
- spark-master
environment:
- NOTEBOOK_PORT=4041
- SPARK_MASTER_URL=spark://spark-master:7077
- SPARK_DRIVER_HOST=spark-notebook
10 changes: 10 additions & 0 deletions scripts/entrypoint.sh
@@ -2,6 +2,16 @@

echo "starting jupyter notebook"

if [ -n "$SPARK_DRIVER_HOST" ]; then
    echo "Setting spark.driver.host to $SPARK_DRIVER_HOST"
    source /opt/bitnami/scripts/spark-env.sh
    if [ -z "$SPARK_CONF_FILE" ]; then
        echo "Error: unable to find SPARK_CONF_FILE path"
        exit 1
    fi
    echo "spark.driver.host $SPARK_DRIVER_HOST" >> "$SPARK_CONF_FILE"
fi

WORKSPACE_DIR="/cdm_shared_workspace"
mkdir -p "$WORKSPACE_DIR"
cd "$WORKSPACE_DIR"
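
The entrypoint appends `spark.driver.host` with `>>` on every start, so a container that is restarted rather than recreated would accumulate duplicate lines in the conf file (later entries typically take precedence). Below is a sketch of an idempotent alternative in Python, assuming a hypothetical helper `set_spark_default` and the whitespace-separated `key value` format of `spark-defaults.conf`:

```python
from pathlib import Path


def set_spark_default(conf_file: str, key: str, value: str) -> None:
    """Set `key value` in a spark-defaults.conf-style file (hypothetical helper).

    Any earlier line for the same key is dropped first, so running this on every
    container start leaves exactly one entry instead of appending duplicates.
    """
    path = Path(conf_file)
    lines = path.read_text().splitlines() if path.exists() else []
    # Keep every line whose first whitespace-separated token is not `key`
    # (blank lines and comments are kept as-is).
    kept = [line for line in lines if line.split(maxsplit=1)[:1] != [key]]
    kept.append(f"{key} {value}")
    path.write_text("\n".join(kept) + "\n")
```

This sketch only recognizes whitespace-separated entries; a line written as `key=value` would not be detected as a duplicate.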