add jupyter notebook #1
Conversation
This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.
```bash
docker-compose up --build
```
You can daemonize with -d
I know. But I usually want to see the logs.
`docker compose logs`

but if you want to leave it as is that's fine
I know you can view logs that way, but I think it's convenient to watch them as they happen. I'll just leave it as is for now; it's each individual dev's choice, I guess.
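For reference, the two workflows being compared here, using standard Compose flags:

```shell
# Foreground: build and stream logs to the terminal (Ctrl-C stops the stack)
docker-compose up --build

# Detached: run in the background, then follow logs on demand
docker-compose up --build -d
docker-compose logs -f
```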
README.md
Outdated
```bash
sh -c '
/opt/bitnami/spark/bin/spark-submit \
  --master spark://spark-master:7077 \
```
Can you make the spark master arg use the environment variable in docker-compose?
👍
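What that change might look like, assuming the `SPARK_MASTER_URL` variable defined in docker-compose.yaml is available inside the container (the script path below is a placeholder):

```shell
# SPARK_MASTER_URL is set in docker-compose.yaml, e.g. spark://spark-master:7077
/opt/bitnami/spark/bin/spark-submit \
  --master "${SPARK_MASTER_URL}" \
  /opt/spark-apps/test_job.py   # placeholder job script
```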
```python
.master(os.environ['SPARK_MASTER_URL']) \
.appName("TestSparkJob") \
.config("spark.driver.host", os.environ['SPARK_DRIVER_HOST']) \
```
We should just add this to the config in the image in the entrypoint so the user doesn't have to worry about it
Are you referring to making changes to the Spark configuration file? I can do it in my next PR. I think it will involve a lot of test commits, since locally it always works.
That's what I was thinking, but regardless of the implementation the behavior should be that we set some environment var in rancher to the notebook hostname, and then the user doesn't need to worry about the host name at all; it Just Works.

next PR is fine
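A sketch of what that entrypoint change could look like. The function name, the `hostname` fallback, and the demo are illustrative; in the real image `SPARK_CONF_DIR` would point at something like `/opt/bitnami/spark/conf` and be populated by rancher/docker-compose:

```shell
# Sketch: write spark.driver.host into spark-defaults.conf at container start
# so notebook users never have to set it themselves.
seed_driver_host() {
  conf_dir="${SPARK_CONF_DIR:?SPARK_CONF_DIR must be set}"
  # fall back to the container hostname if the env var is not provided
  driver_host="${SPARK_DRIVER_HOST:-$(hostname)}"
  mkdir -p "$conf_dir"
  echo "spark.driver.host ${driver_host}" >> "$conf_dir/spark-defaults.conf"
}

# demo against a temp dir so the sketch runs anywhere
demo_dir="$(mktemp -d)"
SPARK_CONF_DIR="$demo_dir" SPARK_DRIVER_HOST=notebook seed_driver_host
cat "$demo_dir/spark-defaults.conf"
```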
```python
conf = SparkConf(). \
    setMaster(os.environ['SPARK_MASTER_URL']). \
    setAppName("TestSparkJob"). \
    set("spark.driver.host", os.environ['SPARK_DRIVER_HOST'])
sc = SparkContext(conf=conf)
```
Do you have to set up both the session and the context? Since they're in different code blocks it seems like they're different things
No, you don't need to set up both. SparkContext gives more control over Spark but is more complicated to configure. Some people have a preference for one or the other, so I just give two examples.
Maybe put "OR" between them?
👍
```yaml
spark-master:
  image: bitnami/spark:3.5.1
  container_name: spark-master
```
Is there a reason for providing container names that appear to be redundant?
Otherwise the container name ends in `_1`. I just don't like the suffix when I need to `docker exec -it`.
👍
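For context, the difference this makes at the CLI (the `myproject` prefix below is a made-up Compose project name; Compose v1 generates `<project>_<service>_1` by default):

```shell
# With container_name: spark-master, the name is predictable
docker exec -it spark-master bash

# Without it, you have to type the generated suffixed name, e.g.:
docker exec -it myproject_spark-master_1 bash
```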
docker-compose.yaml
Outdated
```yaml
- SPARK_MASTER_URL=spark://spark-master:7077
- SPARK_WORKER_CORES=2
- SPARK_WORKER_MEMORY=1G
- SPARK_WORKER_PORT=8091
```
I don't think there's any need to specify the worker ports
👍
docker-compose.yaml
Outdated
```yaml
dockerfile: Dockerfile
container_name: spark-notebook
ports:
  - "4040:4040"
```
Why is port 4040 mapped?
Removed. I must have been testing something before and forgot to remove it.
👍
Co-authored-by: MrCreosote <[email protected]>
Dockerfile
Outdated
```dockerfile
RUN chmod a+x /opt/entrypoint.sh

# Switch back to the original user
#USER ${ORI_USER}
```
is this supposed to be commented out now?
Oh no. I was testing things and added it by accident.
👍
Dockerfile
Outdated
```dockerfile
ENTRYPOINT ["sleep 10s"]
ENV PYTHON_VER=python3.11
```
Now that I think about it, I'm pretty sure bitnami installs Python in the image. Is there a reason you're not using that Python, assuming I'm right? If the Python version here doesn't match what's in the cluster, the job won't run.
Yeah, I can use the Python from the image. But I remember you wanted to control which Python version we are using. I have no strong opinion on that; I'm fine either way.
In the prior repo the version was set for both the cluster and the test image (assuming you were using the same image). Here the Python version could be different for the cluster and the notebook, which would break things.
👍 Switched to use the image's Python. Both base images have Python installed. I don't see the difference.
> Both base images have Python installed.

What do you mean by both base images?

> I don't see the difference.

Not sure what you mean here either.
Both bitnami/spark and the image you used previously have Python. I don't see why we spent a lot of time trying to install Python before; we could have just used the Python from the image.
The old image had 3.10. If it had 3.11 I wouldn't have bothered
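A quick, library-free way to catch the mismatch discussed above is to print the same version tag from both the notebook and a job running on the cluster, and fail fast if they disagree. This sketch shows only the stdlib part; shipping the check to the executors is left out to keep it pyspark-free:

```python
import sys

def python_version_tag():
    """Return a 'major.minor' tag, e.g. '3.11', for comparing environments."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"

# Run this on the notebook (driver) side...
driver_tag = python_version_tag()
print("driver python:", driver_tag)

# ...then run the same function inside a cluster job and compare the two
# tags before submitting real work, instead of hitting a confusing
# mid-job serialization error.
```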
👍