
add jupyter notebook #1

Merged: 10 commits merged into main on May 16, 2024
Conversation

**Tianhao-Gu** (Collaborator): No description provided.

**github-advanced-security**: This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.


```bash
docker-compose up --build
```

**Member**: You can daemonize with `-d`.

**Collaborator (author)**: I know. But I usually want to see the logs.

**Member**: `docker compose logs`, but if you want to leave it as is, that's fine.

**Collaborator (author)**: I know you can view the logs that way, but I find it convenient to watch them as they happen. I'll leave it as is for now; it's each developer's choice, I guess.

README.md (outdated)

```bash
sh -c '
/opt/bitnami/spark/bin/spark-submit \
--master spark://spark-master:7077 \
```

**Member** (@MrCreosote, May 16, 2024): Can you make the spark master argument use the environment variable from docker-compose?

**Collaborator (author)**: 👍

**Member**: 👍
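A sketch of how that could look in docker-compose.yaml. The service name and script path (`/opt/spark-apps/job.py`) are assumptions, not part of this PR; note that `$$` defers variable expansion to the container's shell rather than letting Compose interpolate it:

```yaml
spark-test-job:
  image: bitnami/spark:3.5.1
  environment:
    - SPARK_MASTER_URL=spark://spark-master:7077
  command: >
    sh -c '/opt/bitnami/spark/bin/spark-submit
    --master "$$SPARK_MASTER_URL" /opt/spark-apps/job.py'
```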

```python
.master(os.environ['SPARK_MASTER_URL']) \
.appName("TestSparkJob") \
.config("spark.driver.host", os.environ['SPARK_DRIVER_HOST']) \
```

**Member**: We should just add this to the config in the image's entrypoint so the user doesn't have to worry about it.

**Collaborator (author)**: Are you referring to making changes to the Spark configuration file? I can do that in my next PR. I think it will involve a lot of test commits, since it always works locally.

**Member**: That's what I was thinking, but regardless of the implementation, the behavior should be that we set some environment variable in Rancher to the notebook hostname, and then the user doesn't need to worry about the hostname at all; it Just Works.

Next PR is fine.
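A minimal sketch of what that entrypoint step could look like. `SPARK_DRIVER_HOST` is the variable name from docker-compose.yaml; the conf path is written relative here so the fragment runs anywhere, while the real bitnami image would use `/opt/bitnami/spark/conf`:

```shell
#!/bin/sh
# Hypothetical entrypoint fragment: bake spark.driver.host into
# spark-defaults.conf so notebook users never have to set it.
CONF_DIR="${SPARK_CONF_DIR:-./conf}"   # real image: /opt/bitnami/spark/conf
mkdir -p "$CONF_DIR"
DRIVER_HOST="${SPARK_DRIVER_HOST:-$(hostname)}"
echo "spark.driver.host ${DRIVER_HOST}" >> "$CONF_DIR/spark-defaults.conf"
```

With this in place the `.config("spark.driver.host", ...)` line in the notebook becomes unnecessary.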

Comment on lines +71 to +76:

```python
conf = SparkConf(). \
    setMaster(os.environ['SPARK_MASTER_URL']). \
    setAppName("TestSparkJob"). \
    set("spark.driver.host", os.environ['SPARK_DRIVER_HOST'])
sc = SparkContext(conf=conf)
```

**Member**: Do you have to set up both the session and the context? Since they're in different code blocks, it seems like they're different things.

**Collaborator (author)**: No, you don't need to set up both. SparkContext gives more control over Spark but is more complicated to configure. Some people prefer one option over the other, so I gave two examples.

**Member**: Maybe put "OR" between them?

**Collaborator (author)**: 👍

**Member**: 👍
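Whichever form a reader picks, both examples read the same two environment variables. A small helper like this (the variable names come from docker-compose.yaml; the local-mode fallbacks are illustrative assumptions) would keep the two examples consistent:

```python
import os

def spark_settings():
    """Collect Spark connection settings from the environment.

    Uses the variable names defined in docker-compose.yaml; the
    local-mode fallbacks are illustrative assumptions, not part
    of the PR.
    """
    return {
        "master": os.environ.get("SPARK_MASTER_URL", "local[*]"),
        "driver_host": os.environ.get("SPARK_DRIVER_HOST", "localhost"),
    }
```

The returned dict can feed either `SparkSession.builder` or `SparkConf`.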


```yaml
spark-master:
  image: bitnami/spark:3.5.1
  container_name: spark-master
```

**Member**: Is there a reason for providing container names that appear to be redundant?

**Collaborator (author)**: Otherwise the container name gets a `_1` suffix, and I just don't like the suffix when I need to `docker exec -it` into it.

**Member**: 👍

docker-compose.yaml

```yaml
- SPARK_MASTER_URL=spark://spark-master:7077
- SPARK_WORKER_CORES=2
- SPARK_WORKER_MEMORY=1G
- SPARK_WORKER_PORT=8091
```

**Member**: I don't think there's any need to specify the worker ports.

**Collaborator (author)**: 👍

**Member** (@MrCreosote, May 16, 2024): 👍
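With the port line dropped, the worker environment block would read (everything here is taken from the diff above):

```yaml
environment:
  - SPARK_MASTER_URL=spark://spark-master:7077
  - SPARK_WORKER_CORES=2
  - SPARK_WORKER_MEMORY=1G
```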

```yaml
dockerfile: Dockerfile
container_name: spark-notebook
ports:
  - "4040:4040"
```

**Member**: Why is port 4040 mapped?

**Collaborator (author)**: Removed. I must have been testing something earlier and forgot to remove it.

**Member**: 👍

Dockerfile (outdated)

```dockerfile
RUN chmod a+x /opt/entrypoint.sh

# Switch back to the original user
#USER ${ORI_USER}
```

**Member**: Is this supposed to be commented out now?

**Collaborator (author)**: Oops. I was testing things and ran `git add .`.

**Member**: 👍

Dockerfile (outdated)

```dockerfile
ENTRYPOINT ["sleep 10s"]
ENV PYTHON_VER=python3.11
```

**Member**: Now that I think about it, I'm pretty sure bitnami installs Python in the image. Is there a reason you're not using that Python, assuming I'm right? If the Python version here doesn't match what's in the cluster, the job won't run.

**Collaborator (author)**: Yeah, I can use the Python from the image. But I remember you wanted to control which Python version we're using. I have no strong opinion on that; I'm fine either way.

**Member**: In the prior repo the version was set for both the cluster and the test image (assuming you were using the same image). Here the Python version could be different for the cluster and the notebook, which would break things.

**Collaborator (author)**: 👍 Switched to using the image's Python. Both base images have Python installed; I don't see the difference.

**Member**:

> Both base images have Python installed.

What do you mean by both base images?

> I don't see the difference.

Not sure what you mean here either.

**Collaborator (author)**: Both bitnami/spark and the image you used previously have Python. I don't see why we spent so much time installing Python before; we could have just used the Python from the image.

**Member**: The old image had 3.10. If it had had 3.11, I wouldn't have bothered.

**Collaborator (author)**: 👍
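Since a driver/cluster Python mismatch breaks jobs at runtime, a small guard like this (a hypothetical helper, not part of the PR) can fail fast in the notebook before any Spark session is built:

```python
import sys

def python_matches(expected: str) -> bool:
    """Return True if the running interpreter's major.minor version
    matches the version the cluster is expected to run, e.g. "3.11".
    """
    return f"{sys.version_info.major}.{sys.version_info.minor}" == expected
```

For example, checking `python_matches("3.11")` at notebook startup would surface the mismatch discussed above immediately instead of at job submission.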

@Tianhao-Gu merged commit bfbc095 into main on May 16, 2024; 6 checks passed.
@Tianhao-Gu deleted the dev_notebook branch on May 16, 2024 at 20:36.