This project runs a system of linear regression problems on a dataset in order to find hidden patterns in a subset of the data.
It is designed to deploy an Akka Cluster on Kubernetes. By using Kubernetes, we hope to scale our ComputeActor instances so that our linear regression fits run quickly.
The project uses Docker to build several container images, and the provided YAML files deploy those images to Kubernetes. A relatively new Kubernetes feature, StatefulSet, lets us instantiate our nodes in a specific order with stable names, so that we can hard-code a set of "known" seed nodes for our Akka Cluster to use. These seed nodes allow all nodes to register themselves with the cluster so it can bootstrap.
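To make the seed-node idea concrete, the sketch below shows how hard-coded seed addresses could be joined programmatically with the classic Akka Cluster API. The actor-system name, headless-service hostname, and port here are illustrative assumptions, not values taken from this project, which wires its seed nodes through its configuration and the provided YAML files.

```scala
import akka.actor.{ActorSystem, Address}
import akka.cluster.Cluster

object SeedNodeJoin {
  def main(args: Array[String]): Unit = {
    // Assumes akka-cluster is on the classpath and the cluster
    // actor-ref provider is enabled in the configuration.
    val system = ActorSystem("sclr")

    // StatefulSet pods get stable ordinal hostnames (compute-0, compute-1, ...),
    // so the addresses of the first pods can be hard-coded as seed nodes.
    val seedNodes = List(
      Address("akka.tcp", "sclr", "compute-0.compute-service.default.svc.cluster.local", 2551),
      Address("akka.tcp", "sclr", "compute-1.compute-service.default.svc.cluster.local", 2551)
    )

    // Every node joins via these well-known seeds, letting the cluster bootstrap.
    Cluster(system).joinSeedNodes(seedNodes)
  }
}
```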
- Akka: A free and open-source toolkit and runtime simplifying the construction of concurrent and distributed applications on the JVM.
- Kubernetes: An open-source system for automating deployment, scaling, and management of containers.
- Docker: A platform for building, shipping, and running containers.
- sbt: The interactive build tool for Scala
- Scala: A general-purpose programming language providing support for functional programming and a strong static type system.
The setup for this project was inspired by the following projects:
- SBT Docker Kubernetes: An example project that deploys multiple Docker images to Kubernetes.
- Lightbend Akka Cluster on Kubernetes: A tutorial on deploying an Akka Cluster to Kubernetes.
- Akka Cluster on GCP: An introduction to deploying an Akka Cluster to the Google Cloud Platform.
- IBM Akka Cluster on Kubernetes: IBM's tutorial on deploying an Akka Cluster using Kubernetes.
We can deploy and run on Google Kubernetes Engine (GKE) or locally!
Note that the instructions below assume a Mac.
- Clone the project somewhere.
git clone https://github.com/johnhainline/sclr.git
cd sclr
- Install the Kubernetes CLI.
- Mac:
brew cask install google-cloud-sdk
which installs `gcloud` and other utilities. Then run
gcloud components install kubectl
to get `kubectl`.
- Install docker.
- Mac:
brew cask install docker
- Run docker.
- Build our base docker image:
cd src/main/resources/docker/; docker build -t local/openjdk-custom:latest .; cd ../../../../;
- This builds the Docker image referenced in our build.sbt as `local/openjdk-custom`.
- Create a secret via `kubectl` for our MySQL password.
kubectl create secret generic mysql-password --from-literal=password=MYSQL_PASSWORD
`kubectl` can point to the cloud, or to a local minikube instance. Check with
kubectl config get-contexts
and
kubectl cluster-info
- Install `minikube`, a locally running Kubernetes cluster.
- Mac:
brew cask install minikube
- Start `minikube`, enable DNS support, connect to docker, and open the dashboard.
minikube start
minikube addons enable kube-dns
eval $(minikube docker-env)
minikube dashboard
- Build the project and publish its two Docker images to our local Docker install.
sbt manage/docker:publishLocal
sbt compute/docker:publishLocal
- See GKE Quickstart
- Get short-lived access to `us.gcr.io`, the Google Container Registry.
gcloud docker -a
- Build the project and publish it to the Google Container Registry.
sbt manage/docker:publish
sbt compute/docker:publish
- Create a remote cluster on which to run our Kubernetes scripts.
gcloud container clusters list
gcloud container clusters create sclr-01 --zone us-central1-a --num-nodes 1 --cluster-version=1.9.2-gke.1
gcloud container clusters get-credentials sclr-01
gcloud container clusters describe sclr-01
- Deploy using Kubernetes scripts.
cd src/main/resources/kubernetes/; kubectl create -f mysql.yaml; kubectl create -f compute-pods.yaml; kubectl create -f manage-pods.yaml; cd ../../../..;
- Check running pods, services, etc.
kubectl get all -o wide
- Send a single POST request (from the `compute-0` pod) to the `http-service` endpoint. This kicks off the job.
kubectl exec -ti compute-0 -- curl -vH "Content-Type: application/json" -X POST -d '{"name":"m5000","dnfSize":2,"optionalSample":200,"useLPNorm":true,"mu":0.24}' http-service.default.svc.cluster.local:8080/begin
- Scale the `compute` nodes to 50.
kubectl scale statefulsets compute --replicas=50
- Make a connection to the MySQL server.
kubectl run -it --rm --image=mysql:5.7 --restart=Never mysql-client -- mysql -h mysql-service -pMYSQL_PASSWORD
- Dump an entire schema from the MySQL server to our local directory.
kubectl exec -ti MYSQL_POD_NAME -- mysqldump --add-drop-database --databases medium -pMYSQL_PASSWORD > backup.sql
- Delete all local Kubernetes pods, including MySQL, etc.
kubectl delete pvc mysql-pv-claim; kubectl delete all -l app=sclr
- Note: DOUBLE CHECK THAT EVERYTHING IS DOWN. If something errors out, resources may keep running and cost money.
We use the sbt-jmh benchmarking framework. See sbt-jmh for details.
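For illustration, a minimal sbt-jmh benchmark might look like the sketch below; the package, class name, and the toy single-predictor least-squares fit are assumptions made for this example, not code taken from the repository.

```scala
package sclr.bench

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
class RegressionBench {

  // Small synthetic input; a real benchmark would use project data.
  val xs: Array[Double] = Array.tabulate(1000)(_.toDouble)
  val ys: Array[Double] = xs.map(x => 2.0 * x + 1.0)

  // Ordinary least squares for a single predictor: returns (slope, intercept).
  @Benchmark
  def simpleLinearFit(): (Double, Double) = {
    val n = xs.length
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    var sxy = 0.0
    var sxx = 0.0
    var i = 0
    while (i < n) {
      sxy += (xs(i) - meanX) * (ys(i) - meanY)
      sxx += (xs(i) - meanX) * (xs(i) - meanX)
      i += 1
    }
    val slope = sxy / sxx
    (slope, meanY - slope * meanX)
  }
}
```

With the JmhPlugin enabled on a module, a benchmark like this can be run with something along the lines of `sbt "jmh:run -i 10 -wi 10 -f 1 -t 1 .*RegressionBench.*"`.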