This project runs a system of linear regression problems on a dataset in order to find hidden patterns in a subset of the data.
It is designed to deploy an Akka Cluster on Kubernetes. By using Kubernetes, we hope to scale our ComputeActor instances so that our linear regression fits run quickly.
The project uses Docker to build several container images, and the provided YAML files deploy those images to Kubernetes. A relatively new Kubernetes feature, StatefulSet, lets us instantiate our nodes in a specific order with stable names, so that we can hard-code a set of "known" seed nodes for our Akka Cluster to use. These seed nodes allow all nodes to register themselves with the cluster so it can bootstrap.
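To make the seed-node idea concrete, the sketch below shows how hard-coded seed addresses could be joined programmatically with the classic Akka Cluster API. The actor-system name, headless-service hostname, and port here are illustrative assumptions, not values taken from this project, which wires its seed nodes through its configuration and the provided YAML files.

```scala
import akka.actor.{ActorSystem, Address}
import akka.cluster.Cluster

object SeedNodeJoin {
  def main(args: Array[String]): Unit = {
    // Assumes akka-cluster is on the classpath and the cluster
    // actor-ref provider is enabled in the configuration.
    val system = ActorSystem("sclr")

    // StatefulSet pods get stable ordinal hostnames (compute-0, compute-1, ...),
    // so the addresses of the first pods can be hard-coded as seed nodes.
    val seedNodes = List(
      Address("akka.tcp", "sclr", "compute-0.compute-service.default.svc.cluster.local", 2551),
      Address("akka.tcp", "sclr", "compute-1.compute-service.default.svc.cluster.local", 2551)
    )

    // Every node joins via these well-known seeds, letting the cluster bootstrap.
    Cluster(system).joinSeedNodes(seedNodes)
  }
}
```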
- Akka: A free and open-source toolkit and runtime simplifying the construction of concurrent and distributed applications on the JVM.
- Kubernetes: An open-source system for automating deployment, scaling, and management of containers.
- Docker: A platform for building, shipping, and running containers.
- sbt: The interactive build tool for Scala
- Scala: A general-purpose programming language providing support for functional programming and a strong static type system.
The setup for this project was inspired by the following projects:
- SBT Docker Kubernetes: An example project that deploys multiple Docker images to Kubernetes.
- Lightbend Akka Cluster on Kubernetes: A tutorial on deploying an Akka Cluster to Kubernetes.
- Akka Cluster on GCP: An introduction to deploying an Akka Cluster to the Google Cloud Platform.
- IBM Akka Cluster on Kubernetes: IBM's tutorial on deploying an Akka Cluster using Kubernetes.
We can deploy and run on Google Kubernetes Engine (GKE) or locally!
Note that the instructions below assume a Mac.
- Clone the project somewhere.
git clone https://github.com/johnhainline/sclr.git
cd sclr
- Install the Kubernetes CLI.
- Mac:
brew cask install google-cloud-sdk
which installs `gcloud` and other utilities. Then run
gcloud components install kubectl
to get `kubectl`.
- Install docker.
- Mac:
brew cask install docker
- Run docker.
- Build our base docker image:
cd src/main/resources/docker/; docker build -t local/openjdk-custom:latest .; cd ../../../../;
- This builds the Docker image referenced in our build.sbt as `local/openjdk-custom`.
- Create a secret via `kubectl` for our MySQL password.
kubectl create secret generic mysql-password --from-literal=password=MYSQL_PASSWORD
`kubectl` can point to the cloud, or to a local minikube instance. Check with
kubectl config get-contexts
and
kubectl cluster-info
- Install `minikube`, a locally running Kubernetes cluster.
- Mac:
brew cask install minikube
- Start `minikube`, enable DNS support, connect to docker, and open the dashboard.
minikube start
minikube addons enable kube-dns
eval $(minikube docker-env)
minikube dashboard
- Build the project and publish its two Docker images to our local Docker install.
sbt manage/docker:publishLocal
sbt compute/docker:publishLocal
- See GKE Quickstart
- Get short-lived access to `us.gcr.io`, the Google Container Registry.
gcloud docker -a
- Build the project and publish it to the Google Container Registry.
sbt manage/docker:publish
sbt compute/docker:publish
- Create a remote cluster on which to run our Kubernetes scripts.
gcloud container clusters list
gcloud container clusters create sclr-01 --zone us-central1-a --num-nodes 1 --cluster-version=1.9.2-gke.1
gcloud container clusters get-credentials sclr-01
gcloud container clusters describe sclr-01
- Deploy using Kubernetes scripts.
cd src/main/resources/kubernetes/; kubectl create -f mysql.yaml; kubectl create -f compute-pods.yaml; kubectl create -f manage-pods.yaml; cd ../../../..;
- Check running pods, services, etc.
kubectl get all -o wide
- Send a single POST request (from the `compute-0` pod) to the `http-service` endpoint. This kicks off the job.
kubectl exec -ti compute-0 -- curl -vH "Content-Type: application/json" -X POST -d '{"name":"m5000","dnfSize":2,"optionalSample":200,"useLPNorm":true,"mu":0.24}' http-service.default.svc.cluster.local:8080/begin
- Scale the `compute` nodes to 50.
kubectl scale statefulsets compute --replicas=50
- Make a connection to the MySQL server.
kubectl run -it --rm --image=mysql:5.7 --restart=Never mysql-client -- mysql -h mysql-service -pMYSQL_PASSWORD
- Dump an entire schema from the MySQL server to our local directory.
kubectl exec -ti MYSQL_POD_NAME -- mysqldump --add-drop-database --databases medium -pMYSQL_PASSWORD > backup.sql
- Delete all local Kubernetes pods, including MySQL, etc.
kubectl delete pvc mysql-pv-claim; kubectl delete all -l app=sclr
- Note: DOUBLE CHECK THAT EVERYTHING IS DOWN. If something errors out, resources may keep running and cost money.
We use the sbt-jmh benchmarking framework. See sbt-jmh for details.
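For illustration, a minimal sbt-jmh benchmark might look like the sketch below; the package, class name, and the toy single-predictor least-squares fit are assumptions made for this example, not code taken from the repository.

```scala
package sclr.bench

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
class RegressionBench {

  // Small synthetic input; a real benchmark would use project data.
  val xs: Array[Double] = Array.tabulate(1000)(_.toDouble)
  val ys: Array[Double] = xs.map(x => 2.0 * x + 1.0)

  // Ordinary least squares for a single predictor: returns (slope, intercept).
  @Benchmark
  def simpleLinearFit(): (Double, Double) = {
    val n = xs.length
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    var sxy = 0.0
    var sxx = 0.0
    var i = 0
    while (i < n) {
      sxy += (xs(i) - meanX) * (ys(i) - meanY)
      sxx += (xs(i) - meanX) * (xs(i) - meanX)
      i += 1
    }
    val slope = sxy / sxx
    (slope, meanY - slope * meanX)
  }
}
```

With the JmhPlugin enabled on a module, a benchmark like this can be run with something along the lines of `sbt "jmh:run -i 10 -wi 10 -f 1 -t 1 .*RegressionBench.*"`.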