Skip to content

johnhainline/sclr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Sparse Conditional Linear Regression

This project runs a system of linear regression problems on a dataset in order to find hidden patterns in a subset of the data.

Status

Build Status

Deployment

This project is designed to deploy an Akka Cluster on Kubernetes. By using Kubernetes, we hope to be able to scale our ComputeActor instances to run our linear regression fits quickly.

The project uses Docker to build several container images. The provided YAML files let us push these Docker images to Kubernetes . A new feature of Kubernetes StatefulSet lets us instantiate our nodes in a specific order so that we can hard-code a set of "known" seed nodes for our Akka Cluster to use. These "seed node(s)" allow all nodes to register themselves with the cluster so it can bootstrap.

Components

  • Akka: A free and open-source toolkit and runtime simplifying the construction of concurrent and distributed applications on the JVM.
  • Kubernetes: An open-source system for automating deployment, scaling, and management of containers.
  • Docker: Container building,shipping and running platform
  • sbt: The interactive build tool for Scala
  • Scala: A general-purpose programming language providing support for functional programming and a strong static type system.

Notes

The setup for this project was inspired by the following projects:

Steps

We can deploy and run on Google Kubernetes Engine (GKE) or locally!

Note that I have instructions for this using a Mac.

  1. Clone project somewhere.
    • git clone https://github.com/johnhainline/sclr.git
    • cd sclr
  2. Install the Kubernetes CLI.
    • Mac: brew cask install google-cloud-sdk which installs gcloud and other utilities. gcloud components install kubectl to get, kubectl.
  3. Install docker.
    • Mac: brew cask install docker
  4. Run docker.
  5. Build our base docker image:
    • cd src/main/resources/docker/; docker build -t local/openjdk-custom:latest .; cd ../../../../;
    • This builds the docker image referenced in our build.sbt as "local/openjdk-custom".
  6. Create a secret in kubectl for our MySQL password.
    • kubectl create secret generic mysql-password --from-literal=password=MYSQL_PASSWORD
  7. kubectl can point to the cloud, or to a local minikube instance.
    • kubectl config get-contexts and kubectl cluster-info

Build and Run Locally

  1. Install minikube, a locally running Kubernetes cluster.
    • Mac: brew cask install minikube
  2. Start minikube, enable DNS support, connect to docker, and open the dashboard.
    • minikube start
    • minikube addons enable kube-dns
    • eval $(minikube docker-env)
    • minikube dashboard
  3. Build project and push two docker images to our local docker install.
    • sbt manage/docker:publishLocal
    • sbt compute/docker:publishLocal

Build and Run on Google Kubernetes Engine

  1. See GKE Quickstart
  2. Get short-lived access to us.gcr.io, the Google Container Registry.
    • gcloud docker -a
  3. Build project and publish it to the Google Container Registry.
    • sbt manage/docker:publish
    • sbt compute/docker:publish
  4. Create a remote cluster for running our Kubernetes scripts on.
    • gcloud container clusters list
    • gcloud container clusters create sclr-01 --zone us-central1-a --num-nodes 1 --cluster-version=1.9.2-gke.1
    • gcloud container clusters get-credentials sclr-01
    • gcloud container clusters describe sclr-01

Common Deploy/Run commands

  1. Deploy using Kubernetes scripts.
    • cd src/main/resources/kubernetes/; kubectl create -f mysql.yaml; kubectl create -f compute-pods.yaml; kubectl create -f manage-pods.yaml; cd ../../../..;
  2. Check running pods, services, etc.
    • kubectl get all -o wide
  3. Send a single POST request (from the compute-0 pod) to the http-service endpoint. This kicks off the job.
    • kubectl exec -ti compute-0 -- curl -vH "Content-Type: application/json" -X POST -d '{"name":"m5000","dnfSize":2,"optionalSample":200,"useLPNorm":true,"mu":0.24}' http-service.default.svc.cluster.local:8080/begin
  4. Scale the compute nodes to 50
    • kubectl scale statefulsets compute --replicas=50
  5. Make a connection to the MySQL server.
    • kubectl run -it --rm --image=mysql:5.7 --restart=Never mysql-client -- mysql -h mysql-service -pMYSQL_PASSWORD
  6. Dump an entire schema from the MySQL server to our local directory.
    • kubectl exec -ti MYSQL_POD_NAME -- mysqldump --add-drop-database --databases medium -pMYSQL_PASSWORD > backup.sql
  7. Delete all local Kubernetes pods, including MySQL, etc.
    • kubectl delete pvc mysql-pv-claim; kubectl delete all -l app=sclr
    • Note DOUBLE CHECK EVERYTHING IS DOWN. On error things may keep running, costing money.

Benchmarks

We are using the sbt-jmh benchmarking framework. See sbt-jmh

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •