A simple project to experiment with exposing an ML model as a REST API.
The goal is to determine (classify) a given organization's industrial code based on its 'formaal' (Norwegian for 'purpose') as given by the user.
The task at hand is one of text classification (also known as categorization). We therefore represent the formaal as a feature vector X. We must also consider whether to apply feature selection to speed up the classification.
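As a minimal sketch of what "representing the formaal as a feature vector" can mean, the following turns a text into a bag-of-words count vector using only the standard library. The function names and the `min_df` parameter are assumptions made for illustration, not the project's code; in practice scikit-learn's `CountVectorizer` or `TfidfVectorizer` would normally do this job. Raising `min_df` drops rare tokens, which is one crude form of feature selection.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(texts, min_df=1):
    """Map every token found in at least `min_df` texts to a column index."""
    doc_freq = Counter()
    for text in texts:
        # Count each token once per text (document frequency).
        doc_freq.update(set(tokenize(text)))
    kept = sorted(tok for tok, df in doc_freq.items() if df >= min_df)
    return {tok: index for index, tok in enumerate(kept)}


def vectorize(text, vocabulary):
    """Return the count of each vocabulary token in `text`, in column order."""
    counts = Counter(tokenize(text))
    return [counts[tok] for tok in vocabulary]


# The first text is the example formaal used in this README; the second is made up.
texts = ["Turer i skog og mark", "Fotball for barn og unge"]
vocabulary = build_vocabulary(texts)
X = vectorize("Turer i skog og mark", vocabulary)
```

With `min_df=2` only tokens shared by both texts survive, so the vector shrinks from nine columns to one.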
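What type of machine learning system are we looking at? In the following we will try to describe our system along three categories. The first category is whether the system is trained with or without supervision (supervised versus unsupervised learning). Common supervised learning algorithms include: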
- k-Nearest Neighbors
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees and Random Forests
- Neural networks
- Naive Bayes
For an extensive list of supervised learning algorithms supported by scikit-learn, check https://scikit-learn.org/stable/supervised_learning.html
Unsupervised learning covers tasks such as the following:
- Clustering
- Visualization and dimensionality reduction
- Association rule learning
  - Apriori
  - Eclat
For an extensive list of unsupervised learning algorithms supported by scikit-learn, check https://scikit-learn.org/stable/unsupervised_learning.html
TODO: Describe our choice for the first category (supervised or unsupervised learning).
TODO: Describe our choice for the second category.
The final category addresses how a machine learning system generalizes. There are two main approaches to generalization:
- Instance-based learning: the system learns the examples by heart, then generalizes to new cases using a similarity measure.
- Model-based learning: the system builds a model of the examples, then uses that model to make predictions.
TODO: Describe our choice (instance-based or model-based learning).
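To make the difference between the two approaches to generalization concrete, here is a small sketch with two toy classifiers. Everything in it is an assumption for illustration: the class names, the Jaccard similarity measure, the training texts and the labels `"outdoor"` and `"sports"` (which are not real industrial codes). The instance-based classifier keeps every example and looks up the most similar one; the model-based classifier reduces the examples to per-label token counts and predicts from that summary.

```python
from collections import Counter, defaultdict


def tokens(text):
    """Represent a text as the set of its lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Similarity measure: shared tokens divided by all distinct tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class InstanceBasedClassifier:
    """Stores all training examples and compares new text against each one."""

    def fit(self, texts, labels):
        self.examples = [(tokens(t), y) for t, y in zip(texts, labels)]
        return self

    def predict(self, text):
        query = tokens(text)
        # Return the label of the single most similar stored example.
        _, label = max(self.examples, key=lambda ex: jaccard(query, ex[0]))
        return label


class ModelBasedClassifier:
    """Summarizes the training data as token counts per label."""

    def fit(self, texts, labels):
        self.token_counts = defaultdict(Counter)
        for t, y in zip(texts, labels):
            self.token_counts[y].update(tokens(t))
        return self

    def predict(self, text):
        query = tokens(text)

        def score(label):
            # How often the query tokens were seen under this label.
            return sum(self.token_counts[label][tok] for tok in query)

        return max(self.token_counts, key=score)


# Made-up training data; the labels are placeholders.
train_texts = ["turer i skog og mark", "fotball for barn og unge", "fjellturer og friluftsliv"]
train_labels = ["outdoor", "sports", "outdoor"]

instance_clf = InstanceBasedClassifier().fit(train_texts, train_labels)
model_clf = ModelBasedClassifier().fit(train_texts, train_labels)
```

Both classifiers agree on these toy inputs; they differ in what they must keep around at prediction time (all examples versus a compact summary).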
- Batch train a model on data
- Expose model as API
- Monitor and gather metrics
- Evaluate performance
- Update model
Our ML pipeline is illustrated in a figure (not reproduced here). Credits: Emily Fox & Carlos Guestrin.
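The "evaluate performance" and "update model" steps can be sketched as a simple accuracy check on predictions whose true labels have become known. The function names and the `threshold` value of 0.8 are assumptions for illustration; the project may well use a different metric or retraining policy.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that equal the corresponding true label."""
    if not actual:
        raise ValueError("need at least one labelled example")
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def needs_retraining(predicted, actual, threshold=0.8):
    """Flag the model for an update when accuracy falls below `threshold`."""
    return accuracy(predicted, actual) < threshold


# Made-up monitoring data: what the API returned versus what turned out to be right.
logged_predictions = ["a", "b", "a", "c"]
true_labels = ["a", "b", "b", "c"]
```

Here three of four logged predictions are correct, so with the assumed threshold the model would be flagged for retraining.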
The solution is a simple Python project that implements a script for training a model as well as a server that exposes the API.
-- helloMLAPI
   -- data                  # contains the data
      -- organizations.csv
   -- models                # contains the models trained on the data
      -- xyz.py
      -- xyz.pckl           # a persisted model
   -- api
      -- server.py          # the API
      -- Dockerfile
   -- test
      -- test.py
   -- README.md
- Flask (Python micro web framework)
- sklearn (scikit-learn)
- pickle (Python object serialization), or
- joblib (scikit-learn model persistence)
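As a rough illustration of the persistence step, the sketch below saves and reloads a model with `pickle`. The helper names and the dictionary standing in for a trained model are assumptions; with scikit-learn estimators the same pattern applies, and `joblib.dump` / `joblib.load` can be used instead. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle


def save_model(model, path):
    """Serialize `model` to `path` in binary pickle format."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a previously pickled model from `path` (trusted files only)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Hypothetical stand-in for a trained model object.
toy_model = {"labels": ["outdoor", "sports"], "token_counts": {"skog": 1, "fotball": 1}}
```

The training script would call `save_model` once after fitting, and the server would call `load_model` once at startup rather than on every request.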
To start the server locally:
cd api
python server.py
TODO: Describe how to build and start the api as a dockerized service
Once the server is running, you can use it with e.g. curl:
curl \
--include \
--header "Content-Type: application/json" \
--request POST \
--data '{"formaal":"Turer i skog og mark"}' \
--url http://localhost:5000 \
--write-out "\n"
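The same request can be sent from Python. This is a minimal sketch using only the standard library; the function names are made up for illustration, and the URL and JSON body simply mirror the curl command above.

```python
import json
import urllib.request


def build_request(formaal, url="http://localhost:5000"):
    """Build a POST request carrying the formaal as a JSON body."""
    body = json.dumps({"formaal": formaal}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(formaal, url="http://localhost:5000"):
    """Send the request and decode the JSON response (the server must be running)."""
    with urllib.request.urlopen(build_request(formaal, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `classify("Turer i skog og mark")` requires the server to be listening on port 5000.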
The response should include a list of industrial codes and descriptions that match the formaal, sorted best match first.
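One way the server side could produce such a response is sketched below, independent of any web framework. All names are hypothetical: `score_fn` stands in for the trained model, the `code`, `description` and `score` fields are an assumed response shape, and the codes are placeholders rather than real industrial codes. In a Flask app this logic would sit inside the route that handles the POST request.

```python
import json


def rank_matches(scores, descriptions):
    """Turn {code: score} into a list of matches, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        {"code": code, "description": descriptions.get(code, ""), "score": score}
        for code, score in ranked
    ]


def handle_request(raw_body, score_fn, descriptions):
    """Parse the JSON body and return (status code, JSON response body)."""
    try:
        formaal = json.loads(raw_body)["formaal"]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a body that is not a JSON object.
        error = {"error": "expected a JSON object with a 'formaal' field"}
        return 400, json.dumps(error)
    return 200, json.dumps(rank_matches(score_fn(formaal), descriptions))


def fake_score_fn(formaal):
    """Hypothetical model: returns made-up scores regardless of input."""
    return {"code-1": 0.2, "code-2": 0.9}


fake_descriptions = {"code-1": "first placeholder", "code-2": "second placeholder"}
status, body = handle_request('{"formaal": "Turer i skog og mark"}', fake_score_fn, fake_descriptions)
```

Returning a 400 status for malformed input is a design choice made here, not something the project is known to do.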
- http://blog.socratesk.com/blog/2018/01/29/expose-ML-model-as-REST-API
- https://towardsdatascience.com/a-flask-api-for-serving-scikit-learn-models-c8bcdaa41daa
- Russell and Norvig (2010): Artificial Intelligence: A Modern Approach. Third Edition.
- Aurélien Géron (2017): Hands-On Machine Learning with Scikit-Learn & TensorFlow