From 61e2e46e5389047501ad96bbab915db906a11f49 Mon Sep 17 00:00:00 2001 From: Kjetil Indrehus Date: Fri, 27 Sep 2024 13:21:36 +0200 Subject: [PATCH 01/11] add: readme inital setup --- README.md | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 67 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 2ead563..ec1a1e8 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,69 @@ -# rustic_ml +
-A machine learning library created from scratch with Rust +# `rustic_ml` -> Create being worked on as of 25th of May, 2024. Not public yet! +## A machine learning library created from scratch + + +
+ +- [`rustic_ml`](#rustic_ml) + - [Summary](#summary) + - [Introduction](#introduction) + - [Highlights](#highlights) + - [Usage](#usage) + - [Use Cases](#use-cases) + - [Binary classification](#binary-classification) + - [Macros](#macros) + - [Feature Flags](#feature-flags) + - [Deeper Reading](#deeper-reading) + +## Summary + + +## Introduction + + + +## Highlights + + +## Usage + +.... + +First, depend on it in your Cargo manifest: + +... + +## Use Cases + +### Binary classification + + +## Macros + +## Feature Flags + + +## Deeper Reading + + + + +[crate_link]: https://crates.io/crates/rustic_ml "Crate listing" +[crate_img]: https://img.shields.io/crates/v/r.svg?style=for-the-badge&color=f46623 "Crate badge" +[docs_link]: https://docs.rs/rustic_ml/latest/rustic_ml "Crate documentation" +[docs_img]: https://img.shields.io/docsrs/rustic_ml/latest.svg?style=for-the-badge "Documentation badge" +[downloads_img]: https://img.shields.io/crates/dv/rustic_ml.svg?style=for-the-badge "Crate downloads" +[license_file]: https://github.com/KjetilIN/rustic_ml/blob/main/LICENSE "Project license" +[license_img]: https://img.shields.io/crates/l/bitvec.svg?style=for-the-badge "License badge" + + + + +[`deku`]: https://crates.io/crates/deku +[docsrs]: https://docs.rs/bitvec/latest/bitvec +[erl_bit]: https://www.erlang.org/doc/programming_examples/bit_syntax.html +[issue]: https://github.com/ferrilab/bitvec/issues/new +[`radium`]: https://crates.io/crates/radium \ No newline at end of file From 2c5691e5701870efebedfad02223f5268c898682 Mon Sep 17 00:00:00 2001 From: Kjetil Indrehus Date: Fri, 27 Sep 2024 13:25:57 +0200 Subject: [PATCH 02/11] readme: adjusted banner --- README.md | 30 ++++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index ec1a1e8..9a1d728 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,30 @@ -
+
+

rustic_ml

+ A machine learning library created from scratch + Created by Kjetil Indrehus +
-# `rustic_ml` +
+
+ version + Rust + + [![Documentation][docs_img]][docs_link] + [![Crate Downloads][downloads_img]][crate_link] +
-## A machine learning library created from scratch +
+ + +## Summary -
-- [`rustic_ml`](#rustic_ml) - - [Summary](#summary) + + +# Table of content: +

+ Rust + Build + version + Downloads + docs passing ci +

-
-
- version - Rust - - [![Documentation][docs_img]][docs_link] - [![Crate Downloads][downloads_img]][crate_link] -
- -
- ## Summary @@ -22,7 +22,7 @@ -# Table of content: - [Table of content: \ +

Rust - Build version Downloads +

+ +

Status

+

+ Build docs passing ci

@@ -24,15 +29,15 @@ ## Table of content -- [Table of content: \ - [Summary](#summary) -- [Introduction](#introduction) -- [Highlights](#highlights) +- [Feature list](#feature-list) - [Usage](#usage) - [Use Cases](#use-cases) - [Binary classification](#binary-classification) @@ -40,11 +41,7 @@ - [Deeper Reading](#deeper-reading) -## Introduction - - - -## Highlights +## Feature list ## Usage @@ -66,23 +63,3 @@ First, depend on it in your Cargo manifest: ## Deeper Reading - - - - -[crate_link]: https://crates.io/crates/rustic_ml "Crate listing" -[crate_img]: https://img.shields.io/crates/v/r.svg?style=for-the-badge&color=f46623 "Crate badge" -[docs_link]: https://docs.rs/rustic_ml/latest/rustic_ml "Crate documentation" -[docs_img]: https://img.shields.io/docsrs/rustic_ml/latest.svg?style=for-the-badge "Documentation badge" -[downloads_img]: https://img.shields.io/crates/dv/rustic_ml.svg?style=for-the-badge "Crate downloads" -[license_file]: https://github.com/KjetilIN/rustic_ml/blob/main/LICENSE "Project license" -[license_img]: https://img.shields.io/crates/l/bitvec.svg?style=for-the-badge "License badge" - - - - -[`deku`]: https://crates.io/crates/deku -[docsrs]: https://docs.rs/bitvec/latest/bitvec -[erl_bit]: https://www.erlang.org/doc/programming_examples/bit_syntax.html -[issue]: https://github.com/ferrilab/bitvec/issues/new -[`radium`]: https://crates.io/crates/radium From 432d90660f62467a97756a6e8812c716ab49fdad Mon Sep 17 00:00:00 2001 From: Kjetil Indrehus Date: Fri, 27 Sep 2024 18:12:00 +0200 Subject: [PATCH 06/11] add: list of features and summary --- README.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index f3053f9..cc4e472 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,8 @@ ## Summary -`rustic_ml` is a machine learning library designed to be easy to use, and give the developer enough flexibility. 
+`rustic_ml` is a machine learning library designed to be easy to use, and give the developer a flexible API to work with. +This library is built of first principles, and the goal is to avoid any dependencies. > ⚠️ This library is in the prototype stage. Breaking changes can happen. @@ -43,6 +44,10 @@ ## Feature list +The library includes the following key features: +- `Matrix` implementation +- `Dataframe` implementation +- `Perceptron` binary classifier ## Usage From 16ba5fb7ae5ea51fdb7b2948fafbb784d7d34642 Mon Sep 17 00:00:00 2001 From: Kjetil Indrehus Date: Thu, 3 Oct 2024 20:23:44 +0200 Subject: [PATCH 07/11] fix: binary classification demo --- examples/DataframeLab.ipynb | 194 --------------- examples/notebook_binary_classification.ipynb | 223 ++++++++++++++++++ ...me.ipynb => notebook_read_dataframe.ipynb} | 0 3 files changed, 223 insertions(+), 194 deletions(-) delete mode 100644 examples/DataframeLab.ipynb create mode 100644 examples/notebook_binary_classification.ipynb rename examples/{nootebook_read_dataframe.ipynb => notebook_read_dataframe.ipynb} (100%) diff --git a/examples/DataframeLab.ipynb b/examples/DataframeLab.ipynb deleted file mode 100644 index 01187c7..0000000 --- a/examples/DataframeLab.ipynb +++ /dev/null @@ -1,194 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "e237e057-796b-4e8a-89f8-076882ea2f9c", - "metadata": {}, - "source": [ - "# Using `rustic_ml` in a Jupyter notebook" - ] - }, - { - "cell_type": "markdown", - "id": "733b683a-a558-4596-96e2-6faca1e4c29a", - "metadata": {}, - "source": [ - "First step is to include the create to the notebook. \n", - "To get started see `README.md` on how to setup the notebook environment. 
\n", - "When it is installed, run `jupyter lab` to start the notebook in the browser.\n", - "Create a new notebook with the Rust option, and set the depencency to: \n", - "```rust\n", - ":dep rustic_ml = \"0.x.x\"\n", - "extern crate rustic_ml;\n", - "```\n", - "\n", - "After this, you will be able to use the libaries functionality in the following files.\n", - "\n", - "Since this is a example within the library iteself, we import the libary using the path:" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "f541ad4f-9472-4cf6-a787-78a54389be91", - "metadata": { - "vscode": { - "languageId": "rust" - } - }, - "outputs": [], - "source": [ - ":dep rustic_ml = { path = \"../\" }\n", - "extern crate rustic_ml;" - ] - }, - { - "cell_type": "markdown", - "id": "67ec5c36-a929-4ee0-92bb-58cdcc7d5a5b", - "metadata": {}, - "source": [ - "Next, include the function that we are going to use from the library: " - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "3f2c42e6-7cf4-45b5-bdfb-1228b2bce10d", - "metadata": { - "vscode": { - "languageId": "rust" - } - }, - "outputs": [], - "source": [ - "use rustic_ml::data_utils::dataframe::Dataframe;" - ] - }, - { - "cell_type": "markdown", - "id": "76a9d147-f8ce-4bd5-9f12-dee698b0a942", - "metadata": {}, - "source": [ - "Reading a csv file:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d5d58a4d-8e12-44b8-9974-6408333cefc4", - "metadata": {}, - "outputs": [], - "source": [ - "let path = String::from(\"../datasets/european_cities.csv\");\n", - "let dataframe = Dataframe::from_csv(path).unwrap();" - ] - }, - { - "cell_type": "markdown", - "id": "152858c3-707d-4563-8e69-5f2681797399", - "metadata": {}, - "source": [ - "Run the following codeblock to see the information about the dataframe:" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "11070ca7-ffcc-41bd-851e-e67c7fb89d48", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "Column Name Type None Some Total Length \n", - "-----------------------------------------------------------------\n", - "Barcelona Float 0 24 24 \n", - "Belgrade Float 0 24 24 \n", - "Berlin Float 0 24 24 \n", - "Brussels Float 0 24 24 \n", - "Bucharest Float 0 24 24 \n", - "Budapest Float 0 24 24 \n", - "Copenhagen Float 0 24 24 \n", - "Dublin Float 0 24 24 \n", - "Hamburg Float 0 24 24 \n", - "Istanbul Float 0 24 24 \n", - "Kyiv Float 0 24 24 \n", - "London Float 0 24 24 \n", - "Madrid Float 0 24 24 \n", - "Milan Float 0 24 24 \n", - "Moscow Float 0 24 24 \n", - "Munich Float 0 24 24 \n", - "Paris Float 0 24 24 \n", - "Prague Float 0 24 24 \n", - "Rome Float 0 24 24 \n", - "Saint Petersburg Float 0 24 24 \n", - "Sofia Float 0 24 24 \n", - "Stockholm Float 0 24 24 \n", - "Vienna Float 0 24 24 \n", - "Warsaw Float 0 24 24 \n" - ] - }, - { - "data": { - "text/plain": [ - "()" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dataframe.info()" - ] - }, - { - "cell_type": "markdown", - "id": "546e0b63-3577-456f-8a1f-83d306f0e7af", - "metadata": {}, - "source": [ - "To see the memory usage of the dataframe, we call `memory_usage()`:" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "63ed09c7-6779-4fa0-a4c4-fb77d421f43a", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "4608" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dataframe.memory_usage()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Rust", - "language": "rust", - "name": "rust" - }, - "language_info": { - "codemirror_mode": "rust", - "file_extension": ".rs", - "mimetype": "text/rust", - "name": "Rust", - "pygment_lexer": "rust", - "version": "" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/examples/notebook_binary_classification.ipynb b/examples/notebook_binary_classification.ipynb 
new file mode 100644 index 0000000..79de79e --- /dev/null +++ b/examples/notebook_binary_classification.ipynb @@ -0,0 +1,223 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "ff34b956-c739-4514-b628-c1bcf4338b08", + "metadata": {}, + "source": [ + "# Perceptron Binary Classification\n", + "\n", + "Let's imagine we have the following data:\n", + "\n", + "| Movie # | Alice | Bob | Profitable? |\n", + "|---------|-------|-----|-------------|\n", + "| 1 | 1 | 1 | no |\n", + "| 2 | 4 | 3 | yes |\n", + "| 3 | 3 | 5 | yes |\n", + "| 4 | 5 | 6 | yes |\n", + "| 5 | 2 | 3 | no |\n", + "\n", + "\n", + "Our goal is to classify whether a movie is profitable, based on the scores of two critics.\n", + "Each score ranges from 1 to 6.\n", + "\n", + "With `rustic_ml`, we can train on the dataset by using a `Perceptron`:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "1c4a10dd-a25b-4a7f-beee-d925629ec6ae", + "metadata": {}, + "outputs": [], + "source": [ + "// See the README for how to set up a Jupyter notebook with Rust and rustic_ml\n", + ":dep rustic_ml = { path = \"../\" }\n", + "extern crate rustic_ml;\n", + "\n", + "// Import the perceptron\n", + "use rustic_ml::perceptron::Perceptron;" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "583c2849-2109-4cd2-b478-52e460ff0f46", + "metadata": {}, + "outputs": [], + "source": [ + "// Initialize the perceptron with a learning rate of 1.0 and a bias of -1.0\n", + "let mut perceptron = Perceptron::init().learning_rate(1.0).bias(-1.0);" + ] + }, + { + "cell_type": "markdown", + "id": "34be0e90-0977-4154-ac4f-2ce8e337cfb9", + "metadata": {}, + "source": [ + "For larger datasets we could use the `Dataframe` struct, but for simplicity we use plain vectors here:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "cd5deeb4-e15e-47ed-80ed-ff8338e67d7c", + "metadata": {}, + "outputs": [], + "source": [ + "let x_train: Vec<(f64, f64)> = vec![(1.0, 1.0), (4.0, 3.0), (3.0, 5.0), (5.0, 6.0), (2.0, 3.0)];\n", + "let y_train = 
vec![0.0, 1.0, 1.0, 1.0, 0.0];" + ] + }, + { + "cell_type": "markdown", + "id": "3aa421ed-3590-440c-a666-e25b8f92600b", + "metadata": {}, + "source": [ + "Train until the perceptron classifies every training sample correctly. This terminates because we know the data is linearly separable.\n", + "We can also use `fit_until_halt` if we don't want to log each epoch:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8b0ad3e2-9b3e-498e-8f4a-f532c7004fd4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1: 80% accuracy\n", + "Epoch 2: 60% accuracy\n", + "Epoch 3: 60% accuracy\n", + "Epoch 4: 60% accuracy\n", + "Epoch 5: 40% accuracy\n", + "Epoch 6: 80% accuracy\n", + "Epoch 7: 100% accuracy\n" + ] + } + ], + "source": [ + "perceptron.fit_until_halt_with_logging(&x_train, &y_train);" + ] + }, + { + "cell_type": "markdown", + "id": "cd4009f4-846e-4e9a-9c77-f936735638cc", + "metadata": {}, + "source": [ + "Calculate the accuracy over the dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "a04a97e7-5d2b-49fc-8488-2e782b878328", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Model accuracy: 100%\n" + ] + } + ], + "source": [ + "let accuracy = perceptron.calculate_accuracy(&x_train, &y_train);\n", + "println!(\"Model accuracy: {}%\", accuracy);" + ] + }, + { + "cell_type": "markdown", + "id": "4de0b106-48d5-412b-8b39-439cbceb575f", + "metadata": {}, + "source": [ + "To see the model's final weights and bias:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "cbed4d1a-92b5-4a69-8fb9-59ea05a15b9e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " |Perceptron\n", + "----------------------------------------\n", + " Bias | -4\n", + "----------------------------------------\n", + " W1 | 2.9568098897727078\n", + "----------------------------------------\n", + " W2 | 
-0.608317931904085\n" + ] + } + ], + "source": [ + "perceptron.print_model();" + ] + }, + { + "cell_type": "markdown", + "id": "b6bb760d-5848-46d2-851f-d7b4524938ac", + "metadata": {}, + "source": [ + "Using the model to predict. \n", + "Alice gave the score 4, and Bob gave the score 3: " + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "fc7737e8-51ce-44b2-8cd7-13120efbdd09", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Critics judge: Will movie be profitable? yes\n" + ] + } + ], + "source": [ + "let profitable = if perceptron.predict(&(4.0, 3.0)) == 1 {\n", + " \"yes\"\n", + " } else {\n", + " \"no\"\n", + " };\n", + "\n", + "println!(\"Critics judge: Will movie be profitable? {}\", profitable);" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b746dcb7-59ac-44a1-992c-3b6372dd6e68", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Rust", + "language": "rust", + "name": "rust" + }, + "language_info": { + "codemirror_mode": "rust", + "file_extension": ".rs", + "mimetype": "text/rust", + "name": "Rust", + "pygment_lexer": "rust", + "version": "" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/nootebook_read_dataframe.ipynb b/examples/notebook_read_dataframe.ipynb similarity index 100% rename from examples/nootebook_read_dataframe.ipynb rename to examples/notebook_read_dataframe.ipynb From dd09a15d5067a89fbe92220974f41a2b01717f74 Mon Sep 17 00:00:00 2001 From: Kjetil Indrehus <66110094+KjetilIN@users.noreply.github.com> Date: Thu, 3 Oct 2024 20:42:18 +0200 Subject: [PATCH 08/11] add: use cases and basic setup of cargo setup --- README.md | 32 +++++++++++++++++++++++++++----- 1 file changed, 27 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index cc4e472..4df0fb4 100644 --- a/README.md +++ b/README.md @@ -51,20 +51,42 @@ The library includes the following key features: ## 
Usage -.... - -First, depend on it in your Cargo manifest: - -... +`rustic_ml` has documentation on docs.rs. It will be very usefull to read it through +https://docs.rs/rustic_ml/latest/rustic_ml/ + +Run the following Cargo command in your project directory: +``` +cargo add rustic_ml +``` +Or add it to the Cargo manifest. Make sure to pick the newest version: + +```toml +[dependencies] +rustic_ml = "0.0.2" +``` +Also see the [./examples/](examples/) folder for different examples. +See also the specific use cases in the next section of the README file. ## Use Cases ### Binary classification +`rustic_ml` has implemented the `Percetpron`. It works well when you know your data is linearly seperable. +In the example below, we use a Jupyter Notebook with Rust kernal. This makes it easy to build up models with Rust: + +![image](https://github.com/user-attachments/assets/29ef6f0c-ab6f-46f9-bc2b-1b748c34e039) + +(See the full demo [examples/notebook_binary_classification.ipynb](examples/notebook_binary_classification.ipynb) + ## Macros +> Comming soon! + ## Feature Flags +> Comming soon! ## Deeper Reading + +> Comming soon! From ddee4e9f66fab90af8a2c78e86f42263336dca63 Mon Sep 17 00:00:00 2001 From: Kjetil Indrehus Date: Thu, 3 Oct 2024 20:46:51 +0200 Subject: [PATCH 09/11] fix: doc test for readme folder --- README.md | 14 +++++++------- src/lib.rs | 2 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 4df0fb4..c670ee9 100644 --- a/README.md +++ b/README.md @@ -51,11 +51,11 @@ The library includes the following key features: ## Usage -`rustic_ml` has documentation on docs.rs. It will be very usefull to read it through +`rustic_ml` has documentation on docs.rs. It will be very useful to read it through https://docs.rs/rustic_ml/latest/rustic_ml/ Run the following Cargo command in your project directory: -``` +```terminal cargo add rustic_ml ``` Or add it to the Cargo manifest. 
Make sure to pick the newest version: @@ -71,8 +71,8 @@ See also the specific use cases in the next section of the README file. ### Binary classification -`rustic_ml` has implemented the `Percetpron`. It works well when you know your data is linearly seperable. -In the example below, we use a Jupyter Notebook with Rust kernal. This makes it easy to build up models with Rust: +`rustic_ml` has implemented the `Perceptron`. It works well when you know your data is linearly separable. +In the example below, we use a Jupyter Notebook with Rust kernel. This makes it easy to build up models with Rust: ![image](https://github.com/user-attachments/assets/29ef6f0c-ab6f-46f9-bc2b-1b748c34e039) @@ -81,12 +81,12 @@ In the example below, we use a Jupyter Notebook with Rust kernal. This makes it ## Macros -> Comming soon! +> Coming soon! ## Feature Flags -> Comming soon! +> Coming soon! ## Deeper Reading -> Comming soon! +> Coming soon! diff --git a/src/lib.rs b/src/lib.rs index c14e1f6..9e00d58 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -2,4 +2,4 @@ pub mod activation; pub mod data_utils; -pub mod perceptron; +pub mod perceptron; \ No newline at end of file From c44e7d20bfb1e4a9a0979d34678dae3c7c0babd6 Mon Sep 17 00:00:00 2001 From: Kjetil Indrehus Date: Thu, 3 Oct 2024 20:47:20 +0200 Subject: [PATCH 10/11] add: increment version number --- Cargo.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Cargo.toml b/Cargo.toml index 18fb644..0625194 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "rustic_ml" -version = "0.0.2" +version = "0.0.3" authors = ["Kjetil Indrehus "] edition = "2021" license = "MIT" From 77df4a8132526ddcce089b3ec27b7830e7517202 Mon Sep 17 00:00:00 2001 From: Kjetil Indrehus Date: Thu, 3 Oct 2024 20:48:17 +0200 Subject: [PATCH 11/11] fix: format lib.rs --- src/lib.rs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/lib.rs b/src/lib.rs index 9e00d58..c14e1f6 100644 --- a/src/lib.rs +++ 
b/src/lib.rs @@ -2,4 +2,4 @@ pub mod activation; pub mod data_utils; -pub mod perceptron; \ No newline at end of file +pub mod perceptron;
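
Reviewer note on the binary classification notebook added in PATCH 07: the notebook relies on training halting because the movie data is linearly separable. The sketch below illustrates that idea with the classic perceptron learning rule on the same five samples. It is an assumption-laden illustration, not `rustic_ml`'s implementation: `SketchPerceptron`, its fields, the zero initialization, and the `max_epochs` cap are all hypothetical names and choices made here, and the real `Perceptron` may initialize and update differently.

```rust
/// Hypothetical two-input perceptron with a step activation.
/// This does NOT mirror rustic_ml's `Perceptron`; it only sketches the rule.
struct SketchPerceptron {
    weights: [f64; 2],
    bias: f64,
    learning_rate: f64,
}

impl SketchPerceptron {
    /// Assumption: weights and bias start at zero.
    fn new(learning_rate: f64) -> Self {
        Self { weights: [0.0, 0.0], bias: 0.0, learning_rate }
    }

    /// Step activation: 1 when the weighted sum is positive, otherwise 0.
    fn predict(&self, x: (f64, f64)) -> u8 {
        let z = self.weights[0] * x.0 + self.weights[1] * x.1 + self.bias;
        if z > 0.0 { 1 } else { 0 }
    }

    /// One pass over the data; returns the number of misclassified samples.
    fn fit_epoch(&mut self, xs: &[(f64, f64)], ys: &[u8]) -> usize {
        let mut mistakes = 0;
        for (&x, &y) in xs.iter().zip(ys) {
            // error is +1.0, 0.0 or -1.0
            let error = f64::from(y) - f64::from(self.predict(x));
            if error != 0.0 {
                mistakes += 1;
                self.weights[0] += self.learning_rate * error * x.0;
                self.weights[1] += self.learning_rate * error * x.1;
                self.bias += self.learning_rate * error;
            }
        }
        mistakes
    }

    /// Repeats epochs until one finishes without mistakes.
    /// Returns the number of epochs used, or None if the cap is reached.
    fn fit_until_halt(
        &mut self,
        xs: &[(f64, f64)],
        ys: &[u8],
        max_epochs: usize,
    ) -> Option<usize> {
        for epoch in 1..=max_epochs {
            if self.fit_epoch(xs, ys) == 0 {
                return Some(epoch);
            }
        }
        None
    }
}

fn main() {
    // Same samples as the notebook: (Alice, Bob) scores and the profitable label.
    let x_train = [(1.0, 1.0), (4.0, 3.0), (3.0, 5.0), (5.0, 6.0), (2.0, 3.0)];
    let y_train = [0u8, 1, 1, 1, 0];

    let mut model = SketchPerceptron::new(1.0);
    match model.fit_until_halt(&x_train, &y_train, 10_000) {
        Some(epochs) => println!("halted after {epochs} epochs"),
        None => println!("did not halt within the epoch cap"),
    }

    let verdict = if model.predict((4.0, 3.0)) == 1 { "yes" } else { "no" };
    println!("Will the movie be profitable? {verdict}");
}
```

The epoch cap is a design choice worth considering for the library too: on data that is not linearly separable, an uncapped "fit until halt" loop never returns, whereas an `Option` result lets the caller detect that case.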