---
format:
  gfm:
    default-image-extension: ""
always_allow_html: true
execute:
  cache: true
  freeze: auto
---
<!-- README.md is generated from README.qmd. Please edit that file -->
```{r}
#| include: false
# load {SLmetrics}
library(SLmetrics)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  dpi = 1280,
  fig.height = 6
)

# 1) store data
DT <- SLmetrics:::DT

# 2) set colors of the lattice
lattice::trellis.par.set(
  modifyList(
    lattice::standard.theme(),
    list(
      background = list(col = "#212830"),
      panel.background = list(col = "#2b3139"),
      reference.line = list(col = "#4a5563", lty = 1),
      axis.text = list(col = "#848e9c"),
      par.xlab.text = list(col = "#848e9c"),
      par.ylab.text = list(col = "#848e9c"),
      par.main.text = list(col = "#848e9c"),
      axis.line = list(col = "#848e9c"),
      superpose.line = list(col = c("#5d8ca8", "#d5695d", "#65a479", "#d3ba68")),
      superpose.symbol = list(col = c("#5d8ca8", "#d5695d", "#65a479", "#d3ba68"), pch = 16)
    )
  )
)
```
# {SLmetrics}: Machine learning performance evaluation on steroids <img src="man/figures/logo.png" align="right" height="150" alt="" />
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/SLmetrics)](https://CRAN.R-project.org/package=SLmetrics)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/last-month/SLmetrics?color=blue)](https://r-pkg.org/pkg/SLmetrics)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![R-CMD-check](https://github.com/serkor1/SLmetrics/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/serkor1/SLmetrics/actions/workflows/R-CMD-check.yaml)
[![macOS-clang](https://github.com/serkor1/SLmetrics/actions/workflows/macos-check-clang.yaml/badge.svg)](https://github.com/serkor1/SLmetrics/actions/workflows/macos-check-clang.yaml)
[![codecov](https://codecov.io/gh/serkor1/SLmetrics/branch/development/graph/badge.svg?token=X2osJDSRlN)](https://app.codecov.io/gh/serkor1/SLmetrics)
[![CodeFactor](https://www.codefactor.io/repository/github/serkor1/slmetrics/badge)](https://www.codefactor.io/repository/github/serkor1/slmetrics)
<!-- badges: end -->
[{SLmetrics}](https://serkor1.github.io/SLmetrics/) is a lightweight `R` package written in `C++` and [{Rcpp}](https://github.com/RcppCore/Rcpp) for *memory-efficient* and *lightning-fast* machine learning performance evaluation; it's like using a supercharged [{yardstick}](https://github.com/tidymodels/yardstick) but without the risk of soft to super-hard deprecations. [{SLmetrics}](https://serkor1.github.io/SLmetrics/) covers both regression and classification metrics and provides (almost) the same array of metrics as [{scikit-learn}](https://github.com/scikit-learn/scikit-learn) and [{PyTorch}](https://github.com/pytorch/pytorch), all without [{reticulate}](https://github.com/rstudio/reticulate) and the Python compile-run-(crash)-debug cycle.
Depending on the mood and the alignment of the planets, [{SLmetrics}](https://serkor1.github.io/SLmetrics/) stands for Supervised Learning metrics or Statistical Learning metrics. If [{SLmetrics}](https://serkor1.github.io/SLmetrics/) catches on, the latter will become the core philosophy and the package will include unsupervised learning metrics. If not, it will remain a {pkg} for Supervised Learning metrics, and a sandbox for me to develop my `C++` skills.
## :books: Table of Contents
* [:rocket: Getting Started](#rocket-getting-started)
+ [:shield: Installation](#shield-installation)
+ [:books: Basic Usage](#books-basic-usage)
* [:information_source: Why?](#information_source-why)
* [:zap: Performance Comparison](#zap-performance-comparison)
+ [:fast_forward: Speed comparison](#fast_forward-speed-comparison)
+ [:floppy_disk: Memory-efficiency](#floppy_disk-memory-efficiency)
* [:information_source: Basic usage](#information_source-basic-usage)
+ [:books: Regression](#books-regression)
+ [:books: Classification](#books-classification)
* [:information_source: Enable OpenMP](#information_source-enable-openmp)
+ [:books: Entropy without OpenMP](#books-entropy-without-openmp)
+ [:books: Entropy with OpenMP](#books-entropy-with-openmp)
* [:information_source: Installation](#information_source-installation)
+ [:shield: Stable version](#shield-stable-version)
+ [:hammer_and_wrench: Development version](#hammer_and_wrench-development-version)
* [:information_source: Code of Conduct](#information_source-code-of-conduct)
## :rocket: Getting Started
Below you’ll find instructions to install [{SLmetrics}](https://serkor1.github.io/SLmetrics/) and get started with your first metric, the Root Mean Squared Error (RMSE).
### :shield: Installation
```{r}
#| eval: false
## install stable release
devtools::install_github(
  repo = 'https://github.com/serkor1/SLmetrics@*release',
  ref = 'main'
)
```
### :books: Basic Usage
Below is a minimal example demonstrating how to compute both unweighted and weighted RMSE.
```{r}
#| message: false
library(SLmetrics)

actual    <- c(10.2, 12.5, 14.1)
predicted <- c(9.8, 11.5, 14.2)
weights   <- c(0.2, 0.5, 0.3)

cat(
  "Root Mean Squared Error", rmse(
    actual    = actual,
    predicted = predicted
  ),
  "Root Mean Squared Error (weighted)", weighted.rmse(
    actual    = actual,
    predicted = predicted,
    w         = weights
  ),
  sep = "\n"
)
```
That’s all! Now you can explore the rest of this README for in-depth usage, performance comparisons, and more details about [{SLmetrics}](https://serkor1.github.io/SLmetrics/).
## :information_source: Why?
Machine learning can be a complicated task; the steps from feature engineering to model deployment require carefully measured actions and decisions. One low-hanging fruit to simplify this process is *performance evaluation*.
At its core, performance evaluation is essentially just comparing two vectors — a programmatically and, at times, mathematically trivial step in the machine learning pipeline, but one that can become complicated due to:
1. Dependencies and potential deprecations
2. Needlessly complex or repetitive arguments
3. Performance and memory bottlenecks at scale
[{SLmetrics}](https://serkor1.github.io/SLmetrics/) solves these issues by being:
1. **Fast:** Powered by `C++` and [Rcpp](https://github.com/RcppCore/Rcpp)
2. **Memory-efficient:** Everything is structured around pointers and references
3. **Lightweight:** Only depends on [Rcpp](https://github.com/RcppCore/Rcpp), [RcppEigen](https://github.com/RcppCore/RcppEigen), and [lattice](https://github.com/deepayan/lattice)
4. **Simple:** S3-based, minimal overhead, and flexible inputs
Performance evaluation should be plug-and-play and “just work” out of the box — there’s no need to worry about *quasiquotations*, *dependencies*, *deprecations*, or argument-dependent variations of the same functions when using [{SLmetrics}](https://serkor1.github.io/SLmetrics/).
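As a quick illustration of that last point, here is a minimal sketch of the interface (purely illustrative; the full, worked examples follow in the Basic usage section):

```r
# Minimal sketch of the S3-based interface: classification functions
# take a plain pair of <factor> vectors directly, with no extra ceremony.
library(SLmetrics)

actual    <- factor(c("a", "b", "a", "b"))
predicted <- factor(c("a", "b", "b", "b"))

# a 2 x 2 confusion matrix from the two vectors
cmatrix(
  actual    = actual,
  predicted = predicted
)
```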
## :zap: Performance Comparison
One, obviously, can't build an `R` package on `C++` and [{Rcpp}](https://github.com/RcppCore/Rcpp) without a proper pissing contest at the urinals. Below is a comparison of the execution time and memory efficiency of two simple cases that any {pkg} should be able to handle gracefully: computing a 2 x 2 confusion matrix and computing the RMSE[^1].
### :fast_forward: Speed comparison
```{r}
#| echo: false
lattice::xyplot(
  median ~ sample_size | measure,
  data = do.call(rbind, DT$speed),
  groups = expr,
  type = 'l',
  auto.key = list(columns = 2, col = "#848e9c"),
  scales = list(
    y = list(log = FALSE, relation = "free"),
    x = list(log = FALSE)
  ),
  xlab = "Sample Size (N)",
  ylab = "Median Execution Time (Microseconds)",
  panel = function(...) {
    lattice::panel.grid(...)
    lattice::panel.xyplot(...)
  }
)
```
As shown in the chart, [{SLmetrics}](https://serkor1.github.io/SLmetrics/) maintains consistently low(er) execution times across different sample sizes.
### :floppy_disk: Memory-efficiency
Below are the results for garbage collections and total memory allocations when computing a 2×2 confusion matrix (N = 1e7) and RMSE (N = 1e7) [^2]. Notice that [{SLmetrics}](https://serkor1.github.io/SLmetrics/) requires no GC calls for these operations.
```{r}
#| echo: false
# 1) prepare data
packages <- c("{SLmetrics}", "{yardstick}", "{MLmetrics}", "{mlr3measures}")
measures <- c("Confusion Matrix", "Root Mean Squared Error")
column_names <- c("Iterations", "Garbage Collections [gc()]", "gc() pr. second", "Memory Allocation (MB)")
```
```{r}
#| echo: false
# 1) extract data
DT_ <- DT$memory[[1]][, c("n_itr", "n_gc", "gc/sec", "mem_alloc")]
DT_$mem_alloc <- round(DT_$mem_alloc / (1024^2))
colnames(DT_) <- column_names
rownames(DT_) <- packages

knitr::kable(DT_, caption = "2 x 2 Confusion Matrix (N = 1e7)", digits = 2)
```
```{r}
#| echo: false
# 1) extract data
DT_ <- DT$memory[[2]][, c("n_itr", "n_gc", "gc/sec", "mem_alloc")]
DT_$mem_alloc <- round(DT_$mem_alloc / (1024^2))
colnames(DT_) <- column_names
rownames(DT_) <- packages

knitr::kable(DT_, caption = "RMSE (N = 1e7)", digits = 2)
```
In both tasks, [{SLmetrics}](https://serkor1.github.io/SLmetrics/) remains extremely memory-efficient, even at large sample sizes.
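If you want to reproduce numbers like these yourself, a minimal sketch with [{bench}](https://github.com/r-lib/bench) might look as follows. This is illustrative only — the data sizes and iteration counts are assumptions, and the actual benchmark scripts live in the repository's `data-raw/` directory (see the footnotes):

```r
# Sketch of a memory/speed benchmark with {bench} (illustrative only;
# the N and iteration count here are assumptions, not the published setup).
library(bench)
library(SLmetrics)

N <- 1e7
actual    <- rnorm(N)
predicted <- actual + rnorm(N, sd = 0.1)

# bench::mark() reports median time, mem_alloc, and gc counts per expression
bench::mark(
  `{SLmetrics}` = rmse(actual, predicted),
  iterations = 10,
  check = FALSE
)
```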
> [!IMPORTANT]
>
> From [{bench}](https://github.com/r-lib/bench) documentation: *Total amount of memory allocated by R while running the expression. Memory allocated outside the R heap, e.g. by `malloc()` or new directly is not tracked, take care to avoid misinterpreting the results if running code that may do this.*
## :information_source: Basic usage
In its simplest form, [{SLmetrics}](https://serkor1.github.io/SLmetrics/)-functions work directly with pairs of \<numeric\> vectors (for regression) or \<factor\> vectors (for classification). Below we demonstrate this on two well-known datasets, `mtcars` (regression) and `iris` (classification).
### :books: Regression
We first fit a linear model to predict `mpg` in the `mtcars` dataset, then compute the in-sample RMSE:
```{r}
# Evaluate a linear model on mpg (mtcars)
model <- lm(mpg ~ ., data = mtcars)
rmse(mtcars$mpg, fitted(model))
```
### :books: Classification
Now we recode the `iris` dataset into a binary problem ("virginica" vs. "others") and fit a logistic regression. Then we generate predicted classes, compute the confusion matrix and summarize it.
```{r}
# 1) recode iris
# to binary problem
iris$species_num <- as.numeric(
  iris$Species == "virginica"
)

# 2) fit the logistic
# regression
model <- glm(
  formula = species_num ~ Sepal.Length + Sepal.Width,
  data = iris,
  family = binomial(
    link = "logit"
  )
)

# 3) generate predicted
# classes
predicted <- factor(
  as.numeric(
    predict(model, type = "response") > 0.5
  ),
  levels = c(1, 0),
  labels = c("Virginica", "Others")
)

# 4) generate actual
# values as factor
actual <- factor(
  x = iris$species_num,
  levels = c(1, 0),
  labels = c("Virginica", "Others")
)
```
```{r}
# 5) generate
# confusion matrix
summary(
  confusion_matrix <- cmatrix(
    actual = actual,
    predicted = predicted
  )
)
```
## :information_source: Enable OpenMP
```{r}
#| echo: false
# Set column names for both
# examples
column_names <- c("Iterations", "Runtime (sec)", "Garbage Collections [gc()]", "gc() pr. second", "Memory Allocation (MB)")
```
> [!IMPORTANT]
>
> OpenMP support in [{SLmetrics}](https://serkor1.github.io/SLmetrics/) is experimental. Use it with caution, as performance gains and stability may vary based on your system configuration and workload.
You can control OpenMP usage within [{SLmetrics}](https://serkor1.github.io/SLmetrics/) with the `setUseOpenMP()` function. Below are examples demonstrating how to enable and disable OpenMP:
```{r}
# enable OpenMP
SLmetrics::setUseOpenMP(TRUE)
# disable OpenMP
SLmetrics::setUseOpenMP(FALSE)
```
To illustrate the impact of OpenMP on performance, consider the following benchmarks for calculating entropy on a 1,000,000 x 200 matrix over 100 iterations[^3].
### :books: Entropy without OpenMP
```{r}
#| echo: false
# 1) extract data
DT_ <- DT$OpenMP$FALSE_[, c("n_itr", "median", "n_gc", "gc/sec", "mem_alloc")]
DT_$mem_alloc <- round(DT_$mem_alloc / (1024^2))
DT_$median <- round(DT_$median, 2)
colnames(DT_) <- column_names

knitr::kable(DT_, caption = "1e6 x 200 matrix without OpenMP", digits = 2)
```
### :books: Entropy with OpenMP
```{r}
#| echo: false
# 1) extract data
DT_ <- DT$OpenMP$TRUE_[, c("n_itr", "median", "n_gc", "gc/sec", "mem_alloc")]
DT_$mem_alloc <- round(DT_$mem_alloc / (1024^2))
DT_$median <- round(DT_$median, 2)
colnames(DT_) <- column_names

knitr::kable(DT_, caption = "1e6 x 200 matrix with OpenMP", digits = 2)
```
## :information_source: Installation
### :shield: Stable version
```{r}
#| echo: true
#| eval: false
## install stable release
devtools::install_github(
  repo = 'https://github.com/serkor1/SLmetrics@*release',
  ref = 'main'
)
```
### :hammer_and_wrench: Development version
```{r}
#| echo: true
#| eval: false
## install development version
devtools::install_github(
  repo = 'https://github.com/serkor1/SLmetrics',
  ref = 'development'
)
```
## :information_source: Code of Conduct
Please note that the [{SLmetrics}](https://serkor1.github.io/SLmetrics/) project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/1/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
[^1]: The source code is available [here](https://github.com/serkor1/SLmetrics/blob/development/data-raw/classification_performance.R) and [here](https://github.com/serkor1/SLmetrics/blob/development/data-raw/regression_performance.R).
[^2]: The source code is available [here](https://github.com/serkor1/SLmetrics/blob/development/data-raw/memory_performance.R).
[^3]: The source code is available [here](https://github.com/serkor1/SLmetrics/blob/development/data-raw/OpenMP_performance.R).