
The Benchmark

Jarrett Ye edited this page Aug 9, 2023 · 21 revisions

The experiment notebook: https://github.com/open-spaced-repetition/fsrs4anki/blob/benchmark/benchmark.ipynb

Dataset

59 collections submitted by users, containing 2,936,244 reviews in total.

Baselines

SM-2, Memrise, LSTM, and two versions of FSRS (v3.26.2 and v4.5.1).

Metrics

Log loss, R-squared, root-mean-square error (RMSE), and mean absolute error (MAE). Lower is better for log loss, RMSE, and MAE; higher is better for R-squared.
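The four metrics can be sketched in plain Python as follows. This is a minimal illustration of the standard formulas, not the code from the benchmark notebook. The function names are hypothetical, and the optional weights argument is an assumption added so that the same functions can also score aggregated data (for example, groups of reviews weighted by their counts). This page does not say whether the notebook computes R-squared, RMSE, and MAE per review or on aggregated data, so the sketch should not be expected to reproduce the numbers in the table.

```python
import math
from typing import List, Optional, Sequence


def log_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy between outcomes (1 = recalled, 0 = forgotten)
    and predicted probabilities of recall."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so math.log never receives 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def _weights(n: int, w: Optional[Sequence[float]]) -> List[float]:
    # Equal weights by default; hypothetical option to pass e.g. review counts per group.
    return [1.0] * n if w is None else [float(x) for x in w]


def rmse(y_true: Sequence[float], y_pred: Sequence[float],
         w: Optional[Sequence[float]] = None) -> float:
    """(Weighted) root-mean-square error."""
    ws = _weights(len(y_true), w)
    sse = sum(wi * (y - p) ** 2 for wi, y, p in zip(ws, y_true, y_pred))
    return math.sqrt(sse / sum(ws))


def mae(y_true: Sequence[float], y_pred: Sequence[float],
        w: Optional[Sequence[float]] = None) -> float:
    """(Weighted) mean absolute error."""
    ws = _weights(len(y_true), w)
    sae = sum(wi * abs(y - p) for wi, y, p in zip(ws, y_true, y_pred))
    return sae / sum(ws)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float],
              w: Optional[Sequence[float]] = None) -> float:
    """(Weighted) coefficient of determination, 1 - SS_res / SS_tot.
    It is negative whenever the predictions fit worse than the weighted mean of y_true."""
    ws = _weights(len(y_true), w)
    mean = sum(wi * y for wi, y in zip(ws, y_true)) / sum(ws)
    ss_res = sum(wi * (y - p) ** 2 for wi, y, p in zip(ws, y_true, y_pred))
    ss_tot = sum(wi * (y - mean) ** 2 for wi, y in zip(ws, y_true))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the benchmark dataset.
    recalled = [1, 1, 0, 1]
    predicted = [0.9, 0.8, 0.3, 0.6]
    print("log loss:", log_loss(recalled, predicted))
    print("RMSE:", rmse(recalled, predicted))
    print("MAE:", mae(recalled, predicted))
```

Log loss is normally evaluated against binary outcomes, while the other three functions accept any pair of numeric sequences, which is why only they take the optional weights.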

Results

Model          Log loss   R-squared   RMSE    MAE
FSRS v4.5.1    0.37       0.73        4.0%    2.3%
LSTM           0.40       -0.58       6.3%    4.3%
FSRS v3.26.2   0.41       -1.76       7.0%    4.7%
SM-2           0.55       -29.55      18.5%   12.6%
Memrise        0.69       -51.50      18.0%   14.6%
  • Negative R-squared values are not the result of a bug: R-squared is negative whenever a model's predictions fit the data worse than simply predicting the mean of the observed values.
  • The best result for every metric (lowest log loss, RMSE, and MAE; highest R-squared) belongs to FSRS v4.5.1.
  • There were originally 66 collections. Two were so large that they crashed Google Colab by running out of RAM, and five were deemed outliers; all seven were excluded, leaving the 59 collections used here.
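To see how R-squared can fall below zero, it helps to work through its definition with small numbers. This assumes the usual form R² = 1 - SS_res/SS_tot (the notebook's exact weighting is not given on this page), and the values are hypothetical, chosen only to show the arithmetic: three observed retention rates and a model that always predicts 0.5.

```latex
\begin{aligned}
R^2 &= 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}, \quad
SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2, \quad
SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2 \\
y &= (0.9,\ 0.8,\ 0.7), \quad \bar{y} = 0.8, \quad \hat{y} = (0.5,\ 0.5,\ 0.5) \\
SS_{\text{res}} &= 0.4^2 + 0.3^2 + 0.2^2 = 0.29, \quad
SS_{\text{tot}} = 0.1^2 + 0^2 + 0.1^2 = 0.02 \\
R^2 &= 1 - \frac{0.29}{0.02} = 1 - 14.5 = -13.5
\end{aligned}
```

Because the residual error exceeds the variance around the mean, R-squared drops below zero, and it has no lower bound, so large negative values like those for SM-2 and Memrise are possible.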

Raw data: raw_data.xlsx

Thanks to @Expertium, who conducted the benchmark experiment.