
FSRS likes to optimize parameter 7 to 0.0 #695

Closed
Gilfaro opened this issue Oct 6, 2024 · 18 comments · Fixed by open-spaced-repetition/fsrs-optimizer#143
Labels
bug Something isn't working

Comments

@Gilfaro

Gilfaro commented Oct 6, 2024

FSRS-5, even more than FSRS-4.5, likes to optimize parameter 7 to 0.0, which disables difficulty decay.
Even manually changing the parameter after optimization results in better log loss and RMSE(bins).
This probably needs more testing in the benchmark, but setting the minimum to about 0.0100 would fix this issue and improve the fit to the data.
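
For context, here is a minimal sketch of where parameter 7 (w[7]) enters the difficulty update, following the published FSRS-4.5/FSRS-5 formulas (the exact implementation lives in open-spaced-repetition/fsrs-optimizer; the function and argument names below are illustrative):

def next_difficulty(d, rating, w, d0_easy):
    # Simplified FSRS-style difficulty update, not the optimizer's actual code.
    delta_d = -w[6] * (rating - 3)        # Good (3) contributes no change on its own
    d_new = d + delta_d                   # FSRS-5 additionally damps this step
    # Mean reversion toward the initial difficulty of an Easy first rating:
    # with w[7] == 0 this term vanishes, so pressing Good never moves difficulty.
    d_new = w[7] * d0_easy + (1 - w[7]) * d_new
    return min(10.0, max(1.0, d_new))     # difficulty is kept in [1, 10]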

Gilfaro added the bug (Something isn't working) label on Oct 6, 2024
@L-M-Sherlock
Member

Even manually changing the parameter after optimization results in better log loss and RMSE(bins).

Could you reproduce it in 50%+ of cases?

@Expertium
Collaborator

I think it's better to benchmark it on Anki 20k.

@Gilfaro
Author

Gilfaro commented Oct 6, 2024

Currently tested on 5 decks in the Anki beta 2, and it improved results for all of them.
Here is a sample run from a small benchmark where the numbers are nearly the same, but from an end-user perspective it is much better if difficulty can be modified by pressing Good/Again.
Sometimes values much higher than the clipper's limit are more optimal, but once the optimizer decides on 0 it seems to get stuck in that local minimum.

Model: FSRS-5
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3737±0.1183
FSRS-5 RMSE(bins) (mean±std): 0.0496±0.0188
FSRS-5 AUC (mean±std): 0.7064±0.0897

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3660±0.1512
FSRS-5 RMSE(bins) (mean±std): 0.0647±0.0245
FSRS-5 AUC (mean±std): 0.6741±0.1350

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3641±0.1547
FSRS-5 RMSE(bins) (mean±std): 0.0671±0.0247
FSRS-5 AUC (mean±std): 0.6668±0.1387

parameters: [0.32095, 1.0439, 2.5808, 15.4029, 7.20385, 0.5284, 1.02275, 0.02555, 1.63765, 0.225, 1.04685, 1.9363, 0.0724, 0.27135, 2.214, 0.34025, 3.1823, 0.6103, 0.73345]

Model: FSRS-5-clamp7
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3738±0.1183
FSRS-5 RMSE(bins) (mean±std): 0.0499±0.0185
FSRS-5 AUC (mean±std): 0.7062±0.0898

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3658±0.1511
FSRS-5 RMSE(bins) (mean±std): 0.0647±0.0243
FSRS-5 AUC (mean±std): 0.6742±0.1350

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3639±0.1545
FSRS-5 RMSE(bins) (mean±std): 0.0671±0.0245
FSRS-5 AUC (mean±std): 0.6669±0.1387

parameters: [0.3195, 1.04455, 2.5806, 15.4029, 7.208, 0.47475, 1.03605, 0.0252, 1.62875, 0.22925, 1.02125, 1.9363, 0.07145, 0.2694, 2.2139, 0.3364, 3.18255, 0.61735, 0.73345]

@Expertium
Collaborator

@L-M-Sherlock we've talked about this before and run these kinds of benchmarks before. The conclusion here is the same: clamping barely affects the metrics. So it just comes down to preference: do we want D to always change when the user presses Good, even if it's a small change? I'd say yes.

@user1823
Collaborator

user1823 commented Oct 8, 2024

In principle, I like the suggestion, but experience suggests otherwise.

In 20d2dae, L-M-Sherlock used 0.05 as the minimum value for this parameter. (It was called w[5] at that time.)

But using 0.05 as the minimum limit with my collection not only made RMSE worse but also increased the workload (parameters calculated with the new limit on exactly the same collection gave me a backlog of 900 extra cards). First reported in #342 (comment).

L-M-Sherlock gave this explanation for the issue:

One rational explanation for your case is: you don't have ease hell, but w[5] assumes you do. So w[5] will decrease the difficulty in the long term. Then w[4] would increase to counteract or even override it, which induces the workload.

To fix the issue, he decreased the lower limit back to 0.

I advised using a small but non-zero lower limit (such as 0.0003).

His response was that such a low value won't result in any appreciable mean reversion and, thus, is no better than using 0.

The difference between 0.0003 and 0 is pretty small. Suppose the initial difficulty is 5 and the current difficulty is 10. If you always press Good, here is the subsequent difficulty:

5 * 0.0003 + 10 * (1-0.0003) = 9.9985
5 * 0.0003 + 9.9985 * (1-0.0003) = 9.997
5 * 0.0003 + 9.997 * (1-0.0003) = 9.9955
...
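
To put numbers on this, here is a tiny script (assuming only the mean-reversion step d_next = w7 * d0 + (1 - w7) * d applies when Good is pressed; the values are illustrative) comparing w[7] = 0.0003 with the proposed 0.01 over 100 reviews:

def revert(d, d0=5.0, w7=0.0003, reviews=100):
    # Apply only the mean-reversion step; Good adds no grade-dependent change.
    for _ in range(reviews):
        d = w7 * d0 + (1 - w7) * d
    return d

print(revert(10.0, w7=0.0003))  # ~9.85: essentially no reversion even after 100 reviews
print(revert(10.0, w7=0.01))    # ~6.83: a visible pull back toward the initial 5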

@Gilfaro
Author

Gilfaro commented Oct 8, 2024

A value of 0.05 is way too high; in my N=1 case and in the small benchmark, a value of about 0.01 (or close to it) makes the difficulty scale and either improves RMSE or barely changes it.
Within the new 24.10 beta with the new FSRS-5 this is even worse: as soon as the optimizer ends up at 0.0 (or close to it) for w7, changing it to 0.01 or much higher improves all metrics by a lot, in the best case by 20%.

@Expertium
Collaborator

Expertium commented Oct 8, 2024

it improves all metrics by a lot, in the best case by 20%

That's extremely weird, considering that the benchmarks show that the difference in RMSE between clamped and unclamped w7 is <1%.

@brishtibheja

My noob idea is to run the optimization both clamped and unclamped when that parameter is 0.0 and keep whichever parameters are better, but of course you can deliberate on a better solution.

@Gilfaro
Author

Gilfaro commented Oct 8, 2024

@Expertium
It is different because in the benchmark the clamping occurs on a per-batch basis, while in my manual case the clamping is done only at the end.
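
A hedged sketch of that distinction (not the actual fsrs-optimizer code; the model and optimizer names are illustrative): clamping after every batch constrains the whole optimization trajectory, so the remaining parameters are fitted around the clamped w[7], whereas clamping or editing w[7] once after training leaves the other 18 parameters fitted around the unclamped value.

import torch

def train(model, batches, optimizer, clamp_each_batch=True, w7_min=0.01):
    for batch in batches:
        loss = model.loss(batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if clamp_each_batch:
            with torch.no_grad():
                model.w.data[7].clamp_(min=w7_min)   # per-batch clipper (benchmark case)
    if not clamp_each_batch:
        with torch.no_grad():
            model.w.data[7].clamp_(min=w7_min)       # one-off clamp after training (manual case)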

@Expertium
Collaborator

I don't know what you mean

@L-M-Sherlock
Member

Within the new 24.10 beta with the new FSRS-5 this is even worse: as soon as the optimizer ends up at 0.0 (or close to it) for w7, changing it to 0.01 or much higher improves all metrics by a lot, in the best case by 20%.

It's likely an extreme case. According to the benchmark, 50% of users have w[7] less than 0.02, and 0 is the mode value of w[7]:

[histogram: distribution of w[7] values across benchmark users]

@user1823
Collaborator

And 0 is the mode value of w[7]

That's what the OP is also saying.

We want to find the value that is actually optimal, not the value that the optimizer is most likely to produce.

So, if using 0.01 (or maybe 0.005) as the lower limit of w[7] improves the RMSE for most users, we can consider that.

@L-M-Sherlock
Member

I have tried w[7] >= 0.01 and w[7] >= 0.005; they both increased RMSE. So I decided to use 0.001.

@Gilfaro
Author

Gilfaro commented Oct 21, 2024

If you mostly want the lowest RMSE and trust the gradient descent, you can remove the clipper completely and improve all metrics by a bit; the most optimal value of w[7] is then actually negative.
Analyzing this, it means the algorithm finds that all cards should approach 100% difficulty for most users, and that either the parameter for difficulty is useless or there is some core problem in how the data is modeled (a small illustration of the negative-w[7] effect follows the benchmark output below).

Model: FSRS-5
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3737±0.1183
FSRS-5 RMSE(bins) (mean±std): 0.0496±0.0188
FSRS-5 AUC (mean±std): 0.7064±0.0897

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3660±0.1512
FSRS-5 RMSE(bins) (mean±std): 0.0647±0.0245
FSRS-5 AUC (mean±std): 0.6741±0.1350

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3641±0.1547
FSRS-5 RMSE(bins) (mean±std): 0.0671±0.0247
FSRS-5 AUC (mean±std): 0.6668±0.1387

parameters: [0.32095, 1.0439, 2.5808, 15.4029, 7.20385, 0.5284, 1.02275, 0.02555, 1.63765, 0.225, 1.04685, 1.9363, 0.0724, 0.27135, 2.214, 0.34025, 3.1823, 0.6103, 0.73345]

Model: FSRS-5-noclipper
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3728±0.1170
FSRS-5 RMSE(bins) (mean±std): 0.0496±0.0173
FSRS-5 AUC (mean±std): 0.7109±0.0889

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3657±0.1504
FSRS-5 RMSE(bins) (mean±std): 0.0649±0.0241
FSRS-5 AUC (mean±std): 0.6757±0.1360

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3639±0.1539
FSRS-5 RMSE(bins) (mean±std): 0.0673±0.0244
FSRS-5 AUC (mean±std): 0.6680±0.1398

parameters: [0.33585, 1.0276, 2.8278, 15.4026, 7.1837, 0.482, 0.92045, -0.0653, 1.64665, 0.1899, 1.11895, 1.9508, 0.11625, 0.2786, 2.11275, 0.33215, 3.1705, 0.6295, 0.7321]
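
A minimal illustration of what a negative w[7] implies, assuming the mean-reversion step d_next = w7 * d0 + (1 - w7) * d and ignoring the grade-dependent term (i.e. the user keeps pressing Good); d0 = 5 is only an example value, and w7 is taken from the no-clipper parameters above:

d0, w7 = 5.0, -0.0653   # d0 is illustrative; w7 from the FSRS-5-noclipper run
d = 7.0
for _ in range(20):
    # With w7 < 0 the "reversion" term pushes difficulty away from d0,
    # so any card above d0 drifts upward until it saturates at the cap.
    d = min(10.0, max(1.0, w7 * d0 + (1 - w7) * d))
print(d)  # 10.0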

@L-M-Sherlock
Member

L-M-Sherlock commented Oct 21, 2024

the parameter for difficulty is useless

It's useful because the RMSE will increase significantly if I remove difficulty from the formula.

@Gilfaro
Author

Gilfaro commented Oct 21, 2024

I tried the newest optimizer with linear damping, and it shifts the most optimal value of w7 even further into the negatives.

Model: FSRS-5-noclipper-lineardamping
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3724±0.1170
FSRS-5 RMSE(bins) (mean±std): 0.0485±0.0173
FSRS-5 AUC (mean±std): 0.7106±0.0888

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3650±0.1503
FSRS-5 RMSE(bins) (mean±std): 0.0642±0.0246
FSRS-5 AUC (mean±std): 0.6768±0.1340

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3632±0.1538
FSRS-5 RMSE(bins) (mean±std): 0.0668±0.0249
FSRS-5 AUC (mean±std): 0.6692±0.1375

parameters: [0.31775, 1.0013, 2.8053, 15.4017, 7.2231, 0.46875, 1.2435, -0.11325, 1.59345, 0.1909, 1.063, 1.94885, 0.11515, 0.27055, 2.20005, 0.3062, 3.1973, 0.6309, 0.7335]

@user1823
Collaborator

you can remove the clipper completely and improve all metrics by a bit

But, your results show that the RMSE becomes worse when the clipper is removed. This brings me to the second point.

trust the gradient descent

For such a complex model, we can't trust gradient descent fully. Finding the global minimum is probably not possible. So, we need to implement some rules to ensure that the results are as expected. The parameter clipper is one of those.

@L-M-Sherlock
Member

For discussion about the difficulty variable and formula, please continue in #352 or open a new issue to propose your idea.
