
FSRS likes to optimize parameter 7 to 0.0 #695

Closed
Gilfaro opened this issue Oct 6, 2024 · 18 comments · Fixed by open-spaced-repetition/fsrs-optimizer#143
Labels
bug Something isn't working

Comments

@Gilfaro

Gilfaro commented Oct 6, 2024

FSRS-5, even more than FSRS-4.5, likes to optimize parameter 7 to 0.0, which disables difficulty decay.
Even manually changing the parameter after optimization results in better log loss and RMSE(bins).
This probably needs more testing in the benchmark, but setting the minimum to about 0.0100 would fix this issue and improve the fit to the data.
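
For context, here is a minimal sketch of where parameter 7 (w[7]) enters the difficulty update, following the published FSRS-4.5/FSRS-5 formulas (the exact implementation lives in open-spaced-repetition/fsrs-optimizer; the function and argument names below are illustrative):

def next_difficulty(d, rating, w, d0_easy):
    # Simplified FSRS-style difficulty update, not the optimizer's actual code.
    delta_d = -w[6] * (rating - 3)        # Good (3) contributes no change on its own
    d_new = d + delta_d                   # FSRS-5 additionally damps this step
    # Mean reversion toward the initial difficulty of an Easy first rating:
    # with w[7] == 0 this term vanishes, so pressing Good never moves difficulty.
    d_new = w[7] * d0_easy + (1 - w[7]) * d_new
    return min(10.0, max(1.0, d_new))     # difficulty is kept in [1, 10]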

Gilfaro added the bug (Something isn't working) label on Oct 6, 2024
@L-M-Sherlock
Member

Even manually changing the parameter after optimization results in better log loss and RMSE(bins).

Could you reproduce it in 50%+ of cases?

@Expertium
Collaborator

I think it's better to benchmark it on Anki 20k.

@Gilfaro
Author

Gilfaro commented Oct 6, 2024

Currently tested on 5 decks in the Anki beta 2, and it improved results for all of them.
Here is a sample run from a small benchmark where the numbers are nearly the same, but from an end-user perspective it is much better if difficulty can be modified by pressing Good/Again.
Sometimes values much higher than the clipper's limit are more optimal, but once the optimizer decides on 0 it seems to get stuck in that local minimum.

Model: FSRS-5
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3737±0.1183
FSRS-5 RMSE(bins) (mean±std): 0.0496±0.0188
FSRS-5 AUC (mean±std): 0.7064±0.0897

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3660±0.1512
FSRS-5 RMSE(bins) (mean±std): 0.0647±0.0245
FSRS-5 AUC (mean±std): 0.6741±0.1350

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3641±0.1547
FSRS-5 RMSE(bins) (mean±std): 0.0671±0.0247
FSRS-5 AUC (mean±std): 0.6668±0.1387

parameters: [0.32095, 1.0439, 2.5808, 15.4029, 7.20385, 0.5284, 1.02275, 0.02555, 1.63765, 0.225, 1.04685, 1.9363, 0.0724, 0.27135, 2.214, 0.34025, 3.1823, 0.6103, 0.73345]

Model: FSRS-5-clamp7
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3738±0.1183
FSRS-5 RMSE(bins) (mean±std): 0.0499±0.0185
FSRS-5 AUC (mean±std): 0.7062±0.0898

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3658±0.1511
FSRS-5 RMSE(bins) (mean±std): 0.0647±0.0243
FSRS-5 AUC (mean±std): 0.6742±0.1350

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3639±0.1545
FSRS-5 RMSE(bins) (mean±std): 0.0671±0.0245
FSRS-5 AUC (mean±std): 0.6669±0.1387

parameters: [0.3195, 1.04455, 2.5806, 15.4029, 7.208, 0.47475, 1.03605, 0.0252, 1.62875, 0.22925, 1.02125, 1.9363, 0.07145, 0.2694, 2.2139, 0.3364, 3.18255, 0.61735, 0.73345]

@Expertium
Collaborator

@L-M-Sherlock we've talked about this before and run these kinds of benchmarks before. The conclusion here is the same: clamping barely affects the metrics. So it just comes down to preference: do we want D to always change when the user presses Good, even if it's a small change? I'd say yes.

@user1823
Collaborator

user1823 commented Oct 8, 2024

In principle, I like the suggestion, but experience suggests otherwise.

In 20d2dae, L-M-Sherlock used 0.05 as the minimum value for this parameter. (It was called w[5] at that time.)

But using 0.05 as the minimum limit with my collection not only made RMSE worse but also increased the workload (parameters calculated with the new limit on exactly the same collection gave me a backlog of 900 extra cards). First reported in #342 (comment).

L-M-Sherlock gave this explanation for the issue:

One rational explanation for your case is: you don't have ease hell, but w[5] assumes you do. So w[5] will decrease the difficulty in the long term. Then w[4] would increase to counteract or even override it, which induces the workload.

To fix the issue, he decreased the lower limit back to 0.

I advised using a small but non-zero lower limit (such as 0.0003).

His response was that such a low value won't result in any appreciable mean reversion and, thus, is no better than using 0.

The difference between 0.0003 and 0 is pretty small. Suppose the initial difficulty is 5 and the current difficulty is 10. If you always press Good, here is the subsequent difficulty:

5 * 0.0003 + 10 * (1-0.0003) = 9.9985
5 * 0.0003 + 9.9985 * (1-0.0003) = 9.997
5 * 0.0003 + 9.997 * (1-0.0003) = 9.9955
...
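
To put numbers on this, here is a tiny script (assuming only the mean-reversion step d_next = w7 * d0 + (1 - w7) * d applies when Good is pressed; the values are illustrative) comparing w[7] = 0.0003 with the proposed 0.01 over 100 reviews:

def revert(d, d0=5.0, w7=0.0003, reviews=100):
    # Apply only the mean-reversion step; Good adds no grade-dependent change.
    for _ in range(reviews):
        d = w7 * d0 + (1 - w7) * d
    return d

print(revert(10.0, w7=0.0003))  # ~9.85: essentially no reversion even after 100 reviews
print(revert(10.0, w7=0.01))    # ~6.83: a visible pull back toward the initial 5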

@Gilfaro
Author

Gilfaro commented Oct 8, 2024

A value of 0.05 is way too high; in my N=1 case and in the small benchmark, a value of about 0.01 (or close to it) makes the difficulty scale and either improves RMSE or barely changes it.
Within the new 24.10 beta with the new FSRS-5 this is even worse: as soon as the optimizer ends up at 0.0 (or close to it) for w7, changing it to 0.01 or much higher improves all metrics by a lot, in the best case by 20%.

@Expertium
Collaborator

Expertium commented Oct 8, 2024

it improves all metrics by a lot, in the best case by 20%

That's extremely weird, considering that the benchmarks show that the difference in RMSE between clamped and unclamped w7 is <1%.

@brishtibheja

My noob idea is to run the optimization both clamped and unclamped when that parameter is 0.0 and keep whichever parameters are better, but of course you can deliberate on a better solution.

@Gilfaro
Author

Gilfaro commented Oct 8, 2024

@Expertium
It is different because in the benchmark the clamping occurs on a per-batch basis, while in my manual case the clamping is done only at the end.
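
A hedged sketch of that distinction (not the actual fsrs-optimizer code; the model and optimizer names are illustrative): clamping after every batch constrains the whole optimization trajectory, so the remaining parameters are fitted around the clamped w[7], whereas clamping or editing w[7] once after training leaves the other 18 parameters fitted around the unclamped value.

import torch

def train(model, batches, optimizer, clamp_each_batch=True, w7_min=0.01):
    for batch in batches:
        loss = model.loss(batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if clamp_each_batch:
            with torch.no_grad():
                model.w.data[7].clamp_(min=w7_min)   # per-batch clipper (benchmark case)
    if not clamp_each_batch:
        with torch.no_grad():
            model.w.data[7].clamp_(min=w7_min)       # one-off clamp after training (manual case)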

@Expertium
Collaborator

I don't know what you mean

@L-M-Sherlock
Member

Within the new 24.10 beta with the new FSRS-5 this is even worse: as soon as the optimizer ends up at 0.0 (or close to it) for w7, changing it to 0.01 or much higher improves all metrics by a lot, in the best case by 20%.

It's likely an extreme case. According to the benchmark, 50% of users have w[7] less than 0.02, and 0 is the mode value of w[7]:

[histogram: distribution of w[7] values across benchmark users]

@user1823
Collaborator

And 0 is the mode value of w[7]

That's what the OP is also saying.

We want to find the value that is actually optimal, not the value that the optimizer is most likely to produce.

So, if using 0.01 (or maybe 0.005) as the lower limit of w[7] improves the RMSE for most users, we can consider that.

@L-M-Sherlock
Member

I have tried w[7] >= 0.01 and w[7] >= 0.005; they both increased RMSE. So I decided to use 0.001.

@Gilfaro
Author

Gilfaro commented Oct 21, 2024

If you mostly want the lowest RMSE and trust the gradient descent, you can remove the clipper completely and improve all metrics by a bit; the most optimal value of w[7] is then actually negative.
Analyzing this, it means the algorithm finds that all cards should approach 100% difficulty for most users, and that either the parameter for difficulty is useless or there is some core problem in how the data is modeled (a small illustration of the negative-w[7] effect follows the benchmark output below).

Model: FSRS-5
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3737±0.1183
FSRS-5 RMSE(bins) (mean±std): 0.0496±0.0188
FSRS-5 AUC (mean±std): 0.7064±0.0897

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3660±0.1512
FSRS-5 RMSE(bins) (mean±std): 0.0647±0.0245
FSRS-5 AUC (mean±std): 0.6741±0.1350

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3641±0.1547
FSRS-5 RMSE(bins) (mean±std): 0.0671±0.0247
FSRS-5 AUC (mean±std): 0.6668±0.1387

parameters: [0.32095, 1.0439, 2.5808, 15.4029, 7.20385, 0.5284, 1.02275, 0.02555, 1.63765, 0.225, 1.04685, 1.9363, 0.0724, 0.27135, 2.214, 0.34025, 3.1823, 0.6103, 0.73345]

Model: FSRS-5-noclipper
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3728±0.1170
FSRS-5 RMSE(bins) (mean±std): 0.0496±0.0173
FSRS-5 AUC (mean±std): 0.7109±0.0889

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3657±0.1504
FSRS-5 RMSE(bins) (mean±std): 0.0649±0.0241
FSRS-5 AUC (mean±std): 0.6757±0.1360

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3639±0.1539
FSRS-5 RMSE(bins) (mean±std): 0.0673±0.0244
FSRS-5 AUC (mean±std): 0.6680±0.1398

parameters: [0.33585, 1.0276, 2.8278, 15.4026, 7.1837, 0.482, 0.92045, -0.0653, 1.64665, 0.1899, 1.11895, 1.9508, 0.11625, 0.2786, 2.11275, 0.33215, 3.1705, 0.6295, 0.7321]
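
A minimal illustration of what a negative w[7] implies, assuming the mean-reversion step d_next = w7 * d0 + (1 - w7) * d and ignoring the grade-dependent term (i.e. the user keeps pressing Good); d0 = 5 is only an example value, and w7 is taken from the no-clipper parameters above:

d0, w7 = 5.0, -0.0653   # d0 is illustrative; w7 from the FSRS-5-noclipper run
d = 7.0
for _ in range(20):
    # With w7 < 0 the "reversion" term pushes difficulty away from d0,
    # so any card above d0 drifts upward until it saturates at the cap.
    d = min(10.0, max(1.0, w7 * d0 + (1 - w7) * d))
print(d)  # 10.0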

@L-M-Sherlock
Member

L-M-Sherlock commented Oct 21, 2024

the parameter for difficulty is useless

It's useful because the RMSE will increase significantly if I remove difficulty from the formula.

@Gilfaro
Author

Gilfaro commented Oct 21, 2024

I tried the newest optimizer with linear damping, and it shifts the most optimal value of w7 even further into the negatives.

Model: FSRS-5-noclipper-lineardamping
Total number of users: 10
Total number of reviews: 255400
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3724±0.1170
FSRS-5 RMSE(bins) (mean±std): 0.0485±0.0173
FSRS-5 AUC (mean±std): 0.7106±0.0888

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3650±0.1503
FSRS-5 RMSE(bins) (mean±std): 0.0642±0.0246
FSRS-5 AUC (mean±std): 0.6768±0.1340

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3632±0.1538
FSRS-5 RMSE(bins) (mean±std): 0.0668±0.0249
FSRS-5 AUC (mean±std): 0.6692±0.1375

parameters: [0.31775, 1.0013, 2.8053, 15.4017, 7.2231, 0.46875, 1.2435, -0.11325, 1.59345, 0.1909, 1.063, 1.94885, 0.11515, 0.27055, 2.20005, 0.3062, 3.1973, 0.6309, 0.7335]

@user1823
Collaborator

you can remove the clipper completely and improve all metrics by a bit

But, your results show that the RMSE becomes worse when the clipper is removed. This brings me to the second point.

trust the gradient descent

For such a complex model, we can't trust gradient descent fully. Finding the global minimum is probably not possible. So, we need to implement some rules to ensure that the results are as expected. The parameter clipper is one of those.

@L-M-Sherlock
Member

For discussion about the difficulty variable and formula, please continue in #352 or open a new issue to propose your idea.
