[Feature Request] Ideas to further improve the accuracy of the algorithm #461
Also, regarding S and siblings.
@L-M-Sherlock @user1823 thoughts? I would like to see Sherlock benchmark this.
Currently, I have just one suggestion: use the accurate value of the constant.

```python
def power_curve(delta_t, S):
    return (1 + 19 * delta_t / (81 * S)) ** -0.5
```

If you are wondering how to calculate this, refer to the formula I shared here: #215 (comment) (obtained using Wolfram Alpha)
Thank you, I was wondering how to derive the value of that constant. Also, this comment of mine was wrong, as the shape does change. Though we still need to figure out the inverse function (that gives t when R and S are known) in its general form. If you can do that, that would be great!
Well, this can be obtained by using basic algebra on the formula I shared previously. But, I still calculated it for you:
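(A sketch of what that inverse looks like for the curve from the earlier comment; the constant names are mine, not from the optimizer:)

```python
DECAY = -0.5
FACTOR = 19 / 81  # equals 0.9 ** (1 / DECAY) - 1, so that R = 0.9 when t = S

def next_interval(S, R):
    # solve R = (1 + FACTOR * t / S) ** DECAY for t
    return S / FACTOR * (R ** (1 / DECAY) - 1)
```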
Thank you. @L-M-Sherlock what's your opinion on the following questions (user1823, you too)
EDIT: it has to be fixed, otherwise it's impossible to accurately estimate initial S before the optimization starts.
In my opinion, the answers would depend upon the results of benchmark testing. Make
If FSRS wasn't being integrated into Anki, I would suggest implementing this later, because this would introduce more issues with compatibility between different versions and all of that. But since FSRS is being integrated into Anki, we might as well make some changes. It's kinda like using a new building material - you're not going to demolish an entire building just to use a new type of concrete to rebuild it, but if you have to build a new building anyway, you might as well use the new type of concrete.
About 20-25% for my decks (fixed
Good point.
Actually, I just realized that making
I will test it in the benchmark soon.
Idea 3: don't remove same-day reviews. Don't change S, but only change D.
p = 0.1 (no significant difference)
That's very strange; are you sure you changed everything correctly?
There is only one function for calculating the retention in fsrs-optimizer:

```python
def power_forgetting_curve(t, s):
    return (1 + t / (9 * s)) ** -1
```

So I only modified that function.
Ah, ok. When I was testing it in an older version, I had to modify it in two places.
I think the effect of idea 2 depends on the degree of heterogeneity of the user's data. It really improved the performance in some users' collections, but it also made some users' performance worse. Besides, if you are learning a language, immersion study beyond Anki would also flatten the forgetting curve.
If the power could become an optimizable parameter, it would be much better and would adapt to each user. The issue is that I don't know how to combine that with our method of estimating initial S. Actually, how about this: after the 4 initial values of S are estimated, we allow the optimizer to change them? Though choosing clampings is an issue.
@L-M-Sherlock how about this: the initial S is estimated using f=0.5 (or 1), then 4 values are passed to the optimizer as parameters, with clampings (0.1, 365), and then the optimizer can change them, as well as f? This way the optimizer can use those estimates as initial guesses and update them as f changes.
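A minimal PyTorch sketch of what that could look like, assuming a module that owns both the decay f and the four initial stabilities as trainable, clamped parameters (the initial values and the clamp range for f are placeholders; the (0.1, 365) clamp for S is the one mentioned above):

```python
import torch

class ForgettingCurve(torch.nn.Module):
    def __init__(self, initial_s=(0.4, 1.0, 3.0, 8.0), f=0.5):
        super().__init__()
        # estimates from pretraining become trainable initial guesses
        self.initial_s = torch.nn.Parameter(torch.tensor(initial_s))
        self.f = torch.nn.Parameter(torch.tensor(f))

    def forward(self, t, s):
        # R = (1 + k * t / s) ** -f, with k chosen so that R = 0.9 when t = s
        k = 0.9 ** (-1.0 / self.f) - 1.0
        return (1.0 + k * t / s) ** -self.f

    def clamp_(self):
        # called after each optimizer step
        with torch.no_grad():
            self.initial_s.clamp_(0.1, 365.0)
            self.f.clamp_(0.1, 2.0)  # placeholder range for the decay
```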
Related to dispersing siblings: https://github.com/thiswillbeyourgithub/AnnA_Anki_neuronal_Appendix |
@L-M-Sherlock, I have a question. I have 2 ideas with a lot of potential.
I plan to add fsrs-rs into fsrs-benchmark. It is a high priority, because I want to know the difference between these two implementations. Then I have another research project to develop, where I need to collaborate with a PhD from the University of Minnesota. It would also take me a lot of time. So FSRS v5 wouldn't start for weeks.
@L-M-Sherlock do you think there is a need for V5? V4 seems to be more than good enough.
I'm not Sherlock, obviously, but I believe that if there is a way to make the algorithm more accurate, then why not?
he seems to be working really hard :( |
Well, it's not the end of the world if he starts working on what I suggested above a few weeks or a few months later, just as long as he doesn't completely abandon improving accuracy.
Also, I've mentioned this before, but the correlation coefficient between log-loss and RMSE(bins) is only around 0.4-0.5 (according to my testing), and it's not uncommon to see a huge improvement in RMSE while log-loss barely changes. So ideally, we should find a way to optimize directly for RMSE rather than for log-loss. But I have no idea how to do that while still using the mini-batch approach.
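To make the problem concrete, here is a rough sketch of an RMSE(bins)-style metric (not the exact benchmark code): the per-bin means depend on the whole dataset, and the hard binning is not differentiable, which is what makes mini-batch optimization awkward.

```python
import numpy as np

def rmse_bins(predictions, labels, n_bins=20):
    # bin reviews by predicted R, then compare mean predicted R with the
    # observed recall rate in each bin, weighted by bin size
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bin_ids = np.minimum((predictions * n_bins).astype(int), n_bins - 1)
    squared_error = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            squared_error += mask.sum() * (predictions[mask].mean() - labels[mask].mean()) ** 2
    return np.sqrt(squared_error / len(predictions))
```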
@L-M-Sherlock I apologize for the inconvenience, but can you help me?
And then replace
EDIT: maybe it has something to do with the fact that this loss function is non-differentiable, but if that's the case, I would expect PyTorch to raise an error and stop optimization.
And the sort() function is obviously not differentiable. So this doesn't work either.
I mean the new pretraining method is radical.
The new method gives larger stabilities, but it's more accurate that way, so we should use it.
Maybe it's time to adopt the idea of the flatter power forgetting curve. It doesn't introduce any extra trainable parameters, and it could cooperate with the new pretraining method. I want to delay the idea of considering the short-term schedule because it requires major changes to the scheduler in the Anki client. And the pattern of short-term memory is still unclear.
Other finding: the benchmark doesn't include the outlier filter, which also makes the initial stability too long. I will fix it soon.
You have already filtered the outliers in the dataset, right? If you add the filter to the benchmark too, you will be filtering them twice.
Nope. I only filtered out reviews like manual resets and short-term reviews.
I got back to my idea of correcting S based on the interval and grade. The idea is that if the delta_t/S ratio is high yet the grade is “Good” or “Easy”, S was underestimated. If the delta_t/S ratio is low yet the grade is “Hard” or “Again”, S was overestimated. Then we obtain a corrected estimate of S.
I tried different clampings, different initial parameters, etc. It didn't improve RMSE.
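Purely for illustration, the correction described above could look something like this (the functional form and weights are invented here, not anything that was actually benchmarked):

```python
def corrected_stability(S, delta_t, grade, w_up=0.2, w_down=0.2):
    # grade: 1=Again, 2=Hard, 3=Good, 4=Easy
    ratio = delta_t / S
    if grade >= 3 and ratio > 1.0:
        # recalled after an interval longer than S: S was underestimated
        return S * (1.0 + w_up * (ratio - 1.0))
    if grade <= 2 and ratio < 1.0:
        # failed after an interval shorter than S: S was overestimated
        return S * (1.0 - w_down * (1.0 - ratio))
    return S
```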
Btw, Sherlock, is there a way to use
Could you provide a case for that?
It's better to assign the value via masking:

```python
r_g = 1
r_g[X[:, 1] == 1] = self.w[19]
r_g[X[:, 1] == 2] = self.w[20]
r_g[X[:, 1] == 3] = self.w[21]
```
Btw @L-M-Sherlock, I believe Anki 23.12 beta uses the median parameters, right? But since we have the mode estimator now, let's use the mode parameters. I suggest giving the mode parameters to Dae, so that he can add them to the next beta.
I have updated the parameters in open-spaced-repetition/fsrs-rs#135
Shouldn't the first line be
I have an idea for how to handle the situation where users use "Hard" instead of "Again": add a toggle like "I use Hard as a failing grade" to Anki. Then, if the user enables that option, Hard will be assigned a value of 0 rather than 1 internally in FSRS, and it will affect the calculation of the loss, retention, RMSE, and all that. And there will be a new formula for new S, similar to the current formula for Again, but for Hard.
A simpler solution would be to just allow SInc for Hard to be <1, without changing anything else, but the issue is that Hard is assigned a value of 1 for the purpose of calculating the loss. We could come up with some method of determining whether Hard should be assigned a value of 0 or 1, but I don't know how. And even if I knew how, it would likely involve selecting some arbitrary threshold.
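A sketch of what the toggle could look like at the data-preparation step (the function name is hypothetical; the grade encoding 1=Again, 2=Hard, 3=Good, 4=Easy follows Anki's convention):

```python
def recall_label(grade: int, hard_is_fail: bool = False) -> int:
    # label used for the loss / retention / RMSE calculations
    if hard_is_fail:
        return 0 if grade <= 2 else 1  # Again and Hard both count as forgetting
    return 0 if grade == 1 else 1      # only Again counts as forgetting
```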
This issue has accumulated too many comments. Most comments have been auto-hidden by GitHub, and it's inconvenient to search them. Maybe we need a Discord channel to discuss the development of FSRS. @Expertium, are you in Anki's Discord server?
* Feat/flat power forgetting curve
* sqrt(count) as weights for pretrain open-spaced-repetition/fsrs4anki#461 (comment)
* float eq
* use assert_approx_eq
* channel up
* make DECAY, FACTOR const and pub
* clippy fix
* pub(crate)
* fix test

Co-authored-by: AsukaMinato <[email protected]>
No. Can you give me a link?
Which channel should I use? Honestly, I think it's better to just make a new GitHub issue.
Also, if you plan to add the new forgetting curve and change how pretrain works, I suggest renaming the version to v4.5. Also, you'll need to run the benchmark again to calculate new mode parameters.
I have added the new forgetting curve in FSRS-rs and fsrs-optimizer. I'm benchmarking them in this PR: open-spaced-repetition/srs-benchmark#22
I understand that FSRS v5 won't be released for a while, but I still decided to open this issue to collect good ideas.
There is already an issue for D, so I won't be sharing ideas related to D (granted, I don't have any anyway).
Idea 1: R-matrix.
I have already talked about it here, but I want to talk more about the details. Last time Sherlock tried it, there was an important flaw with grouping, and unfortunately Sherlock still hasn't tried it with improved grouping, so here's how I think it should be done:
For `t`, a formula similar to this should be used: `round(math.pow(1.4, math.floor(math.log(x, 1.4))), 2)`. This formula is used in the B-W Heatmap, and it should work well here too. Although, I suggest a slight modification: `round(1.2 * math.pow(1.4, math.floor(math.log(x, 1.4))), 2)`. This makes the rounded values closer to the original values on average. The problem with the first formula is that rounded values are always less than or equal to the original values, so on average they are smaller than the original values.

For `n`, I came up with this: `math.ceil(math.floor((x + 0.75) ** 0.75) ** (1 / 0.75))`. Again, I ensured that on average, rounded values are close to the original ones, so there is no over- or underestimation.

For `lapse`, just use `min(x, 8)`. This way all cards with >=8 lapses will be put into the same category, as they all are leeches. (A rough sketch of the resulting grouping is given after the pros and cons below.)

Pros: adding a self-correction mechanism to FSRS will allow it to correct its own bad predictions based on real data.
Cons: hard to implement (especially on mobile), possible non-monotonicity of the final estimate of R, which relies both on the theoretical value and the R-matrix value.
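A rough sketch of how such an R-matrix could be assembled from the review log, using the grouping described above (function and variable names are illustrative, not from the optimizer):

```python
import math
from collections import defaultdict

def group_key(t, n, lapse):
    # bucket the interval, review count, and lapse count as described above
    t_bin = round(1.2 * math.pow(1.4, math.floor(math.log(max(t, 1), 1.4))), 2)  # max() guards against t = 0
    n_bin = math.ceil(math.floor((n + 0.75) ** 0.75) ** (1 / 0.75))
    lapse_bin = min(lapse, 8)
    return (t_bin, n_bin, lapse_bin)

def build_r_matrix(reviews):
    # reviews: iterable of (t, n, lapse, recalled) with recalled in {0, 1}
    counts, successes = defaultdict(int), defaultdict(int)
    for t, n, lapse, recalled in reviews:
        key = group_key(t, n, lapse)
        counts[key] += 1
        successes[key] += recalled
    return {key: successes[key] / counts[key] for key in counts}
```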
Idea 2: A new power function R=f(S, t).
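The formula in question is presumably the flatter power curve discussed in the comments above, i.e. something like:

```python
def power_curve(delta_t, S):
    # flatter curve: decay of -0.5 instead of -1; 4.26316 ≈ 81/19
    return (1 + delta_t / (4.26316 * S)) ** -0.5
```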
Here 4.26316 is just a constant that ensures that R=90% when S=t.
I have already tested it in v4.0.0 Beta, it decreases RMSE by roughly 22% for my decks.
Pros: very easy to implement and benchmark.
Cons: this deviates from theory even further, as in theory forgetting should be exponential.