
TODO list #137

Open · Expertium opened this issue Dec 10, 2024 · 17 comments

Expertium (Contributor) commented Dec 10, 2024:

This is just to keep track of stuff

1. Add sibling information in a way that FSRS can work with. See "I need 3 modifications of Anki 10k" #136 (comment) and https://discord.com/channels/368267295601983490/1282005522513530952/1320698604771348570 Done ✅
2. During testing, filter out same-day reviews using delta_t in days, but keep delta_t in seconds for the calculations (see the sketch after this list). https://forums.ankiweb.net/t/due-column-changing-days-from-whole-numbers-to-decimals-in-scheduling/52213/53?u=expertium Done ✅
3. Benchmark obezag's idea. https://discord.com/channels/368267295601983490/1282005522513530952/1315873110171451422
4. Fine-tune the formula for interpolating missing S0: https://discord.com/channels/368267295601983490/1282005522513530952/1319714323680989256
5. Benchmark updating D before S. Done ✅
6. Benchmark setting the weights of outliers to 0: https://discord.com/channels/368267295601983490/1282005522513530952/1320670294188363778
7. Benchmark FSRS v1 and FSRS v2. Done ✅
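
For item 2, a minimal sketch of the idea, assuming a pandas dataframe with elapsed_days and elapsed_seconds columns like the one used later in this thread; the helper name and the exact evaluation pipeline are illustrative assumptions, not the benchmark's actual code:

import pandas as pd

def filter_for_testing(df: pd.DataFrame) -> pd.DataFrame:
    # Model input: delta_t in (fractional) days derived from seconds,
    # so same-day gaps still carry information during calculations.
    df = df.copy()
    df["delta_t"] = df["elapsed_seconds"] / 86400.0
    # Evaluation filter: drop same-day reviews based on whole-day delta_t.
    return df[df["elapsed_days"] >= 1]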
L-M-Sherlock (Member) commented:

What's "Fine-tune the formula for interpolating missing S0"?

Expertium (Contributor, Author) commented:

Check the Discord message; I attached a link.

1. In the 10k dataset, find users who haven't used one or two buttons during the first review, but started doing so in the second half of their review history.
2. Estimate the missing S0 (https://github.com/open-spaced-repetition/fsrs-optimizer/blob/9b9b700ea463a2505f28d8c04717d9bd34787d5e/src/fsrs_optimizer/fsrs_optimizer.py#L1107) based on the first halves of their review histories (a stand-in sketch of the interpolation follows this list).
3. Run the optimizer on the second halves; don't calculate S0 normally, fill it in using the previous estimates. Example: a user never used Good. You use your wacky formula (that I don't understand) to estimate S0(Good). Then, for the second half of the review history, the one where the user does use Good, use your estimate from the first half.
4. Get logloss/RMSE values, then tweak w1 and w2 in your wacky formula.
5. Repeat steps 2-4 many times until you find the best combination of w1 and w2.
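
The actual formula lives at the fsrs_optimizer.py line linked in step 2. As a stand-in for the discussion, here is a minimal sketch that assumes a weighted geometric (log-linear) interpolation between neighbouring grades; the function name and the functional form are assumptions, not the optimizer's real code:

# Hypothetical stand-in for the S0 interpolation in fsrs_optimizer.py.
# s0 maps ratings (1=Again, 2=Hard, 3=Good, 4=Easy) to initial stability;
# missing grades are filled from their neighbours, weighted by w1 and w2
# (both currently set to 3/5 per the discussion below).
def interpolate_missing_s0(s0: dict, w1: float = 3 / 5, w2: float = 3 / 5) -> dict:
    s0 = dict(s0)
    if 2 not in s0 and 1 in s0 and 3 in s0:
        # Hard as a weighted geometric mean of Again and Good (assumed form).
        s0[2] = s0[1] ** w1 * s0[3] ** (1 - w1)
    if 3 not in s0 and 2 in s0 and 4 in s0:
        # Good as a weighted geometric mean of Hard and Easy (assumed form).
        s0[3] = s0[2] ** w2 * s0[4] ** (1 - w2)
    return s0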

Expertium (Contributor, Author) commented:

Also, don't forget to benchmark FSRS v1 and FSRS v2. I'll add that to the list

L-M-Sherlock (Member) commented:

> Check the Discord message; I attached a link.

I checked, but I don't get the details. Could you elaborate on the idea here?

Expertium (Contributor, Author) commented Dec 26, 2024:

You have w1 and w2, which are currently set to 3/5. The idea is to find users who did not use one or two grades during their first reviews in the first half of their review history, but started using those grades later; calculate the missing S0 values based on that; and check how well they fit.
For example, say a user has never pressed Good during the first half of his review history. You use that first half to calculate the missing S0(Good). Then you use that S0(Good) during optimization on the second half of his review history, and check the RMSE.
Then you do that for all users who meet these criteria (didn't use one or two grades during their first reviews in the first half of their review history, but used them in the second half) and for different values of w1 and w2.
If you have a better idea of how to fine-tune w1 and w2, feel free to do that.

L-M-Sherlock (Member) commented Dec 30, 2024:

Only these collections match your conditions:

[screenshot listing the matching collections]

Expertium (Contributor, Author) commented:

Oh well. Forget about it then

Expertium (Contributor, Author) commented Dec 30, 2024:

Oh, wait, @L-M-Sherlock
[screenshot of the filtering code]

The first condition should be df_first_half["rating"].nunique()==2 or df_first_half["rating"].nunique()==3, not just df_first_half["rating"].nunique()==2, since w1 and w2 are used for interpolating S0 values both when one value is missing and when two values are missing.

Also, are you sure you are checking the right thing? We're not looking for users who never used a certain answer button; we're looking for users who didn't use a certain answer button for their first reviews. For example, if someone never used "Good" during their first review, but used it during their second/third/nth review, then he should count, and you should record his ID.
So you need something like this: df = df.loc[(df['elapsed_seconds'] == -1) & (df['elapsed_days'] == -1)], so that you only look at first reviews.
(I have "undone" item 4 on the todo list)

L-M-Sherlock (Member) commented:

import pandas as pd

# DATA_PATH points to the review-history parquet file (defined elsewhere).

def process(user_id):
    # Load one user's reviews and number them in chronological order.
    df = pd.read_parquet(DATA_PATH, filters=[("user_id", "==", user_id)])
    df["review_th"] = range(1, df.shape[0] + 1)
    df.sort_values(by=["card_id", "review_th"], inplace=True)
    # Drop same-day reviews, then count each card's remaining reviews.
    df.drop(df[df["elapsed_days"] == 0].index, inplace=True)
    df["i"] = df.groupby("card_id").cumcount() + 1
    df["y"] = df["rating"].map(lambda x: {1: 0, 2: 1, 3: 1, 4: 1}[x])
    # Keep only each card's second review, sorted chronologically.
    df = df[(df["elapsed_days"] > 0) & (df["i"] == 2)].sort_values(by=["review_th"])
    length = len(df)
    df_first_half = df.iloc[: length // 2]
    df_second_half = df.iloc[length // 2 :]
    # Users with only 2 distinct grades in the first half but all 4 in the second.
    if df_first_half["rating"].nunique() == 2 and df_second_half["rating"].nunique() == 4:
        print(user_id)
    return user_id

> So you need something like this: df = df.loc[(df['elapsed_seconds'] == -1) & (df['elapsed_days'] == -1)], so that you only look at first reviews.

The df["i"] == 2 condition plays the same role here.

Expertium (Contributor, Author) commented:

OK, but you haven't done this:

> The first condition should be df_first_half["rating"].nunique()==2 or df_first_half["rating"].nunique()==3, not just df_first_half["rating"].nunique()==2, since w1 and w2 are used for interpolating S0 values both when one value is missing and when two values are missing.

L-M-Sherlock (Member) commented:

[screenshot of the updated code]

Expertium (Contributor, Author) commented:

That's not what I said, though. I meant like this:

if (df_first_half["rating"].nunique() == 2 or df_first_half["rating"].nunique() == 3) and df_second_half["rating"].nunique() == 4:
    print(user_id)

L-M-Sherlock (Member) commented:

[screenshot of the updated code]

Now we have ~300 collections.

Expertium (Contributor, Author) commented:

Nice. Now I want you to do what I described above (a sketch of the loop follows the list):

1. Using df_first_half, estimate S0. Use your formula with w1 and w2 to fill in the missing values.
2. Run the optimizer on df_second_half, using the S0 values from the previous step.
3. Do steps 1-2 for each user and record the average RMSE.
4. Change w1 and w2 and repeat steps 1-3 until you find good w1 and w2.

The key idea is that by using S0 from the first half we can check how well it fits the second half.
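
A minimal sketch of this loop, reusing the interpolate_missing_s0 stand-in from earlier; estimate_s0 and optimize_with_fixed_s0 are hypothetical wrappers around the optimizer's pretrain and training steps, not its actual entry points:

import itertools
import numpy as np

def tune_w1_w2(users, grid=np.linspace(0.1, 0.9, 9)):
    # users: list of (df_first_half, df_second_half) pairs, one per user.
    best_weights, best_rmse = None, float("inf")
    for w1, w2 in itertools.product(grid, grid):
        rmses = []
        for df_first_half, df_second_half in users:
            partial_s0 = estimate_s0(df_first_half)          # hypothetical helper
            s0 = interpolate_missing_s0(partial_s0, w1, w2)  # fill missing grades
            rmses.append(optimize_with_fixed_s0(df_second_half, s0))  # hypothetical helper
        if np.mean(rmses) < best_rmse:
            best_weights, best_rmse = (w1, w2), float(np.mean(rmses))
    return best_weights, best_rmse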

L-M-Sherlock (Member) commented:

I think it's really hard to evaluate S0 with so little data:

[three screenshots of the evaluation results]

Expertium (Contributor, Author) commented:

Man...
Alright, forget about it then

Expertium (Contributor, Author) commented Jan 2, 2025:

@L-M-Sherlock I have a better idea

1. Find all users who use all 4 buttons during the first review AND press each button at least 200 times.
   So if you display the button counts (for first reviews only) like this:
   Again: x1, Hard: x2, Good: x3, Easy: x4
   then each x must be >= 200.
2. Send me a .jsonl or .csv file with the S0 values of each such user. If there are 5000 users like this, the .jsonl file should have 5000 lines.

I'll fine-tune w1 and w2 by removing 1-2 S0 values, filling them back in using the formula with w1 and w2, and then minimizing MAPE (see the sketch below).
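
A sketch of that procedure, again leaning on the hypothetical interpolate_missing_s0 stand-in from earlier; the .jsonl layout (one {rating: S0} object per user) is an assumption about the requested file:

import json
import numpy as np

def mape_for_weights(jsonl_path: str, w1: float, w2: float) -> float:
    # For each user, hide an interpolatable S0 value, re-fill it with the
    # formula, and score the reconstruction against the true value.
    errors = []
    with open(jsonl_path) as f:
        for line in f:
            s0 = {int(k): v for k, v in json.loads(line).items()}
            for hidden in (2, 3):  # Hard and Good are the grades w1/w2 fill
                partial = {r: v for r, v in s0.items() if r != hidden}
                filled = interpolate_missing_s0(partial, w1, w2)
                errors.append(abs(filled[hidden] - s0[hidden]) / s0[hidden])
    return float(np.mean(errors))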
