From 40fc84deb28b2ce50ead17f19953dc5e8f0712db Mon Sep 17 00:00:00 2001 From: Rain Date: Mon, 16 Oct 2023 22:17:15 +0100 Subject: [PATCH] retcon --- .vscode/ltex.dictionary.en-GB.txt | 1 + report.tex | 143 ++++++++++-------------------- 2 files changed, 48 insertions(+), 96 deletions(-) diff --git a/.vscode/ltex.dictionary.en-GB.txt b/.vscode/ltex.dictionary.en-GB.txt index 476893c..c873f54 100644 --- a/.vscode/ltex.dictionary.en-GB.txt +++ b/.vscode/ltex.dictionary.en-GB.txt @@ -10,3 +10,4 @@ tailwindcss-rtl AccountTray localStorage WCAG +Leitner diff --git a/report.tex b/report.tex index 2fa3c55..775391c 100644 --- a/report.tex +++ b/report.tex @@ -12,6 +12,13 @@ \usepackage{array,ragged2e,pst-node,pst-dbicons} \usepackage{mdframed} \usepackage{minted} +\usepackage{tikz} +\usetikzlibrary{shapes, arrows} +\tikzstyle{terminator} = [rectangle, draw, text centered, rounded corners, minimum height=2em] +\tikzstyle{process} = [rectangle, draw, text centered, minimum height=2em] +\tikzstyle{decision} = [diamond, draw, text centered, minimum height=2em] +\tikzstyle{data}=[trapezium, draw, text centered, trapezium left angle=60, trapezium right angle=120, minimum height=2em] +\tikzstyle{connector} = [draw, -latex'] \usemintedstyle{monokai} \BeforeBeginEnvironment{minted}{\begin{mdframed}[backgroundcolor=black!90]} @@ -302,124 +309,68 @@ \subsection{Data flow} A user signs up through the frontend, which saves their username and a salted hash of their password into the Users table. A user can then create a deck, which creates a deck in the Decks table, with a reference to their deck in UserDecks. They can then create cards, which creates cards in the Cards table, with a reference to their deck in UserDecks. Cards can be created, with a reference to their deck in DeckCards. Once a user begins using their decks, the software can then calculate the ability of the user to remember each card and store it in the database based on a "confidence" score and a date. 
Storing this information allows for an algorithm similar to Leitner boxes to be used to serve cards at optimal intervals. This cycle of reviewing cards repeatedly makes up the core functionality of the software. \paragraph{} -If a user wants to share their deck with other users, they can mark it as public. This allows it to be returned in search results, and for other users to adopt a copy to their account for them to then use in a similar fashion. +If a user wants to share their deck with other users, they can mark it as public. This allows it to be returned in search results, and for other users to adopt a copy to their account for them to then use similarly. \section{Algorithms} \subsection{Calculating pull rate} \paragraph{} -Cards that have high fluctuations of confidence are more difficult to remember, and cards that have low fluctuations are easier to remember. -\paragraph{} -We can represent this mathematically. $0 \leq c < 1$ represents the confidence ($c$) of the card, where $c = 0$ means the card is completely forgotten, and $c = 1$ means the card is completely remembered. - -\paragraph{} -Humans do not remember things linearly, however\footcite{ForgettingCurve}. There is a sharp decline in retention just one day after learning it. This is because it has not been shifted to long term memory. Because of this, a linear equation can't be used to judge the ability of a user to remember a card. - -\paragraph{} -Suppose we have a user that practices a card once. We can model how the card might decay in their memory against time in days passed since review ($t$) with this equation: -\[ - y = \frac{1}{(t + 1) ^ {2}} -\] +We can represent the difficulty of a card mathematically. A card that keeps being forgotten will have a higher 'difficulty' than a card that is consistently remembered. 
+\[d = \frac{f}{1 + n + f} \] -\paragraph{} -We need to account for this decay in confidence by pulling poorly remembered cards more often than well remembered cards, while also considering time. If we can delay reviews such that the review happens right before forgetting a card, we can maximise the boost in confidence. - -\paragraph{} -We can define forgetting a card as a sudden decrease in confidence at a review ($\Delta c < -0.4$), or number of times confidence has gone below a certain threshold ($c < 0.15$). This value can be $f$, for the number of times the card has been forgotten. - -\paragraph{} -Bringing this all together, we end up with a way to rank the order the cards should be pulled, with $P$ being the "Pull" value. - -\[ - P = \frac{1}{(t + 1) ^ 2} \cdot \frac{n}{c ^ {-1}} \cdot \frac{1}{f + 1} -\] - -$n$ represents the number of days on which the card has been reviewed. - -$t$ represents an amount of time since the card was last reviewed. - -$f$ represents the number of times the card has been forgotten. - -Cards with a lower value of $P$ should be pulled first. - -\paragraph{} -The $\frac{1}{(t + 1) ^{2}}$ term is used to model memory decay over time, with higher values of $t$ reducing the value of $P$. - -\paragraph{} -The $\frac{n}{c ^{-1}}$ term is used to model the ability of the user to remember the card. A higher value of $n$ (more reviews) against a lower value of $c$ (low confidence) significantly decreases the value of $P$, as the card is likely very difficult. +$d$ is the difficulty, $f$ is the number of times the card has been forgotten, and $n$ is the number of times the card has been reviewed. +This can be used to select the kind of challenge the user is presented with. \paragraph{} -The $\frac{1}{f + 1}$ term is used to model the number of times the card has been forgotten. A lower value of $f$ increases the value of $P$. +We have a model for difficulty, but humans do not remember things linearly\footcite{ForgettingCurve}. 
There is a sharp decline in retention just one day after learning something. This is because the material has not yet been shifted to long-term memory. Because of this, a linear equation alone can't be used to judge the ability of a user to remember a card -- a separate system is needed to determine how often to pull a card. \paragraph{} -More concretely, suppose a user has a card, $A$, that they are very familiar with ($c = 0.85$). They have never forgotten it ($f = 0$), and have been doing reviews on it on ten different days ($n = 10$), with the last review being 30 hours ago ($t = 30$) We can calculate the following: - -\[ - P _ A = \frac{1}{(30 + 1) ^ 2} \cdot \frac{10}{0.85 ^ {-1}} \cdot \frac{1}{0 + 1} -\] - -\[ - P _ A = \frac{1}{(31) ^ 2} \cdot \frac{10}{0.85 ^ {-1}} -\] - -\[ - P _ A = \frac{1}{961} \cdot \frac{10}{0.85 ^ {-1}} -\] - -\[ - P _ A = \frac{17}{1922} = 0.008844953174 -\] - -\paragraph{} -Now suppose a user has a card, $B$, that is very difficult to remember ($c = 0.04$). They keep forgetting it ($f = 8$), and have been doing reviews on it on ten different days ($n = 10$), with the last review being 60 hours ago ($t = 60$). We can calculate the following: - -\[ - P _ B = \frac{1}{(60 + 1) ^ 2} \cdot \frac{10}{0.04 ^ {-1}} \cdot \frac{1}{8 + 1} -\] - -\[ - P _ B = \frac{1}{(61) ^ 2} \cdot \frac{10}{0.04 ^ {-1}} \cdot \frac{1}{9} -\] - -\[ - P _ B = \frac{1}{3721} \cdot \frac{10}{0.04 ^ {-1}} \cdot \frac{1}{9} -\] - -\[ - P _ B = \frac{2}{167445} = 0.00001194422049 -\] - -\[ - P _ B < P _ A -\] - -As expected, the harder card $B$ results in a lower Pull value, so should be selected first. - -\paragraph{} -If we create a new card, $C$, that has never been reviewed ($t = 0, f = 0, n = 0, c = 0$), we end up with an equation that reduces to this: - -\[ -P _ C = \frac{0}{0 ^ {-1}} -\] - -$0 ^ {-1}$ is undefined, so the result of this is undefined. However, we can treat this as a Pull value of $0$ for our purposes, mathematicians be damned. 
In other words, this card has the highest priority for our algorithm. In the event of a tie because of this rule, we simply choose a random card from the tied cards. +We can use tagging similar to Leitner boxes to model this. A higher box has a longer interval between reviews. +After each challenge, we ask the user how confident they are with the card (i.e. when they next want to see it -- mixed into the pool immediately, in a few days, or in a week). This choice is then stored, and once the chosen date passes the card can be randomly pulled along with other cards that need to be reviewed. \subsection{Challenge types} \paragraph{} -Now that we have a way to determine the Pull value of a card, we can start to decide what kind of challenges to serve the user once the card has been pulled. These challenges consider the confidence level of the card. +Now that we have a way to estimate the difficulty of a card, we can start to decide what kind of challenges to serve the user once the card has been pulled. These challenges consider the current difficulty of the card. -\subsubsection{Confidence thresholds} +\subsubsection{Difficulty thresholds} \paragraph{} -$0 \leq c < \frac{1}{3}$ is a low confidence card, so we serve a "challenge" that simply asks the user to review the card and rate it. We update the confidence accordingly based on if the user finds the card easy, medium, or hard. +$0 \leq d < \frac{1}{3}$ is a low difficulty card, so we serve a challenge that asks the user to type out the other side of the card from memory. \paragraph{} -$\frac{1}{3} \leq c < \frac{2}{3}$ is a medium confidence card, so we serve a challenge that asks the user to pick out the other side of the card from a selection of multiple possibilities. Based on how long it takes the user to select the correct card, we update the confidence value, with faster response times meaning higher confidence. 
+$\frac{1}{3} \leq d < \frac{2}{3}$ is a medium difficulty card, so we serve a challenge that asks the user to pick out the other side of the card from a selection of multiple possibilities. \paragraph{} -$\frac{2}{3} \leq c < 1$ is a high confidence card, so we serve a challenge that asks them to type out the other side of the card from memory. If correct, confidence increases and if incorrect, confidence decreases. +$\frac{2}{3} \leq d < 1$ is a high difficulty card, so we serve a "challenge" that simply asks the user to review the card and rate it. We update the confidence based on whether the user finds the card easy, medium, or hard. \paragraph{} Using these thresholds, we can have a variety of challenge types that scale to the user's ability. +\begin{tikzpicture}[node distance = 3cm] + \node [terminator] (start) {\textbf{Start}}; + \node [decision, below of=start, xshift=5cm] (any) {Any overdue?}; + \node [process, below of=any, xshift=-6cm] (pull) {Randomly pull overdue card}; + \node [decision, below of=pull] (difficulty) {Card difficulty}; + \node [process, right of=difficulty, xshift=1.5cm] (show) {Show card}; + \node [process, below of=difficulty] (pick) {Multiple choice}; + \node [process, left of=difficulty, xshift=-1.5cm] (type) {Type card}; + \node [data, below of=pick] (rate) {Rate confidence}; + \node [process, below of=rate] (update) {Update next due date}; + \node [terminator, below of=any, xshift=2cm] (end) {\textbf{End}}; + + \path [connector] (start) -| (any); + \path [connector] (any) -| node [anchor=south] {yes} (pull); + \path [connector] (pull) -- (difficulty); + \path [connector] (difficulty) -- node [anchor=south] {low} (type); + \path [connector] (difficulty) -- node [anchor=east] {medium} (pick); + \path [connector] (difficulty) -- node [anchor=south] {high} (show); + \path [connector] (type) |- (rate); + \path [connector] (pick) -- (rate); + \path [connector] (show) |- (rate); + \path [connector] (rate) -- (update); + \path 
[connector] (update) -| (any); + \path [connector] (any) -| node [anchor=south] {no} (end); + \end{tikzpicture} + + \subsubsection{Sign up} \paragraph{} On sign up, users are asked to send a username they want alongside their password to the server. First, it must check that the username is valid (alphanumeric and within a certain length constraint). If it is valid, the server then checks the database to see if the username is already taken. If it isn't taken, we hash their password with a salt, create a new entry in the Users table with their username, hashed password, salt, the current time. We create a session token and then send the token to the client. If the username is invalid or has already been taken, we send an error back to the client.
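\paragraph{}
The sign up steps described above could be sketched as follows. This is only a minimal illustration under stated assumptions, not the report's actual implementation: the username rule (3 to 20 ASCII alphanumeric characters), the in-memory \texttt{users} dictionary standing in for the Users table, the choice of PBKDF2 as the salted hash, and the names \texttt{sign\_up} and \texttt{USERNAME\_PATTERN} are all hypothetical.

```python
import hashlib
import re
import secrets
from datetime import datetime, timezone

# Assumed rule: the report only says "alphanumeric and within a certain
# length constraint"; 3 to 20 characters is a placeholder.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9]{3,20}")

# Hypothetical in-memory stand-in for the Users table.
users = {}


def sign_up(username, password):
    """Return (session_token, None) on success or (None, error) on failure."""
    # Step 1: check that the username is valid.
    if not USERNAME_PATTERN.fullmatch(username):
        return None, "invalid username"

    # Step 2: check that the username is not already taken.
    if username in users:
        return None, "username taken"

    # Step 3: hash the password with a fresh random salt.
    # PBKDF2 is an assumption; the report does not name a hash function.
    salt = secrets.token_bytes(16)
    password_hash = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 100_000
    )

    # Step 4: store username, hashed password, salt and the current time.
    users[username] = {
        "password_hash": password_hash,
        "salt": salt,
        "created_at": datetime.now(timezone.utc),
    }

    # Step 5: create a session token to send back to the client.
    return secrets.token_urlsafe(32), None
```

A call such as \texttt{sign\_up("rain42", "hunter2")} would return a token and no error, while repeating the call with the same username would return the "username taken" error instead.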