
Interpreting notation (e.g., h00) from the attention topic #56

Open
jdgh000 opened this issue Dec 2, 2024 · 1 comment

jdgh000 commented Dec 2, 2024

I am trying to put into words the meaning of the notation introduced in Fig 9.10 versus Fig 9.6, where Fig 9.10 is the first illustration of attention network basics.
Fig 9.6 is pretty simple: h0 and h1 are hidden state 0 and hidden state 1 (vectors), and x0 and x1 are input 0 and input 1.

In Fig 9.10, h00 and h01 are output by s0 (source 0), and h10 and h11 are output by s1. I can explain the first digit: just like in Fig 9.9, the h0 part of h00 and h01 corresponds to hidden state 0 from source 0, and so on. The second digit, however, is what I'm trying to wrap my head around.
Should I interpret the second digit as denoting the index of each vector element? That seems evident from Figure 9.12, which explains how attention scores are calculated: we perform some operation on [h00, h01] and [h20, h21] and get 0.8, and on [h10, h11] and [h20, h21] and get another scalar, 0.2. A scalar result is only possible, however, if the operation is a dot product: [h00, h01] · [h20, h21] = h00 * h20 + h01 * h21 = 0.8

For convenience, I've attached screenshots of the figures (Screenshot 2024-12-01 180352, Screenshot 2024-12-01 174954).

dvgodoy (Owner) commented Dec 16, 2024

Hi @jdgh000,

I'm sorry for the delayed response.

You're right in your interpretation of the notation. The second digit in every group, in both Figures 9.6 and 9.10, represents the index of the i-th element in the vector. So there are three vectors (h0, h1, and h2), each with two elements.
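
Concretely, here's a minimal sketch of that layout (the numeric values are made up purely for illustration; only the indexing convention matters):

```python
import torch

# Three hidden states (h0, h1, h2), two elements each, laid out as rows.
# With this layout, h[i, j] matches the book's h_ij notation: the first
# digit picks the hidden state, the second digit picks the element.
h = torch.tensor([[0.9, 0.3],   # h0 = [h00, h01]
                  [0.2, 0.1],   # h1 = [h10, h11]
                  [0.8, 0.4]])  # h2 = [h20, h21]
print(h[0, 1])  # h01: the second element of hidden state h0
```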

You're also right about the dot product. In the attention formula (the QK^T part), we compute the dot product between the query (h2) and the keys (h0 and h1).
The first dot product, between h0 and h2, is: h00 * h20 + h01 * h21, resulting in the hypothetical attention score of 0.8 (Figure 9.12).
Similarly, we compute the dot product for the second key (h1): h10 * h20 + h11 * h21, with a resulting score of 0.2.
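
Here's a minimal sketch of those two dot products, again with made-up values, since the figures only show the resulting scores (0.8 and 0.2), not the underlying vectors:

```python
import torch

# Hypothetical hidden states (illustrative values only).
h0 = torch.tensor([0.9, 0.3])   # key: encoder state h0
h1 = torch.tensor([0.2, 0.1])   # key: encoder state h1
h2 = torch.tensor([0.8, 0.4])   # query: the decoder's hidden state

score_0 = torch.dot(h0, h2)     # h00*h20 + h01*h21
score_1 = torch.dot(h1, h2)     # h10*h20 + h11*h21

# Equivalently, the QK^T part as one matrix product over both keys:
scores = torch.stack([h0, h1]) @ h2   # tensor([score_0, score_1])
print(score_0, score_1, scores)
```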

Best,
Daniel
