I'm trying to verbally explain to myself the notation introduced in Fig. 9.10 vs. Fig. 9.6, where Fig. 9.10 is the first illustration of attention-network basics.
Fig. 9.6 is pretty simple: h0 and h1 are hidden state 0 (a vector) and hidden state 1, and x0 and x1 are input 0 and input 1.
Fig. 9.10: h00, h01 is the output of s0 (source 0), and h10, h11 is the output of s1. I can explain the first digit: just like in Fig. 9.9, the "h0" part of h00, h01 corresponds to hidden state 0 from source 0, and so on. The second digit, however, is what I'm trying to wrap my head around.
Should I interpret the second digit as indexing the individual elements of each vector? That seems evident from Figure 9.12, which explains how attention scores are calculated. We apply some operation: [h00, h01] op [h20, h21] => 0.8, and [h10, h11] op [h20, h21] => another scalar value, 0.2. A scalar result is only possible if the operation is a dot product: [h00, h01] dot [h20, h21] = h00 * h20 + h01 * h21 = 0.8
For convenience, I've attached screenshots.
You're right in your interpretation of the notation. The second digit in each group, in both Figures 9.6 and 9.10, represents the index of the element within the vector. So there are three vectors (h0, h1, and h2), each with two elements.
You're also right about the dot product. In the attention formula (the QK^T part), we compute the dot product between the query (h2) and each of the keys (h0 and h1).
The first dot product, between h0 and h2, is h00 * h20 + h01 * h21, giving the hypothetical attention score of 0.8 (Figure 9.12).
Similarly, the dot product for the second key (h1) is h10 * h20 + h11 * h21, giving a score of 0.2.
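For concreteness, here's a minimal NumPy sketch of that score computation. The vector values below are made up for illustration (chosen so the dot products come out to the figure's hypothetical 0.8 and 0.2); only the indexing pattern h_i = [h_i0, h_i1] matches the figures:

```python
import numpy as np

# Hypothetical 2-element hidden-state vectors, h_i = [h_i0, h_i1].
# These are made-up values, not the book's actual numbers; they are
# picked so the scores reproduce Figure 9.12's hypothetical 0.8 / 0.2.
h0 = np.array([0.5, 0.6])  # key 1  (source position 0)
h1 = np.array([0.5, 0.0])  # key 2  (source position 1)
h2 = np.array([0.4, 1.0])  # query  (position 2)

# Each attention score is a plain dot product between the query and
# one key -- the QK^T part of the attention formula:
score_0 = h0 @ h2  # h00*h20 + h01*h21 -> ~0.8
score_1 = h1 @ h2  # h10*h20 + h11*h21 -> ~0.2

print(score_0, score_1)
```

So the second digit never changes which vector you're looking at; it only selects an element inside that vector during the element-wise multiply-and-sum.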