updated train for computing per token loss #40

adamimos · 2024-03-14T19:33:03Z

had to explicitly calculate loss for this

melembroucarlitos · 2024-03-15T00:50:36Z

I'm confused. Why are we calculating loss per token if we're not logging the value?

adamimos · 2024-03-15T02:49:46Z

My main thinking was that computing the loss per token is the more general approach: it works whether or not we want to log by token. I was going to implement the per-token logging in a separate pull request.

A secondary issue is that TransformerLens does not compute loss on the final token, since there's no data for what that prediction should be. So I at least wanted to compute the cross-entropy loss manually from the logits in order to get that (but this is a separate issue from the per-token one).
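For readers following along, here is a minimal sketch of what computing cross-entropy manually from the logits, per token, might look like in PyTorch. It is not the code from this PR: the function names `per_token_cross_entropy` and `shifted_per_token_cross_entropy` are hypothetical, and the first function assumes the data pipeline supplies a separate `targets` tensor with a label for every position, including the last one.

```python
import torch
import torch.nn.functional as F


def per_token_cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Unreduced cross-entropy, one value per token (hypothetical helper).

    logits:  (batch, seq_len, vocab) raw model outputs
    targets: (batch, seq_len) target token ids, assumed to be provided
             for every position, including the final one
    returns: (batch, seq_len) loss per token
    """
    batch, seq_len, vocab = logits.shape
    loss = F.cross_entropy(
        logits.reshape(batch * seq_len, vocab),
        targets.reshape(batch * seq_len),
        reduction="none",  # keep one loss value per token
    )
    return loss.reshape(batch, seq_len)


def shifted_per_token_cross_entropy(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Next-token variant for comparison: position t predicts tokens[t + 1].

    The final position has no target, so the result has shape
    (batch, seq_len - 1).
    """
    return per_token_cross_entropy(logits[:, :-1, :], tokens[:, 1:])


# Example with random data (shapes are illustrative only).
torch.manual_seed(0)
logits = torch.randn(2, 5, 7)
targets = torch.randint(0, 7, (2, 5))

loss_per_token = per_token_cross_entropy(logits, targets)  # shape (2, 5)
mean_loss = loss_per_token.mean()  # scalar usable for backprop
```

Keeping the unreduced tensor around means the training loop can still call `.mean()` for the optimizer step, while a later change can log the per-position values without touching the loss computation.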

updated train for computing per token loss

9b009a0

had to explicitly calculate loss for this
