⚠️ Please check that this feature request hasn't been suggested before.
I searched previous Ideas in Discussions and didn't find any similar feature requests.
I searched previous Issues and didn't find any similar feature requests.
🔖 Feature description
Hi, thanks for the library! A common practice in deep learning is to log the exact inputs and labels of (a portion of) a single batch. For example, I personally log the input_ids (and convert them back to text), attention_mask, model outputs, labels, etc.
This can help debug a lot of problems. For example, if someone has a wrong BOS/EOS token, it can be spotted immediately. As another example, if we want to train on completions only but forget to do so, the logged labels can hint at that.
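To make this concrete, here is a minimal sketch of the kind of helper I have in mind (the function name and arguments are illustrative, not an existing axolotl API; it assumes a collated batch of torch tensors and an HF tokenizer):

```python
# Illustrative only: decode one batch so we can eyeball what the model sees.
def log_batch(batch, tokenizer, num_rows=2):
    """Print decoded inputs, the attention mask, and the tokens that actually
    contribute to the loss for a few rows of a collated batch."""
    for i in range(min(num_rows, batch["input_ids"].shape[0])):
        input_ids = batch["input_ids"][i]
        labels = batch["labels"][i]
        print("text:", tokenizer.decode(input_ids))
        print("attention_mask:", batch["attention_mask"][i].tolist())
        # Positions with label == -100 are ignored by the loss, so this line
        # shows whether "train on completions only" is really in effect.
        print("trained-on text:", tokenizer.decode(input_ids[labels != -100]))
```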
✔️ Solution
(see above)
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements
My issue title is concise, descriptive, and in title casing.
I have searched the existing issues to make sure this feature has not been requested yet.
I have provided enough information for the maintainers to understand and evaluate this request.
However, since the train_dataloader is passed to, say, the on_log or on_step_end callback, you could in theory get data from it; I'm just not sure how to get the current row at the current step without advancing the dataloader's iterator and affecting the actual training.
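One untested idea for getting at the current batch without touching the dataloader's iterator: wrap the data collator so it stashes the batch it just built, and have a callback read that stash. A rough sketch (class names are made up, nothing that exists in axolotl today):

```python
from transformers import TrainerCallback

class BatchCapturingCollator:
    """Wraps an existing collator and keeps a reference to the last batch it
    produced, so a callback can inspect it without re-iterating the dataloader."""
    def __init__(self, collator):
        self.collator = collator
        self.last_batch = None

    def __call__(self, features):
        batch = self.collator(features)
        self.last_batch = batch
        return batch

class LogBatchCallback(TrainerCallback):
    def __init__(self, capturing_collator, tokenizer, every_n_steps=100):
        self.capturing_collator = capturing_collator
        self.tokenizer = tokenizer
        self.every_n_steps = every_n_steps

    def on_step_end(self, args, state, control, **kwargs):
        batch = self.capturing_collator.last_batch
        if batch is not None and state.global_step % self.every_n_steps == 0:
            print(f"step {state.global_step}:",
                  self.tokenizer.decode(batch["input_ids"][0]))
```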
If you're up for tackling this and submitting a PR, we would be happy to help and get it merged in.
I am currently writing a subclass and hacking compute_loss for this in my internal code. At the same time, I want to use axolotl as a comparison test to reveal potential bugs in my internal code (i.e. axolotl and my code should reach the same accuracy), so I may not have enough time to submit a PR to axolotl. But if you'd like to have a look at how I hacked it as a rough draft, feel free to ping me.
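For reference, the hack is roughly along these lines (a simplified sketch, not my exact internal code; the logging interval is arbitrary, it assumes a tokenizer was passed to the Trainer, and the exact compute_loss signature depends on the transformers version):

```python
from transformers import Trainer

class BatchLoggingTrainer(Trainer):
    """Logs the first row of the batch before computing the loss."""
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        if self.state.global_step % 100 == 0 and "labels" in inputs:
            input_ids = inputs["input_ids"][0]
            labels = inputs["labels"][0]
            # Decoding the raw ids makes BOS/EOS mistakes obvious, and the
            # -100 mask shows whether only completions are being trained on.
            print("input:", self.tokenizer.decode(input_ids))
            print("trained-on:", self.tokenizer.decode(input_ids[labels != -100]))
        return super().compute_loss(model, inputs,
                                    return_outputs=return_outputs, **kwargs)
```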