
Show sample batch content #2145

Open · fzyzcjy opened this issue Dec 7, 2024 · 2 comments
Labels: enhancement (New feature or request)

Comments


fzyzcjy commented Dec 7, 2024

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

Hi, thanks for the library! A common practice in deep learning seems to be to log the exact inputs and labels of (a portion of) a single batch. For example, I personally log the input_ids (and convert them back to text), attention_masks, model outputs, labels, etc.

This can help debug a lot of problems. For example, if someone has a wrong BOS/EOS token, it can be spotted immediately. As another example, if we want to train on completions only but forget to do so, the logged labels can hint at that.
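As a rough illustration, here is a minimal sketch of the kind of dump this could produce, assuming a Hugging Face tokenizer and a collated causal-LM batch where ignored label positions are set to -100 (the dump_sample_batch helper is made up for this example):

```python
def dump_sample_batch(batch, tokenizer, max_examples=1):
    """Hypothetical helper: print a human-readable view of one collated batch."""
    for i in range(min(max_examples, batch["input_ids"].size(0))):
        input_ids = batch["input_ids"][i]
        labels = batch["labels"][i]
        print("input text  :", tokenizer.decode(input_ids, skip_special_tokens=False))
        # Decode only the positions that contribute to the loss; a missing
        # "train on completions only" mask is immediately visible here.
        print("trained text:", tokenizer.decode(input_ids[labels != -100], skip_special_tokens=False))
        print("attention_mask:", batch["attention_mask"][i].tolist())
```

Decoding the full input with special tokens kept also makes a wrong BOS/EOS token easy to spot.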

✔️ Solution

(see above)

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
fzyzcjy added the enhancement label on Dec 7, 2024
winglian (Collaborator) commented Dec 8, 2024

This might be feasible using the trainer_callback (see https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_callback.py#L283-L284)

However, since the train_dataloader is passed to callbacks such as on_log or on_step_end, you could in theory get data from it; I'm just not exactly sure how to get the current row at the current step without advancing the iterator on the dataloader and affecting the actual training.
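A rough sketch of that direction, assuming a plain transformers Trainer; the SampleBatchLogger name and the collator-wrapping trick are just one possible way around the iterator problem, not an existing axolotl or transformers API:

```python
from transformers import TrainerCallback


class SampleBatchLogger(TrainerCallback):
    """Stash each batch as it is collated, then log it from on_step_end.

    Callbacks do receive train_dataloader in kwargs, but iterating it there
    would use a separate iterator rather than return the batch of the current
    step, so the batch is captured at collation time instead.
    """

    def __init__(self, tokenizer, log_every=100):
        self.tokenizer = tokenizer
        self.log_every = log_every
        self._last_batch = None

    def wrap_collator(self, collate_fn):
        def collate_and_stash(features):
            batch = collate_fn(features)
            self._last_batch = batch  # the batch about to be fed to the model
            return batch
        return collate_and_stash

    def on_step_end(self, args, state, control, **kwargs):
        if self._last_batch is None or state.global_step % self.log_every:
            return
        text = self.tokenizer.decode(self._last_batch["input_ids"][0])
        print(f"step {state.global_step} sample input: {text!r}")
```

Wiring it up would look roughly like `cb = SampleBatchLogger(tokenizer)`, `trainer.data_collator = cb.wrap_collator(trainer.data_collator)`, `trainer.add_callback(cb)` before `trainer.train()`, though how cleanly that maps onto axolotl's trainer builder is an open question.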

If you're up for tackling this and submitting a PR, we would be happy to help and get it merged in.

fzyzcjy (Author) commented Dec 8, 2024

For my internal code, I am currently writing a Trainer subclass that hacks compute_loss for this; at the same time, I want to use axolotl as a comparison test to reveal potential bugs in my internal code (i.e. axolotl and my code should get the same accuracy). Thus I may not have enough time to submit a PR to axolotl. But if you would like to have a look at how I hacked it as a rough draft, feel free to ping me.
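For reference, a minimal sketch of that kind of compute_loss hack on top of a plain transformers Trainer could look like the following (the class name and the step-0 condition are illustrative, not the actual internal code):

```python
from transformers import Trainer


class BatchLoggingTrainer(Trainer):
    """Dump the first training batch before delegating to the normal loss."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        if self.state.global_step == 0:
            # On newer transformers versions the tokenizer lives on self.processing_class.
            text = self.tokenizer.decode(inputs["input_ids"][0], skip_special_tokens=False)
            print("sample input text:", text)
            print("labels:", inputs["labels"][0].tolist())
        return super().compute_loss(model, inputs, return_outputs=return_outputs, **kwargs)
```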
