
How to Perform Inference with Batch Processing. #5

Open
Chlience opened this issue Oct 10, 2024 · 1 comment

@Chlience
Contributor

I'm currently using this model for inference, and I would like to know how to generate inference results in batch mode. Specifically, I'm trying to avoid processing inputs one by one and instead process multiple inputs in a single forward pass for efficiency.

Could you please provide guidance or examples on how to:

  1. Structure inputs for batch processing.
  2. Modify the inference pipeline to handle batches.
  3. Optimize batch size for performance without running into memory issues.

Any advice, sample code, or references to the documentation would be greatly appreciated.

Thanks for your help!

@Achazwl
Collaborator

Achazwl commented Oct 14, 2024

Different sentences within a batch may have different acceptance lengths in speculative sampling [1,2], which requires careful padding, scheduling, or a custom kernel implementation. Our implementation does not support batch processing yet.

[1] Xiaoxuan Liu, et al. Optimizing Speculative Decoding for Serving Large Language Models Using Goodput. 2024.
[2] Haifeng Qian, et al. BASS: Batched Attention-optimized Speculative Sampling. 2024.
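
To illustrate the raggedness problem described above, here is a minimal, hypothetical PyTorch sketch (not code from this repository; all names and the fake draft/verify step are assumptions): after one speculative step, each sequence in the batch accepts a different number of draft tokens, so the padded batch tensor and attention mask have to be rebuilt every step.

```python
# Hypothetical sketch of why per-sequence acceptance lengths make
# batched speculative decoding awkward. The draft/verify logic is faked.
import torch

pad_id = 0      # assumed padding token id
draft_len = 4   # number of draft tokens proposed per step

# A batch of 3 sequences of unequal length, right-padded into one tensor.
seqs = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]
lens = torch.tensor([len(s) for s in seqs])
max_len = int(lens.max())
batch = torch.full((len(seqs), max_len), pad_id)
for i, s in enumerate(seqs):
    batch[i, : len(s)] = torch.tensor(s)

# One speculative step: a draft model proposes `draft_len` tokens per
# sequence, but verification accepts a *different* prefix length for each
# sequence (here faked as 1, 4, and 2 accepted tokens).
draft = torch.randint(100, 200, (len(seqs), draft_len))
accepted = torch.tensor([1, 4, 2])

# Because acceptance lengths differ, each row grows by a different amount,
# so the whole batch must be re-padded and the attention mask recomputed
# after every step (or handled by a custom ragged-batch kernel).
new_lens = lens + accepted
new_max = int(new_lens.max())
new_batch = torch.full((len(seqs), new_max), pad_id)
for i in range(len(seqs)):
    n, m = int(lens[i]), int(new_lens[i])
    new_batch[i, :n] = batch[i, :n]
    new_batch[i, n:m] = draft[i, : int(accepted[i])]

attention_mask = (torch.arange(new_max)[None, :] < new_lens[:, None]).long()
print(new_batch)
print(attention_mask)
```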
