Logits processors: Update inplace, with batching #92

lapp0 · 2024-10-05T20:38:11Z

Changes

For GuideLogitsProcessor,

Update logits in place: 7514aff
- Fixes dottxt-ai/outlines#859 (Update logits array in-place)
Batch update logits: 8aa0b0d
- Motivation: profiling showed the GPU run was slower than the CPU run because of CUDA synchronization.
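
A minimal sketch of the two changes, written with NumPy and not taken from the project's code. The function names `mask_logits_per_row` and `mask_logits_batched` and the list-of-lists layout of `allowed_tokens` are hypothetical. The first version fills each row in place inside a Python loop; the second builds one boolean mask for the whole batch and applies it with a single indexed assignment, which is the kind of change that should reduce per-row operations on a GPU backend.

```python
import numpy as np


def mask_logits_per_row(logits, allowed_tokens):
    """Mask disallowed tokens in place, one sequence at a time."""
    for row, allowed in zip(logits, allowed_tokens):
        keep = np.zeros(row.shape[0], dtype=bool)
        keep[allowed] = True
        # `row` is a view into `logits`, so this writes into the original array.
        row[~keep] = -np.inf
    return logits


def mask_logits_batched(logits, allowed_tokens):
    """Build one mask for the whole batch, then apply it in place once."""
    # Flatten the per-sequence token lists into (batch index, token index) pairs.
    batch_idx = np.concatenate(
        [np.full(len(a), i, dtype=np.int64) for i, a in enumerate(allowed_tokens)]
    )
    token_idx = np.concatenate(
        [np.asarray(a, dtype=np.int64) for a in allowed_tokens]
    )
    disallowed = np.ones(logits.shape, dtype=bool)
    disallowed[batch_idx, token_idx] = False
    # Single in-place assignment; no new logits array is allocated.
    logits[disallowed] = -np.inf
    return logits
```

Both functions return the same array object they were given, so a caller can rely on the input being modified.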

CI doesn't benchmark torch_cuda, so I've included it here.

| Before [`245c7fc`] | After [`7514aff`] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 159±0.4μs | 181±1μs | 1.14 | `time_structured_generation('numpy', 'Z*')` |
| 149±0.3μs | 170±0.7μs | 1.14 | `time_structured_generation('torch', 'Z*')` |
| 292±0.8μs | 254±1μs | 0.87 | `time_structured_generation('torch_cuda', 'Z*')` |
| 572±2μs | 391±1μs | 0.68 | `time_structured_generation('torch_cuda', '[^Z]*')` |

| Before [`245c7fc`] | After [`8aa0b0d`] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 481±5μs | 401±2μs | 0.83 | `time_structured_generation('numpy', '[^Z]*')` |
| 466±3μs | 386±1μs | 0.83 | `time_structured_generation('torch', '[^Z]*')` |
| 159±0.8μs | 106±0.5μs | 0.67 | `time_structured_generation('numpy', 'Z*')` |
| 149±0.7μs | 94.7±0.2μs | 0.64 | `time_structured_generation('torch', 'Z*')` |
| 290±1μs | 149±0.4μs | 0.51 | `time_structured_generation('torch_cuda', 'Z*')` |
| 573±3μs | 229±1μs | 0.40 | `time_structured_generation('torch_cuda', '[^Z]*')` |

As a follow-up, we could cache the RegexGuide legal-token mask on the GPU to improve `time_structured_generation('torch', '[^Z]*')`. In this benchmark, `allowed_tokens` contains every token except `Z`, `ZZ`, and `ZZZ`, so a large LongTensor is sent to the GPU at each step.
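
One way the caching idea could look, as a rough sketch only. It uses NumPy as a stand-in, so no device transfer actually happens here; with torch, the cached mask would be kept on the same device as the logits so that each step does only a dictionary lookup. The class `MaskCache`, the helper `apply_cached_mask`, and the integer `state` key are hypothetical and are not part of the existing RegexGuide API.

```python
import numpy as np


class MaskCache:
    """Caches one boolean 'disallowed' mask per guide state (hypothetical helper)."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self._masks = {}

    def get(self, state, allowed_tokens):
        mask = self._masks.get(state)
        if mask is None:
            # Built once per state; later calls with the same state reuse it
            # and never look at `allowed_tokens` again.
            mask = np.ones(self.vocab_size, dtype=bool)
            mask[allowed_tokens] = False
            self._masks[state] = mask
        return mask


def apply_cached_mask(logits_row, cache, state, allowed_tokens):
    """Mask one row of logits in place using the cached mask for `state`."""
    logits_row[cache.get(state, allowed_tokens)] = -np.inf
    return logits_row
```

The trade-off in this sketch is memory: one vocabulary-sized boolean mask is kept per visited state, in exchange for not rebuilding or re-sending the index list at every step.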

lapp0 added the run-benchmarks label Oct 5, 2024

lapp0 changed the title ~~Logits processors inplace change logits~~ Logits processors: Update inplace, with batching Oct 5, 2024

lapp0 added 2 commits October 7, 2024 10:07

update logits in place for GuideLogitsProcessor

36875a0

construct logits mask in batch operation

094af23

lapp0 force-pushed the logits-processors-inplace-change-logits branch from 8aa0b0d to 094af23 October 7, 2024 14:07