Remove unnecessary array allocations in generation process and enable caching #308

brandonwillard · 2023-10-02T06:08:50Z

This PR removes some unnecessary array allocations in the generation process that hurt scaling with the maximum number of tokens, and it adds KV caching.
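The PR text does not show the code involved, but the general pattern behind allocation-driven scaling problems can be sketched. In this toy example, `toy_next_token`, `generate_with_concat`, and `generate_with_buffer` are hypothetical names, not outlines functions, and NumPy arrays stand in for the framework tensors used in practice. Growing the token array by concatenation allocates a new array and copies the whole prefix at every step. A buffer preallocated for the prompt plus every new token is allocated only once. This illustrates the idea and is not the PR's actual change.

```python
import numpy as np


def toy_next_token(token_ids: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a model call: one new token id per sequence.

    Returns an array of shape (batch, 1).
    """
    return (token_ids[:, -1:] + 1) % 100


def generate_with_concat(prompt: np.ndarray, max_tokens: int) -> np.ndarray:
    token_ids = prompt
    for _ in range(max_tokens):
        next_ids = toy_next_token(token_ids)
        # A new (batch, n + 1) array is allocated and the whole prefix copied
        # on every step, so total copying grows quadratically with max_tokens.
        token_ids = np.concatenate([token_ids, next_ids], axis=1)
    return token_ids


def generate_with_buffer(prompt: np.ndarray, max_tokens: int) -> np.ndarray:
    batch, prompt_len = prompt.shape
    # Single allocation sized for the prompt plus every generated token.
    buffer = np.empty((batch, prompt_len + max_tokens), dtype=prompt.dtype)
    buffer[:, :prompt_len] = prompt
    for i in range(max_tokens):
        end = prompt_len + i
        # buffer[:, :end] is a view (basic slicing), so the prefix is not copied.
        buffer[:, end : end + 1] = toy_next_token(buffer[:, :end])
    return buffer
```

Both functions produce the same tokens. With a real model the forward pass usually dominates, but the per-step copying in the concatenation version can become noticeable as `max_tokens` grows.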

Perhaps the biggest non-cache-based change is that the `Sequence.update_token_ids` method has been removed. Beyond that, the dimensions of the arrays returned by `Sequence.step` are now fixed (i.e. no squeezing). This makes the dimensions in `Sequence.__call__` clearer and simplifies the loop: for example, the steps of the auto-regression loop no longer need to be duplicated before the loop starts.
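Here is a minimal sketch of what fixed dimensions buy. It assumes a hypothetical `step` function (not the actual `Sequence.step` signature) that always returns next-token ids of shape `(batch, 1)`. With that guarantee, one loop body handles the first step and every later step, and a batch of one needs no special case.

```python
import numpy as np


def step(token_ids: np.ndarray, vocab_size: int) -> np.ndarray:
    """Hypothetical step returning next-token ids with a fixed (batch, 1) shape."""
    batch = token_ids.shape[0]
    # Toy logits whose argmax is (last token + 1) % vocab_size.
    logits = np.zeros((batch, vocab_size))
    logits[np.arange(batch), (token_ids[:, -1] + 1) % vocab_size] = 1.0
    # argmax drops the vocab axis; [:, None] restores a trailing axis so the
    # result is (batch, 1) for every batch size, including batch == 1.
    return np.argmax(logits, axis=-1)[:, None]


def generate(prompt_ids: np.ndarray, max_tokens: int, vocab_size: int) -> np.ndarray:
    # prompt_ids has shape (batch, prompt_len). No special first step is needed
    # before the loop because step() always returns the same rank.
    token_ids = prompt_ids
    for _ in range(max_tokens):
        token_ids = np.concatenate([token_ids, step(token_ids, vocab_size)], axis=1)
    return token_ids
```

By contrast, applying `np.squeeze` to the argmax would give a 0-d array for a batch of one and a `(batch,)` array otherwise. Downstream code would then have to branch on the shape.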

outlines/text/generate/sequence.py

Closes dottxt-ai#186

brandonwillard force-pushed the fix-sequence-scaling branch from 8f7836d to 1a5082f October 2, 2023 06:09

brandonwillard requested a review from rlouf October 2, 2023 06:09

brandonwillard commented Oct 2, 2023 on outlines/text/generate/sequence.py (outdated, resolved)

brandonwillard added the text, enhancement, and optimization labels Oct 2, 2023

brandonwillard self-assigned this Oct 2, 2023

brandonwillard force-pushed the fix-sequence-scaling branch 6 times, most recently from dc551ff to d55bbc1 October 3, 2023 23:04

brandonwillard changed the title ~~Remove unnecessary array allocations in generation process~~ Remove unnecessary array allocations in generation process and enable caching Oct 3, 2023

brandonwillard added the transformers label Oct 3, 2023

brandonwillard added 2 commits October 3, 2023 18:55

Remove unnecessary array allocations in generation process (88d83d7)

Enable KV caching (50352a7)

Closes dottxt-ai#186
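KV caching stores each attention layer's keys and values for tokens that have already been processed, so each new step only computes them for the newest token. The toy, single-head NumPy sketch below checks that incremental decoding with a cache gives the same outputs as recomputing attention over the whole prefix. `ToyKVCache`, `attend`, and `full_recompute` are hypothetical names, not outlines code. In Hugging Face `transformers`, the corresponding mechanism is passing the model's returned `past_key_values` back into the next forward call with `use_cache=True`. How this PR wires that into outlines is not shown here.

```python
import numpy as np


def attend(q: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention for one query over t positions."""
    scores = keys @ q / np.sqrt(q.shape[-1])  # shape (t,)
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                   # shape (d_v,)


class ToyKVCache:
    """Hypothetical KV cache for a single causal attention head."""

    def __init__(self, w_q, w_k, w_v, max_len: int):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v
        # Preallocated storage for the keys and values of every past position.
        self.keys = np.empty((max_len, w_k.shape[1]))
        self.values = np.empty((max_len, w_v.shape[1]))
        self.length = 0

    def step(self, x: np.ndarray) -> np.ndarray:
        t = self.length
        # Only the new token's key and value are computed; earlier ones are reused.
        self.keys[t] = x @ self.w_k
        self.values[t] = x @ self.w_v
        self.length += 1
        return attend(x @ self.w_q, self.keys[: t + 1], self.values[: t + 1])


def full_recompute(xs, w_q, w_k, w_v) -> np.ndarray:
    """No cache: keys and values for the whole prefix are recomputed at each step."""
    outputs = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        outputs.append(attend(xs[t] @ w_q, prefix @ w_k, prefix @ w_v))
    return np.stack(outputs)
```

Without the cache, step `t` recomputes `t + 1` key/value projections. With the cache, each step computes just one.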

brandonwillard force-pushed the fix-sequence-scaling branch from d55bbc1 to 50352a7 October 3, 2023 23:55

brandonwillard merged commit a8429a3 into dottxt-ai:main Oct 4, 2023
5 checks passed

brandonwillard deleted the fix-sequence-scaling branch October 4, 2023 03:39
