
Add prompt lookup decoding #379

Merged: 15 commits into openvinotoolkit:master, Apr 29, 2024

Conversation

@as-suvorov (Contributor) commented Apr 23, 2024

Ticket: 138549

@as-suvorov as-suvorov requested a review from Wovchena April 23, 2024 17:16
@as-suvorov as-suvorov marked this pull request as draft April 24, 2024 11:48
@as-suvorov as-suvorov requested a review from Wovchena April 25, 2024 10:22
@as-suvorov as-suvorov marked this pull request as ready for review April 25, 2024 10:22
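For background on the feature itself: prompt lookup decoding drafts candidate tokens by searching the already-seen context for its most recent n-gram and reusing the tokens that followed the match, then verifies all candidates with the main model in a single batched inference. Below is a minimal illustrative sketch of that matching step, not this PR's actual code; `generate_candidates`, `ngram_size`, and `num_candidates` are assumed names:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative sketch only: find the latest ngram_size tokens of the
// sequence earlier in the context and propose the tokens that followed
// that match as draft candidates for the main model to verify.
std::vector<int64_t> generate_candidates(const std::vector<int64_t>& tokens,
                                         size_t ngram_size,
                                         size_t num_candidates) {
    if (tokens.size() <= ngram_size)
        return {};
    const size_t suffix = tokens.size() - ngram_size;  // start of the query n-gram
    for (size_t i = 0; i + ngram_size < tokens.size(); ++i) {
        if (std::equal(tokens.begin() + i, tokens.begin() + i + ngram_size,
                       tokens.begin() + suffix)) {
            const size_t begin = i + ngram_size;
            const size_t end = std::min(begin + num_candidates, tokens.size());
            return {tokens.begin() + begin, tokens.begin() + end};
        }
    }
    return {};  // no match: fall back to plain autoregressive decoding
}
```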
@as-suvorov (Contributor, Author) commented:

@Wovchena, we have a proposal from @sammysun0711 for optimizing the KV cache trim (sammysun0711@d7a24e5), based on parallel_for. It could give a 3x speedup for the cache update. Should we apply it as well?
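The linked commit is not reproduced in the thread; as a hedged illustration only, a parallel_for based trim along the sequence axis could look roughly like this, assuming ov::parallel_for from `<openvino/core/parallel.hpp>` and a `{batch, heads, seq_len, head_dim}` KV layout:

```cpp
#include <cstdint>
#include <cstring>
#include <openvino/core/parallel.hpp>
#include <openvino/openvino.hpp>

// Hedged sketch, not the linked commit: trim a KV cache tensor laid out as
// {batch, heads, seq_len, head_dim} down to new_seq_len, copying each
// (batch, head) slice as one parallel_for task.
ov::Tensor trim_kv_tensor(const ov::Tensor& src, uint64_t new_seq_len) {
    ov::Shape shape = src.get_shape();
    const size_t old_seq_len = shape[2];
    const size_t head_dim = shape[3];
    shape[2] = new_seq_len;
    ov::Tensor dst(src.get_element_type(), shape);

    const size_t elem_size = src.get_element_type().size();
    const size_t slice_src = old_seq_len * head_dim * elem_size;  // bytes per source slice
    const size_t slice_dst = new_seq_len * head_dim * elem_size;  // bytes per trimmed slice
    const auto* src_bytes = static_cast<const uint8_t*>(src.data());
    auto* dst_bytes = static_cast<uint8_t*>(dst.data());

    // One memcpy per (batch, head) pair, distributed across threads.
    ov::parallel_for(shape[0] * shape[1], [&](size_t i) {
        std::memcpy(dst_bytes + i * slice_dst, src_bytes + i * slice_src, slice_dst);
    });
    return dst;
}
```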


Review thread on:

```cpp
// cut redundant candidates on last iteration
size_t tokens_to_generate = max_sequence_length - seq_len;
if (candidates.size() > tokens_to_generate - 1) {
```
Contributor (reviewer) commented:

Looks like we can unconditionally call resize, because in the case of the same size it does nothing.

@as-suvorov (Contributor, Author) replied:

Do you mean something like `candidates.resize(std::min(candidates.size(), tokens_to_generate - 1));`?
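For clarity, the two variants discussed in this thread side by side (the guarded body is inferred from the condition shown above, so treat it as a sketch):

```cpp
// Variant under review: a branch guards the resize.
if (candidates.size() > tokens_to_generate - 1) {
    candidates.resize(tokens_to_generate - 1);
}

// Suggested fold: resize is a no-op when the size already matches,
// so the condition can be absorbed into a single call.
candidates.resize(std::min(candidates.size(), tokens_to_generate - 1));
```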

Review thread on:

```cpp
    return new_tensor;
}

void update_kv_cache(ov::InferRequest request, uint64_t seq_len_axis, uint64_t new_seq_len) {
```
Collaborator commented:

> @Wovchena, we have a proposal from @sammysun0711 for optimizing the KV cache trim (sammysun0711@d7a24e5), based on parallel_for. It could give a 3x speedup for the cache update. Should we apply it as well?

Is there a way to link with TBB from the openvino package? @ilya-lavrenov, do you know a way? If yes, feel free to apply.
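For context, a hedged sketch of what a trim-based update_kv_cache can reduce to with OpenVINO's stateful API (query_state, get_state, and set_state are standard ov::InferRequest / ov::VariableState calls; trim_kv_tensor is the illustrative helper from the sketch above, which hard-codes the sequence axis):

```cpp
// Hedged sketch: shrink every KV state tensor held by the request down to
// new_seq_len. Assumes the stateful-model API (query_state / get_state /
// set_state) and the illustrative trim_kv_tensor helper sketched earlier;
// seq_len_axis is unused here because that helper assumes axis 2.
void update_kv_cache(ov::InferRequest request, uint64_t seq_len_axis, uint64_t new_seq_len) {
    for (auto&& state : request.query_state()) {
        ov::Tensor trimmed = trim_kv_tensor(state.get_state(), new_seq_len);
        state.set_state(trimmed);
    }
}
```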

@as-suvorov (Contributor, Author) commented:

@ilya-lavrenov, @Wovchena, do you mind if I address the remaining comments in follow-up PRs?

  1. Apply the parallel_for optimization to trim_tensor
  2. Apply the optimized trim_tensor implementation to speculative_decoding
  3. Investigate candidates_size + 1 inference for speculative_decoding

(A review thread on text_generation/causal_lm/cpp/README.md was marked outdated and resolved.)
@Wovchena merged commit 27083bd into openvinotoolkit:master on Apr 29, 2024. 11 checks passed.