
[lmi] log warnings for unused generation parameters across all rolling batch backends instead of throwing errors #1686

Merged Mar 28, 2024

Conversation

siddvenk (Contributor)

Description

This PR standardizes behavior across all rolling batch backends: unused generation parameters now produce a logged warning instead of an error. Previously, some backends would throw an error on an unknown parameter while others would silently succeed.

The warning logs both the unused parameters and the set of supported parameters. The supported set does include some internal parameters that users should not set directly, so it is not perfect. I'm open to removing that portion for now and logging only the unused parameters in the warning.
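The pattern described above can be sketched roughly as follows. This is a minimal illustration of the warn-and-filter approach, not the actual DJL Serving implementation; the function name and signature are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def filter_unused_params(requested: dict, supported: set, backend: str) -> dict:
    """Drop generation parameters the backend does not support.

    Instead of raising an error on unknown parameters, log a warning
    naming the unused parameters and the supported set, then pass only
    the supported parameters through (hypothetical sketch).
    """
    unused = set(requested) - supported
    if unused:
        logger.warning(
            "The following parameters are not supported by %s: %s. "
            "The supported parameters are %s",
            backend, unused, supported,
        )
    return {k: v for k, v in requested.items() if k in supported}
```

For example, `filter_unused_params({"temperature": 0.7, "badkwarg": 4}, {"temperature", "top_p"}, "vllm")` would log a warning about `badkwarg` and return only `{"temperature": 0.7}`.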

Testing Samples

Using the following request:

curl -X POST http://localhost:8080/invocations -H 'Content-Type: application/json' -d '{"inputs": "Hello there", "parameters": {"badkwarg": 4}, "stream": false}'

vllm (previously failed with an error)

response: {"generated_text": ", I’m a 20 year old guy from the UK. I’ve been playing guitar for about 10 years now and"}
logs: The following parameters are not supported by vllm: {'badkwarg'}. The supported parameters are {'include_stop_str_in_output', 'length_penalty', 'min_p', 'presence_penalty', 'temperature', 'prompt_logprobs', 'best_of', 'skip_special_tokens', 'early_stopping', 'stop_token_ids', 'repetition_penalty', 'use_beam_search', 'logprobs', 'logits_processors', 'top_k', 'stop', 'top_p', 'ignore_eos', 'n', 'frequency_penalty', 'seed', 'max_tokens', 'spaces_between_special_tokens'}

lmi-dist (previously failed with an error)

response: {"generated_text": ", I’m a 20 year old guy from the UK. I’ve been playing guitar for about 10 years now and"}
logs: The following parameters are not supported by lmi-dist: {'badkwarg'}. The supported parameters are {'typical_p', 'max_tokens', 'truncate', 'ignore_eos', 'length_penalty', 'top_p', 'repetition_penalty', 'best_of', 'include_stop_str_in_output', 'stop_token_ids', 'top_k', 'prompt_logprobs', 'use_beam_search', 'spaces_between_special_tokens', 'logprobs', 'stop', 'skip_special_tokens', 'seed', 'frequency_penalty', 'temperature', 'min_p', 'logits_processors', 'sampling_params', 'presence_penalty', 'n', 'early_stopping'}

hf-accelerate with scheduler

response: {"generated_text": ", I’m a 20 year old guy from the UK. I’ve been playing guitar for about 10 years now and"}
logs: The following parameters are not supported by huggingface accelerate rolling batch: {'seed', 'badkwarg'}. The supported parameters are {'topp', 'eos_token_id', 'use_lru_kv_cache', 'sampling', 'topk', 'alpha', '_max_seqlen', 'beam', 'max_new_seqlen', 'temperature', 'pad_token_id'}

deepspeed

response: {"generated_text": ", I’m a newbie here. I’m a 20 year old guy from the UK. I’ve been a fan"}
logs: The following parameters are not supported by deepspeed: {'badkwarg'}. The supported parameters are {'watermark', 'repetition_penalty', 'temperature', 'max_new_tokens', 'top_p', 'do_sample', 'seed', 'typical_p', 'stop_sequences', 'top_k', 'ignore_eos_token'}

tnx

response: {"generated_text": ", I’m a newbie here. I’m a 20 year old guy from the UK. I’ve been a fan"}
logs: The following parameters are not supported by neuron: {'badkwarg'}. The supported parameters are {'epsilon_cutoff', 'bos_token_id', 'begin_suppress_tokens', 'num_assistant_tokens_schedule', 'forced_eos_token_id', 'do_sample', '_commit_hash', 'max_new_tokens', 'output_attentions', 'num_assistant_tokens', 'encoder_no_repeat_ngram_size', 'use_cache', 'length_penalty', 'low_memory', 'min_length', 'encoder_repetition_penalty', 'output_scores', 'exponential_decay_length_penalty', 'return_dict_in_generate', '_from_model_config', 'num_beam_groups', 'decoder_start_token_id', 'pad_token_id', 'top_k', 'repetition_penalty', 'top_p', 'num_beams', 'force_words_ids', 'forced_decoder_ids', 'num_return_sequences', 'diversity_penalty', 'sequence_bias', 'min_new_tokens', 'renormalize_logits', 'guidance_scale', 'temperature', 'remove_invalid_values', 'eos_token_id', 'early_stopping', 'no_repeat_ngram_size', 'generation_kwargs', 'constraints', 'bad_words_ids', 'output_hidden_states', 'eta_cutoff', 'transformers_version', 'penalty_alpha', 'max_time', 'max_length', 'forced_bos_token_id', 'suppress_tokens', 'typical_p'}

optimum

response: {"generated_text": ", I’m a newbie here. I’m a 20 year old guy from the UK. I’ve been a fan"}
logs: The following parameters are not supported by neuron: {'badkwarg'}. The supported parameters are {'forced_eos_token_id', 'penalty_alpha', 'num_beams', 'forced_decoder_ids', 'sequence_bias', 'output_hidden_states', 'typical_p', 'pad_token_id', 'temperature', '_commit_hash', 'bad_words_ids', 'top_p', 'suppress_tokens', 'generation_kwargs', 'constraints', 'exponential_decay_length_penalty', 'begin_suppress_tokens', 'length_penalty', 'num_return_sequences', 'min_length', 'encoder_no_repeat_ngram_size', 'force_words_ids', 'max_time', 'num_beam_groups', 'low_memory', 'no_repeat_ngram_size', 'top_k', 'guidance_scale', 'max_new_tokens', 'diversity_penalty', 'encoder_repetition_penalty', 'do_sample', 'max_length', 'decoder_start_token_id', 'output_attentions', 'min_new_tokens', 'early_stopping', 'output_scores', 'num_assistant_tokens_schedule', 'repetition_penalty', 'forced_bos_token_id', 'bos_token_id', 'eos_token_id', 'use_cache', 'renormalize_logits', 'remove_invalid_values', 'return_dict_in_generate', 'num_assistant_tokens', '_from_model_config', 'eta_cutoff', 'transformers_version', 'epsilon_cutoff'}

@siddvenk siddvenk requested review from zachgk, frankfliu and a team as code owners March 28, 2024 00:03
@siddvenk siddvenk merged commit c0d6e23 into deepjavalibrary:master Mar 28, 2024
8 checks passed
@siddvenk siddvenk deleted the generation-params branch March 28, 2024 15:38