
add LoRA config to arg list in Neo sharding script and its integ test change #2552

Merged 23 commits into deepjavalibrary:master on Nov 14, 2024

Conversation

HappyAmazonian (Contributor)

Description

  • Pass LoRA config to the arg list in the Neo sharding script.
  • Also add TinyLlama fast model loading for the related integ test code.

@HappyAmazonian requested review from zachgk and a team as code owners on November 13, 2024 21:33
Comment on lines 109 to 111
  enforce_eager: bool = str(
      self.properties.get("option.enforce_eager",
-                         False)).lower() == "true"
+                         "False")).lower() == "true"
Reviewer (Contributor):

If we're changing to the string "False", we don't need to wrap it in a str() conversion.

Let's use the following:

enforce_eager = self.properties.get("option.enforce_eager", "true").lower() == "true"

I believe the default value should be true here to keep in line with our existing behavior (I should have caught that in the original PR).
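A minimal sketch of the pitfall the reviewer is pointing at, assuming properties arrive as plain strings after config parsing (the `properties` dict here is hypothetical, standing in for `self.properties`):

```python
# Hypothetical stand-in for self.properties when the user sets nothing.
properties = {}

# Original pattern: the bool default False gets stringified to "False",
# so the comparison can never see "true" when the option is unset, and
# the str() wrapper is redundant once the default is already a string.
enforce_eager_old = str(properties.get("option.enforce_eager", False)).lower() == "true"

# Suggested pattern: default to the string "true" so an unset option
# means enabled, matching the existing behavior the reviewer describes.
enforce_eager_new = properties.get("option.enforce_eager", "true").lower() == "true"

print(enforce_eager_old)  # False -- default silently flips to disabled
print(enforce_eager_new)  # True  -- default stays enabled
```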

  max_rolling_batch_size = int(
      self.properties.get("option.max_rolling_batch_size", 256))
  max_model_len = self.properties.get("option.max_model_len", None)
  if max_model_len is not None:
      max_model_len = int(max_model_len)

  # LoraConfigs
  enable_lora: bool = str(
Reviewer (Contributor):

I think we should only pass LoRA configs to the engine if the user specifies them. I'd prefer we rewrite this section as:

lora_kwargs = {}
if self.properties.get("option.enable_lora"):
    lora_kwargs["enable_lora"] = self.properties.get("option.enable_lora").lower() == "true"
if self.properties.get("option.fully_sharded_loras"):
    lora_kwargs["fully_sharded_loras"] = self.properties.get("option.fully_sharded_loras").lower() == "true"

engine_args = VllmEngineArgs(
    <prior configs>
    **lora_kwargs
)
...

And so on. This way we only provide the LoRA configs to the engine when they have been specified; a runnable sketch of the pattern follows.
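A self-contained sketch of this conditional-kwargs pattern, with a hypothetical `EngineStub` dataclass standing in for the real engine args class and a plain dict standing in for `self.properties`:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the real engine args class; only the two
# LoRA fields matter here. The defaults represent "not specified".
@dataclass
class EngineStub:
    model: str
    enable_lora: bool = False
    fully_sharded_loras: bool = False

def build_lora_kwargs(properties: dict) -> dict:
    """Collect LoRA options only when the user actually set them."""
    lora_kwargs = {}
    for key in ("enable_lora", "fully_sharded_loras"):
        value = properties.get(f"option.{key}")
        if value is not None:
            lora_kwargs[key] = value.lower() == "true"
    return lora_kwargs

# Unset options never reach the engine, so its own defaults apply.
print(EngineStub(model="tinyllama", **build_lora_kwargs({})))

# User-specified options are forwarded as parsed booleans.
print(EngineStub(model="tinyllama",
                 **build_lora_kwargs({"option.enable_lora": "true"})))
```

The design point is that an absent key and an explicit `false` are distinguishable: absent options are simply never passed, so the engine's own defaults stay authoritative.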

@siddvenk merged commit c5f1efc into deepjavalibrary:master on Nov 14, 2024
9 checks passed