update neo sharding script to use load_model_for_sharding #2556

Merged · 25 commits · Nov 14, 2024

Commits (25)
22e5950 - add lora for neo fast model loading test (HappyAmazonian, Nov 11, 2024)
318bcba - change fml name (HappyAmazonian, Nov 11, 2024)
032d1ae - add print (HappyAmazonian, Nov 11, 2024)
8a5ebec - add fml option (HappyAmazonian, Nov 11, 2024)
7f7ad6f - fix adapter call (HappyAmazonian, Nov 11, 2024)
5505520 - no adapter (HappyAmazonian, Nov 12, 2024)
09ae5c8 - x (HappyAmazonian, Nov 12, 2024)
0394edb - comment out lora env (HappyAmazonian, Nov 12, 2024)
56bbf2f - try tiny llama (HappyAmazonian, Nov 12, 2024)
fe0401f - enable (HappyAmazonian, Nov 12, 2024)
0579c6a - add adapter in request (HappyAmazonian, Nov 12, 2024)
9cc8a48 - Merge branch 'deepjavalibrary:master' into neo-lora (HappyAmazonian, Nov 12, 2024)
fab8231 - add tiny-llama-lora-fml (HappyAmazonian, Nov 12, 2024)
e660825 - Merge branch 'deepjavalibrary:master' into neo-lora (HappyAmazonian, Nov 13, 2024)
6b7206e - add lora config for neo sharding (HappyAmazonian, Nov 13, 2024)
f25c1c0 - remove extra formatting (HappyAmazonian, Nov 13, 2024)
2a6655c - only add lora config for neo shard script when enable lora = true (HappyAmazonian, Nov 14, 2024)
f3cd52c - fix (HappyAmazonian, Nov 14, 2024)
62d7d56 - improve style (HappyAmazonian, Nov 14, 2024)
85dc854 - improve style (HappyAmazonian, Nov 14, 2024)
e9731be - improve style (HappyAmazonian, Nov 14, 2024)
b8f6655 - improve style (HappyAmazonian, Nov 14, 2024)
df864cd - fix max_cpu_loras type (HappyAmazonian, Nov 14, 2024)
714066f - Merge https://github.com/HappyAmazonian/djl-serving into neo-lora (HappyAmazonian, Nov 14, 2024)
d9f2c2e - update neo sharding script to use load_model_for_sharding (HappyAmazonian, Nov 14, 2024)
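
Several of the commits above touch LoRA handling in the Neo sharding path: adding a LoRA config, passing it only when enable lora = true, and fixing the max_cpu_loras type. A minimal, hypothetical sketch of that gating follows; it assumes vLLM-style LoRA engine-arg names (enable_lora, max_loras, max_cpu_loras) and illustrative property keys, neither of which is taken verbatim from the repository.

def build_lora_kwargs(properties: dict) -> dict:
    """Build LoRA kwargs for the engine args only when LoRA is enabled.

    Hypothetical helper: the property keys and defaults are illustrative;
    the kwarg names follow vLLM-style LoRA engine args.
    """
    if str(properties.get("option.enable_lora", "false")).lower() != "true":
        # LoRA disabled: pass no LoRA-related kwargs to the engine args.
        return {}
    return {
        "enable_lora": True,
        "max_loras": int(properties.get("option.max_loras", 4)),
        # max_cpu_loras must be an int (see the "fix max_cpu_loras type" commit).
        "max_cpu_loras": int(properties.get("option.max_cpu_loras", 8)),
    }

In the sharding script these kwargs are then splatted into VllmEngineArgs(**lora_kwargs), as the diff below shows.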
6 changes: 4 additions & 2 deletions serving/docker/partition/sm_neo_shard.py
@@ -29,6 +29,7 @@
 from lmi_dist.init_engine import engine_from_args
 from lmi_dist.arg_utils import VllmEngineArgs
 from lmi_dist.comms import comms
+from lmi_dist.vllm_engine import load_model_for_sharding

 CHUNK_MB = 8

@@ -156,12 +157,13 @@ def shard_lmi_dist_model(self, input_dir: str, output_dir: str,
             **lora_kwargs,
         )

-        engine = engine_from_args(engine_args)
+        engine_configs = engine_args.create_engine_configs()
+        engine_worker = load_model_for_sharding(engine_configs)

         model_dir = os.path.join(output_dir, sm_fml.MODEL_DIR_NAME)
         os.makedirs(model_dir, exist_ok=True)

-        config_for_current_rank = engine.model_runner.vllm_worker.save_chunked_shard(
+        config_for_current_rank = engine_worker.save_chunked_shard(
             output_dir=model_dir, chunk_mb=chunk_mb)

         # Gather results from all ranks to driver process
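
In short, the change replaces full engine construction via engine_from_args with the lighter load_model_for_sharding path, which loads only the model worker needed to write chunked shards. Below is a sketch of the updated method assembled from the diff above; it belongs to the script's sharding service class, and the parameter list, the VllmEngineArgs fields other than **lora_kwargs, and the return handling are abbreviated assumptions rather than the script's exact code (sm_fml is imported elsewhere in sm_neo_shard.py).

import os

from lmi_dist.arg_utils import VllmEngineArgs
from lmi_dist.vllm_engine import load_model_for_sharding


def shard_lmi_dist_model(self, input_dir: str, output_dir: str,
                         tensor_parallel_degree: int, chunk_mb: int,
                         **lora_kwargs):
    # Engine-args construction is abbreviated; field names here are assumptions.
    engine_args = VllmEngineArgs(
        model=input_dir,
        tensor_parallel_size=tensor_parallel_degree,
        **lora_kwargs,
    )

    # Previously: engine = engine_from_args(engine_args), which built a full
    # inference engine. Now only the worker needed for sharding is loaded.
    engine_configs = engine_args.create_engine_configs()
    engine_worker = load_model_for_sharding(engine_configs)

    model_dir = os.path.join(output_dir, sm_fml.MODEL_DIR_NAME)
    os.makedirs(model_dir, exist_ok=True)

    # Each rank writes its own chunked shard and returns its shard config.
    config_for_current_rank = engine_worker.save_chunked_shard(
        output_dir=model_dir, chunk_mb=chunk_mb)

    # Gather results from all ranks to the driver process (gather logic omitted).
    return config_for_current_rank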