[Mixtral] Use col major for MoE gemm and use small batch specialization #171

Merged: 1 commit, Jan 24, 2024


@vinx13 commented Jan 24, 2024

Changed the MoE GEMM to use the more performant column-major layout. Also cleaned up the Mixtral parameter-loading hack, which previously used PyTorch for fast weight transposition.
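For readers unfamiliar with the layout terminology, here is a minimal pure-Python sketch of what storing the weight operand in column-major order means for a GEMM. It is not the kernel from this PR; the function names `to_col_major` and `gemm_col_major_b` and the tiny shapes are made up for illustration only.

```python
def to_col_major(w_row_major, k, n):
    """Repack a flat K x N row-major buffer into column-major order."""
    return [w_row_major[row * n + col] for col in range(n) for row in range(k)]


def gemm_col_major_b(a, b_col_major, m, k, n):
    """Compute C = A @ B with A row-major (M x K) and B column-major (K x N)."""
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # Both operands are read with stride 1 along the reduction axis.
                acc += a[i * k + p] * b_col_major[j * k + p]
            c[i * n + j] = acc
    return c


# Hypothetical toy data: A is 2 x 3, B is 3 x 2 (both given row-major).
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b_cm = to_col_major(b, 3, 2)
c = gemm_col_major_b(a, b_cm, 2, 3, 2)
```

With the weights already stored this way, no transposition is needed at load time, which is presumably why the PyTorch-based transposition step could be cleaned up.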

Benchmark: https://docs.google.com/spreadsheets/d/1w1uYvBK9bZluue4uyZjgT_4e8tl0rQDfMZ2XQfg43f4/edit#gid=0

This PR needs the TVM commit https://github.com/octoml/tvm/commit/7fb6b704879f5accaf3ff08c96409123a02b8f58

cc @sunggg @masahi

@masahi masahi merged commit f1bc68f into octoml:batch-serving Jan 24, 2024
1 check passed
Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this pull request Jan 30, 2024
This PR introduces the model reload functionality to the chat module
and the CLI.
* In the CLI, the usage is `/reload [model_id]`; for example, `model_id` can
be `vicuna-v1-7b-q3f16_0`. When `model_id` is specified, the CLI searches
the candidate paths for the matching model and loads it. When it is not
specified, the CLI reloads the currently running model.
* The effect of the reload operation is to load the (specified) model as
if loading it at the very beginning. This means the chat module will
completely release and forget everything that happened before the reload.
* Regarding the implementation, this PR merges the previous `init_chat`
and `Init` functions into a single `Reload` function.
* Verified on Mac Studio and Linux that the reload operation does not
cause a memory leak.
* Since the iOS/Android app does not yet support reading a JSON config
string, this refactor keeps some existing functions that are required by
the iOS/Android app as legacy functions. TODO items mark where the
legacy functions are. I have verified that the legacy functions work.
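
The reload semantics described above can be sketched roughly as follows. This is a hypothetical stand-in written for illustration, not the actual chat module or its `Reload` function: the class `ChatModuleSketch`, its attributes, and the directory-per-model lookup are all assumptions.

```python
import os
import tempfile


class ChatModuleSketch:
    """Hypothetical stand-in for the chat module; not the real API."""

    def __init__(self, candidate_paths):
        self.candidate_paths = list(candidate_paths)
        self.model_id = None
        self.model_path = None
        self.history = []

    def _find_model(self, model_id):
        # Assumption: each model lives in a directory named after its id.
        for base in self.candidate_paths:
            path = os.path.join(base, model_id)
            if os.path.isdir(path):
                return path
        raise FileNotFoundError(f"model {model_id!r} not found in candidate paths")

    def reload(self, model_id=None):
        # Without an explicit id, reload the currently running model.
        if model_id is None:
            if self.model_id is None:
                raise ValueError("no model is currently loaded")
            model_id = self.model_id
        path = self._find_model(model_id)
        # Forget everything that happened before the reload.
        self.history = []
        self.model_id = model_id
        self.model_path = path

    def chat(self, prompt):
        self.history.append(prompt)


# Usage: mimic `/reload vicuna-v1-7b-q3f16_0` followed by a bare `/reload`.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "vicuna-v1-7b-q3f16_0"))
    chat = ChatModuleSketch([root])
    chat.reload("vicuna-v1-7b-q3f16_0")
    chat.chat("hello")
    chat.reload()
    history_after_reload = list(chat.history)
    reloaded_model_id = chat.model_id
```

Routing both the initial load and later reloads through one method is what lets the state reset live in a single place.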