fix save #29

Merged: 1 commit, Aug 14, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -169,7 +169,7 @@ To help users design their configs, we now explain some universal configurations
# utilize naive quantization to the transformed model to obtain the same performance as
# the specifical-algorithm-quantized model.
save_trans: False
-# ``save_lightllm`` is True, which means you want to export a real quant model, e.g.,
+# ``save_lightllm`` or ``save_trtllm`` is True, which means you want to export a real quant model, e.g.,
# low-bit weights with weight and activation quantization parameters.
save_lightllm: False
# ``save_fake`` is True means you want to export fake_quant model, e.g.,
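
The flag semantics described in these comments can be sketched as a small lookup. This is only an illustration under stated assumptions: `requested_exports` and the labels `"transformed"`, `"real_quant"`, and `"fake_quant"` are hypothetical names, not part of llmc, and a plain `dict` stands in for the real config object.

```python
from typing import Dict, List


def requested_exports(config: Dict) -> List[str]:
    """Hypothetical helper (not llmc API): list the export kinds the
    ``save`` flags ask for, following the README comments."""
    save = config.get("save", {})
    exports = []
    if save.get("save_trans", False):
        # Transformed model: same structure and performance as the original.
        exports.append("transformed")
    if save.get("save_lightllm", False) or save.get("save_trtllm", False):
        # Either flag requests a real quant model (low-bit weights plus
        # weight and activation quantization parameters).
        exports.append("real_quant")
    if save.get("save_fake", False):
        exports.append("fake_quant")
    return exports


print(requested_exports({"save": {"save_trtllm": True}}))
```

Treating the two backend flags as one condition mirrors the updated comment: either one means a real quant export is wanted.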
2 changes: 1 addition & 1 deletion README_ja.md
@@ -159,7 +159,7 @@
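save:
# If ``save_trans`` is True, you want to export the transformed model (e.g., a model with modified parameters). Its performance and structure are the same as the original model's, and users can apply naive quantization to it to obtain the same performance as a model quantized with a specific algorithm.
save_trans: False
-# If ``save_lightllm`` is True, you want to export a real quantized model (e.g., low-bit weights with weight and activation quantization parameters).
+# If ``save_lightllm`` or ``save_trtllm`` is True, you want to export a real quantized model (e.g., low-bit weights with weight and activation quantization parameters).
save_lightllm: False
# If ``save_fake`` is True, you want to export a fake-quantized model (e.g., dequantized weights with activation quantization parameters).
save_fake: False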
2 changes: 1 addition & 1 deletion README_zh.md
@@ -159,7 +159,7 @@
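save:
# If ``save_trans`` is True, you want to export the transformed model, e.g., a model with modified parameters whose performance and structure are the same as the original model's; users can apply naive quantization to the transformed model to obtain the same performance as a model quantized with a specific algorithm.
save_trans: False
-# If ``save_lightllm`` is True, you want to export a real quantized model, e.g., low-bit weights with weight and activation quantization parameters.
+# If ``save_lightllm`` or ``save_trtllm`` is True, you want to export a real quantized model, e.g., low-bit weights with weight and activation quantization parameters.
save_lightllm: False
# If ``save_fake`` is True, you want to export a fake-quantized model, e.g., dequantized weights with activation quantization parameters.
save_fake: False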
42 changes: 21 additions & 21 deletions llmc/__main__.py
@@ -71,27 +71,27 @@ def main(config):
)
blockwise_opt.run_block_loop()

-if 'eval' in config and 'transformed' in config.eval.eval_pos:
-    blockwise_opt.deploy('origin_float')
-    for ppl_eval in eval_list:
-        ppl = ppl_eval.eval(model)
-        logger.info(f'{ppl_eval.dataset} ppl : {ppl}')
-
-if 'cvt' in config and config.get('cvt', True):
-    blockwise_opt.run_block_cvt()
-
-if 'save' in config and config.save.get('save_trans', False):
-    blockwise_opt.save_model(save_trans_path)
-
-if 'save' in config and config.save.get('save_trtllm', False):
-    blockwise_opt.save_model(save_trtllm_trans_path)
-    from llmc.utils.export_trtllm import cvt_trtllm_engine
-
-    cvt_trtllm_engine(
-        save_trtllm_trans_path,
-        save_trtllm_engine_path,
-        config.save.get('trtllm_cfg'),
-    )
+if 'eval' in config and 'transformed' in config.eval.eval_pos:
+    blockwise_opt.deploy('origin_float')
+    for ppl_eval in eval_list:
+        ppl = ppl_eval.eval(model)
+        logger.info(f'{ppl_eval.dataset} ppl : {ppl}')
+
+if 'cvt' in config and config.get('cvt', True):
+    blockwise_opt.run_block_cvt()
+
+if 'save' in config and config.save.get('save_trans', False):
+    blockwise_opt.save_model(save_trans_path)
+
+if 'save' in config and config.save.get('save_trtllm', False):
+    blockwise_opt.save_model(save_trtllm_trans_path)
+    from llmc.utils.export_trtllm import cvt_trtllm_engine
+
+    cvt_trtllm_engine(
+        save_trtllm_trans_path,
+        save_trtllm_engine_path,
+        config.save.get('trtllm_cfg'),
+    )

if 'eval' in config and 'fake_quant' in config.eval.eval_pos:
blockwise_opt.deploy('fake_quant')
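
The guard used throughout this hunk, `'save' in config and config.save.get(..., False)`, can be tried in isolation. The sketch below is a minimal stand-in, not llmc code: `AttrDict` and `flag_enabled` are hypothetical names, and `AttrDict` only assumes that the real config object supports both `in` checks and attribute access.

```python
class AttrDict(dict):
    """Assumed stand-in for an attribute-style config object."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc
        # Wrap nested dicts so chained access such as cfg.save.get(...) works.
        return AttrDict(value) if isinstance(value, dict) else value


def flag_enabled(config, section, key):
    """Mirror the pattern `section in config and config.<section>.get(key, False)`."""
    return section in config and bool(getattr(config, section).get(key, False))


cfg = AttrDict({"save": {"save_trtllm": True, "trtllm_cfg": {}}})
print(flag_enabled(cfg, "save", "save_trtllm"))  # flag present and set
print(flag_enabled(cfg, "save", "save_trans"))   # flag missing, falls back to False
print(flag_enabled(AttrDict(), "save", "save_trans"))  # whole section missing
```

The membership test runs first, so a config with no `save` section never reaches the attribute access. By the same short-circuit rule, the `True` default in `config.get('cvt', True)` can only apply when `'cvt'` is absent, and in that case the `'cvt' in config` check has already failed.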