support max_num_workers in opencompass (#69)

ModelTC · Sep 5, 2024 · 14fba51
1 parent 899b645

Showing 4 changed files with 56 additions and 6 deletions.
diff --git a/configs/quantization/Awq/awq_w4a16_fakequant_eval_opencompass.yml b/configs/quantization/Awq/awq_w4a16_fakequant_eval_opencompass.yml
@@ -34,4 +34,5 @@ save:
     save_path: ./save
 opencompass:
     cfg_path: opencompass config path # eval_base.py for base model, eval_chat.py for chat model. configs can be found in llmc/configs/opencompass.
+    max_num_workers: 1
     output_path: ./oc_output
diff --git a/docs/en/source/advanced/model_test_v2.md b/docs/en/source/advanced/model_test_v2.md
@@ -49,15 +49,37 @@ opencompass:
     output_path: ./oc_output
 ```
 
-The cfg_path under opencompass needs to point to a configuration path for opencompass.
+<font color=792ee5> The cfg_path in opencompass needs to point to a configuration path for opencompass. </font>
 
 [Here](https://github.com/ModelTC/llmc/tree/main/configs/opencompass), we have provided the configurations for both the base model and the chat model regarding the human-eval test as a reference for everyone.
 
 Note that [the configuration provided by opencompass](https://github.com/ModelTC/opencompass/blob/opencompass-llmc/configs/models/hf_llama/hf_llama3_8b.py) needs to have the path key. Here that key is not needed, because llmc assumes by default that the model is located at the save path of the trans model.
 
 Because the save path of the trans model is required, you need to set save_trans to True if you want to test in opencompass.
 
-The output_path under opencompass is used to set the output directory for the evaluation logs of opencompass.
+<font color=792ee5> The max_num_workers in opencompass refers to the maximum number of inference instances. </font>
+
+If the model runs on a single GPU, max_num_workers inference instances are started, occupying max_num_workers GPUs in total.
+
+If the model runs on multiple GPUs, as in multi-GPU parallel testing (described below), each inference instance occupies that many GPUs. For example, if the model runs inference on 2 GPUs, starting max_num_workers inference instances occupies 2 * max_num_workers GPUs.
+
+In summary, the required number of GPUs = PP size (the number of GPUs one instance uses for pipeline parallelism) * max_num_workers.
+
+If the required number of GPUs exceeds the actual number of available GPUs, then some workers will have to wait in a queue.
+
+max_num_workers not only starts multiple inference instances but also splits each dataset into max_num_workers parts, which can be understood as data parallelism.
+
+Therefore, the optimal setting is to make the required number of GPUs equal to the number of available GPUs.
+
+For example:
+
+On a machine with 8 GPUs, if a model runs on a single GPU, then max_num_workers=8.
+On a machine with 8 GPUs, if a model runs on 4 GPUs, then max_num_workers=2.
+We should keep the number of GPUs per instance (the PP size) as low as possible and max_num_workers as high as possible, because PP parallelism tends to be slower. PP should only be used when the model cannot run on a single GPU. For example, a 70B model that does not fit on a single GPU can be run with PP=4 on four 80GB GPUs.
+
+<font color=792ee5> The output_path in opencompass is used to set the output directory for the evaluation logs of opencompass. </font>
+
+In this log directory, OpenCompass will output logs for inference and evaluation, detailed inference results, and the final evaluation accuracy.
 
 Before running the llmc program, you also need to install the version of [opencompass](https://github.com/ModelTC/opencompass/tree/opencompass-llmc) that has been adapted for llmc.
 

diff --git a/docs/zh_cn/source/advanced/model_test_v2.md b/docs/zh_cn/source/advanced/model_test_v2.md
@@ -46,18 +46,43 @@ save:
     save_path: ./save
 opencompass:
     cfg_path: opencompass config path
+    max_num_workers: max num workers
     output_path: ./oc_output
 ```
 
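-The cfg_path under opencompass needs to point to an opencompass config path.
+<font color=792ee5> The cfg_path under opencompass needs to point to an opencompass config path. </font>
 
 [Here](https://github.com/ModelTC/llmc/tree/main/configs/opencompass), we provide configs for the human-eval test for both the base model and the chat model as a reference.
 
 Note that [the config shipped with opencompass](https://github.com/ModelTC/opencompass/blob/opencompass-llmc/configs/models/hf_llama/hf_llama3_8b.py) needs to have the path key. Here we do not need this key, because llmc assumes by default that the model is located at the save path of trans.
 
 Of course, since the save path of trans is required, you need to set save_trans to True if you want to test with opencompass.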
 
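-The output_path under opencompass sets the output directory for the opencompass evaluation logs.
+<font color=792ee5> The max_num_workers under opencompass is the maximum number of inference instances. </font>
+
+If the model runs on a single GPU, max_num_workers means that max_num_workers inference instances are started, occupying max_num_workers GPUs.
+
+If the model runs on multiple GPUs (see the multi-GPU parallel testing below), for example on 2 GPUs, max_num_workers means that max_num_workers inference instances are started, occupying 2*max_num_workers GPUs.
+
+In summary, the required number of GPUs = PP size * max_num_workers
+
+If the required number of GPUs exceeds the number actually available, some workers will have to wait in a queue.
+
+max_num_workers not only starts multiple inference instances but also splits each dataset into max_num_workers parts, which can be understood as data parallelism.
+
+Therefore, the best setting is to make the required number of GPUs equal to the number of available GPUs.
+
+For example:
+
+On a machine with 8 GPUs, if a model runs on a single GPU, then max_num_workers=8
+
+On a machine with 8 GPUs, if a model runs on 4 GPUs, then max_num_workers=2
+
+We try to keep the PP size low and max_num_workers high, because PP parallelism is slower. PP should only be used when the model really cannot run otherwise. For example, a 70B model that cannot run on a single GPU can be run with PP=4 on four GPUs with 80 GB of memory each.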
+
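+<font color=792ee5> The output_path under opencompass sets the output directory for the opencompass evaluation logs. </font>
+
+In this log directory, opencompass outputs the inference and evaluation logs, the detailed inference results, and the final evaluation accuracy.
 
 Before running the llmc program, you also need to install the [opencompass adapted for llmc](https://github.com/ModelTC/opencompass/tree/opencompass-llmc)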
 
@@ -74,7 +99,7 @@ pip install human-eval
 
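 ## Multi-GPU parallel testing
 
-If the model is too large to fit on a single GPU and accuracy has to be evaluated on multiple GPUs, we support using pipeline parallel when running opencompass.
+If the model is too large to fit on a single GPU and accuracy has to be evaluated on multiple GPUs, we support using pipeline parallel, i.e. PP parallelism, when running opencompass.
 
 All you need to do is: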
 

diff --git a/llmc/__main__.py b/llmc/__main__.py
@@ -117,13 +117,15 @@ def main(config):
     if 'opencompass' in config:
         assert config.save.get('save_trans', False)
         cfg_path = config['opencompass']['cfg_path']
+        max_num_workers = config['opencompass']['max_num_workers']
         output_path = config['opencompass']['output_path']
         eval_model_path = os.path.abspath(save_trans_path)
         opencompass_cmd = (
             f'opencompass {cfg_path} -w {output_path} '
             f'--llmc_cfg {args.config} '
             f'--llmc_eval_mode quant '
-            f'--llmc_model_path {eval_model_path}'
+            f'--llmc_model_path {eval_model_path} '
+            f'--max-num-workers {max_num_workers}'
         )
         logger.info(f'opencompass_cmd : {opencompass_cmd}')
         os.system(opencompass_cmd)
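
Review note: the sizing rule from the docs (required GPUs = PP size * max_num_workers) and the command assembled in `llmc/__main__.py` can be sketched as below. This is a minimal illustration and not part of the commit. The helper names `required_gpus`, `suggest_max_num_workers`, and `build_opencompass_cmd` are hypothetical. The fallback to 1 worker when `max_num_workers` is missing from the config, and the shell quoting via `shlex.join`, are assumptions about what a more defensive version might do. The command-line flags are the ones that appear in the diff above.

```python
import shlex


def required_gpus(pp_size: int, max_num_workers: int) -> int:
    """GPUs occupied when every worker runs at once: PP size * max_num_workers."""
    if pp_size < 1 or max_num_workers < 1:
        raise ValueError("pp_size and max_num_workers must both be >= 1")
    return pp_size * max_num_workers


def suggest_max_num_workers(available_gpus: int, pp_size: int) -> int:
    """Largest worker count whose GPU demand fits on the machine (at least 1)."""
    if available_gpus < 1 or pp_size < 1:
        raise ValueError("available_gpus and pp_size must both be >= 1")
    return max(1, available_gpus // pp_size)


def build_opencompass_cmd(oc_cfg: dict, llmc_cfg_path: str, eval_model_path: str) -> str:
    """Assemble the same command as the diff, with each argument shell-quoted."""
    # Assumption: default to 1 worker so configs written before this commit
    # (without the max_num_workers key) do not raise a KeyError.
    max_num_workers = int(oc_cfg.get("max_num_workers", 1))
    args = [
        "opencompass", oc_cfg["cfg_path"],
        "-w", oc_cfg["output_path"],
        "--llmc_cfg", llmc_cfg_path,
        "--llmc_eval_mode", "quant",
        "--llmc_model_path", eval_model_path,
        "--max-num-workers", str(max_num_workers),
    ]
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return shlex.join(args)


if __name__ == "__main__":
    # The two examples from the docs: an 8-GPU machine with PP size 1 and 4.
    print(suggest_max_num_workers(available_gpus=8, pp_size=1))
    print(suggest_max_num_workers(available_gpus=8, pp_size=4))
    print(build_opencompass_cmd(
        {"cfg_path": "eval_base.py", "output_path": "./oc_output", "max_num_workers": 2},
        "config.yml",
        "/abs/path/to/save",
    ))
```

Whether a silent default of 1 is preferable to failing fast on a missing key is a design choice for the maintainers. The sketch only shows that the lookup and the quoting can be made explicit.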