- Linux
- macOS
- Clone the llama.cpp repository with Git:

  ```bash
  git clone https://github.com/ggerganov/llama.cpp
  ```
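  The conversion script and build targets change across llama.cpp releases, so if a later step does not match your checkout, pinning an older revision may help. A sketch, where the revision is a placeholder you would fill in yourself:

  ```bash
  # Pin a known-good revision (placeholder: substitute a real tag or commit).
  cd llama.cpp
  git checkout <known-good-tag-or-commit>
  ```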
- Enter the llama.cpp directory and build:

  ```bash
  cd llama.cpp
  make
  ```
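  Optionally, parallelize the build to speed it up:

  ```bash
  # -j builds with multiple cores; nproc is Linux-specific, use
  # "sysctl -n hw.ncpu" on macOS instead.
  make -j"$(nproc)"
  ```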
- Create a directory to store the model:

  ```bash
  cd llama.cpp/models
  mkdir Minicpm
  ```
- Download the MiniCPM PyTorch model: download all of the MiniCPM PyTorch model files and save them to the `llama.cpp/models/Minicpm` directory.
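  One concrete way to fetch the files is the Hugging Face CLI; a minimal sketch, assuming `openbmb/MiniCPM-2B-sft-bf16` is the checkpoint you want (substitute the repository that matches your model):

  ```bash
  # Sketch: download a model repository into the target directory
  # (assumes the huggingface_hub package provides the huggingface-cli tool).
  pip install -U huggingface_hub
  huggingface-cli download openbmb/MiniCPM-2B-sft-bf16 --local-dir models/Minicpm
  ```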
- Modify the conversion script: check the `_reverse_hf_permute` function in `llama.cpp/convert-hf-to-gguf.py`. If you find the following code:

  ```python
  def _reverse_hf_permute(self, weights: Tensor, n_head: int, n_kv_head: int | None = None) -> Tensor:
      if n_kv_head is not None and n_head != n_kv_head:
          n_head //= n_kv_head
      return (weights.reshape(n_head, 2, weights.shape[0] // n_head // 2, *weights.shape[1:])
              .swapaxes(1, 2)
              .reshape(weights.shape))
  ```
  replace it with:

  ```python
  @staticmethod
  def permute(weights: Tensor, n_head: int, n_head_kv: int | None):
      if n_head_kv is not None and n_head != n_head_kv:
          n_head = n_head_kv
      return (weights.reshape(n_head, 2, weights.shape[0] // n_head // 2, *weights.shape[1:])
              .swapaxes(1, 2)
              .reshape(weights.shape))
  ```
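  A quick way to confirm the edit landed, run from the `llama.cpp` directory:

  ```bash
  # List matches for the new staticmethod in the conversion script.
  grep -n "def permute" convert-hf-to-gguf.py
  ```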
- Install the dependencies and convert the model:

  ```bash
  python3 -m pip install -r requirements.txt
  python3 convert-hf-to-gguf.py models/Minicpm/
  ```
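  If you want to pin the output precision explicitly, the script accepts an output-type flag in many versions; a sketch, assuming your checkout supports `--outtype`:

  ```bash
  # Request f16 output explicitly (assumes the --outtype flag exists in
  # your version of convert-hf-to-gguf.py).
  python3 convert-hf-to-gguf.py models/Minicpm/ --outtype f16
  ```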
  After completing the steps above, the `llama.cpp/models/Minicpm` directory will contain a model file named `ggml-model-f16.gguf`.
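  A quick sanity check that the conversion produced a file of plausible size:

  ```bash
  # The f16 GGUF should be roughly the size of the original fp16 weights.
  ls -lh models/Minicpm/ggml-model-f16.gguf
  ```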
- Quantize the model (skip this step if the model you downloaded is already in a quantized format):

  ```bash
  ./llama-quantize ./models/Minicpm/ggml-model-f16.gguf ./models/Minicpm/ggml-model-Q4_K_M.gguf Q4_K_M
  ```
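  `Q4_K_M` is only one of several quantization types; your build can print the full list it supports:

  ```bash
  # Invoking llama-quantize without valid arguments prints usage, including
  # the table of supported quantization types.
  ./llama-quantize --help
  ```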
  If `llama-quantize` cannot be found, try rebuilding it:

  ```bash
  cd llama.cpp
  make llama-quantize
  ```
- Run inference with the quantized model:

  ```bash
  ./llama-cli -m ./models/Minicpm/ggml-model-Q4_K_M.gguf -n 128 --prompt "<用户>你知道openbmb么<AI>"
  ```
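  Beyond a one-shot prompt, the same file can be served over HTTP; a sketch, assuming your build also produced the `llama-server` binary:

  ```bash
  # Serve the quantized model with llama.cpp's HTTP server (assumption:
  # llama-server was built; run "make llama-server" first if it is missing).
  ./llama-server -m ./models/Minicpm/ggml-model-Q4_K_M.gguf --port 8080
  ```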