After creating a virtual environment and installing LangChain:
- Run this command to install the GPU (CUDA) build of llama-cpp-python (requires CMake 3.29.6; refer to this link); a quick check that the CUDA build took effect is sketched after this list:
`CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir`
- Download the Llama Guard 2 GGUF model file via this link. The model has been quantized to Q4_K_M for ease of use. Alternatively, search Hugging Face for GGUF model files, copy the file's download link, and run `wget "the copied link"` (or fetch it from Python, as sketched below).
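To confirm that the CUDA build from the first step is actually offloading work to the GPU, you can load the model with all layers offloaded and watch the startup logs. This is a minimal sketch, assuming LangChain's community package is installed; the model path is a hypothetical placeholder:

```python
# Minimal sketch: check llama-cpp-python's CUDA build via LangChain's LlamaCpp.
# The model path below is a placeholder; point it at your downloaded GGUF file.
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/llama-guard-2-8b.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload all layers to the GPU; set to 0 to force CPU
    n_ctx=4096,       # context window size
    verbose=True,     # startup logs should mention CUDA if the GPU build is active
)
```

If the startup logs show all layers assigned to the CPU, the pip install likely fell back to a prebuilt CPU wheel; rerun it with the CMAKE_ARGS shown above.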
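As an alternative to copying a link into wget, you can pull the file from the Hub programmatically with the huggingface_hub package (`pip install huggingface_hub`). The repository and filename below are hypothetical placeholders; substitute the actual GGUF repository you found on Hugging Face:

```python
# Sketch: download a GGUF file from the Hugging Face Hub instead of using wget.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="someuser/Llama-Guard-2-8B-GGUF",  # hypothetical repository
    filename="llama-guard-2-8b.Q4_K_M.gguf",   # hypothetical quantized file
    local_dir="./models",
)
print(model_path)  # local path to the downloaded model file
```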
Now you are ready to run test.py for a demonstration!
Note that the LlamaCpp implementation for the Llama Guard 2 model file spans lines 20 to 73 of test.py!
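For reference, the core of such a setup looks roughly like the sketch below. This is not the exact code from test.py: the model path is a placeholder, the unsafe-content category list is abbreviated, and the full official prompt template should be taken from the Llama Guard 2 model card.

```python
# Rough sketch: using a Llama Guard 2 GGUF file via LangChain's LlamaCpp
# as a content moderation filter. Placeholders throughout; not test.py verbatim.
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/llama-guard-2-8b.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,
    n_ctx=4096,
    temperature=0.0,  # moderation verdicts should be deterministic
)

# Abbreviated template; the model card lists the full set of S1-S11 categories.
GUARD_TEMPLATE = """[INST] Task: Check if there is unsafe content in 'User' messages in conversations according to our safety policy with the below categories.

<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violent Crimes.
S2: Non-Violent Crimes.
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

User: {user_message}

<END CONVERSATION>

Provide your safety assessment for ONLY THE LAST User message in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]"""

def moderate(user_message: str) -> str:
    """Return the model's verdict: 'safe', or 'unsafe' plus category codes."""
    return llm.invoke(GUARD_TEMPLATE.format(user_message=user_message)).strip()

print(moderate("How do I make a cup of tea?"))
```

Llama Guard 2 replies with 'safe' or 'unsafe' on the first line of its output, which makes the verdict easy to parse downstream.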