Forked from LostRuins/koboldcpp

Commit: Merge remote-tracking branch 'upstream/concedo'
Showing 58 changed files with 3,420 additions and 1,801 deletions.
Changes to the Windows CUDA build workflow:

```diff
@@ -6,7 +6,7 @@ env:
 
 jobs:
   windows:
-    runs-on: windows-latest
+    runs-on: windows-2019
     steps:
       - name: Clone
         id: checkout
@@ -17,16 +17,14 @@ jobs:
       - uses: Jimver/[email protected]
         id: cuda-toolkit
         with:
-          cuda: '11.7.1'
-          method: 'network'
-          sub-packages: '["nvcc", "cudart", "cublas", "cublas_dev", "thrust", "visual_studio_integration"]'
+          cuda: '11.4.4'
 
       - name: Build
         id: cmake_build
         run: |
           mkdir build
           cd build
-          cmake .. -DLLAMA_CUBLAS=ON
+          cmake .. -DLLAMA_CUBLAS=ON -DCMAKE_SYSTEM_VERSION="10.0.19041.0"
           cmake --build . --config Release -j ${env:NUMBER_OF_PROCESSORS}
 
       - name: Save artifact
```
New workflow file (29 additions): a CI job that validates the Python requirements files.

```yaml
name: Python check requirements.txt

on:
  push:
    paths:
      - 'scripts/check-requirements.sh'
      - 'convert*.py'
      - 'requirements.txt'
      - 'requirements/*.txt'
  pull_request:
    paths:
      - 'scripts/check-requirements.sh'
      - 'convert*.py'
      - 'requirements.txt'
      - 'requirements/*.txt'

jobs:
  python-check-requirements:
    runs-on: ubuntu-latest
    name: check-requirements
    steps:
      - name: Check out source repository
        uses: actions/checkout@v3
      - name: Set up Python environment
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: Run check-requirements.sh script
        run: bash scripts/check-requirements.sh nocleanup
```
New file (116 additions): README for the AWQ integration.

# AWQ: Activation-aware Weight Quantization for LLMs - a version adapted to llama.cpp

[[Paper](https://arxiv.org/abs/2306.00978)][[Original Repo](https://github.com/mit-han-lab/llm-awq)][[Easy-to-use Repo](https://github.com/casper-hansen/AutoAWQ)]

**Supported models:**

- [x] LLaMA
- [x] LLaMA 2
- [x] MPT
- [x] Mistral AI v0.1
- [ ] Bloom
- [ ] Mixtral MoE

**TODO:**

- [x] Update the version to work with both MPT and MPT-AWQ models
- [ ] Add OPT model
- [ ] Add Bloom model
- [ ] Add Mixtral MoE
- [ ] Support w3, w2

## Contents

- [Install](#install)
- [Convert](#convert)
- [Quantize](#quantize)
- [Test](#test)
- [Benchmark](#benchmark)
- [Results](#results)
## Install

Install the requirements:

```bash
pip install -r requirements.txt
```

Get the pre-computed AWQ search results for multiple model families, including LLaMA, LLaMA 2, MPT, and OPT:

```bash
git clone https://huggingface.co/datasets/mit-han-lab/awq-model-zoo awq_cache
```
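As a quick sanity check that the clone succeeded, you can list the cache; the per-model `.pt` filenames (e.g. `llama-7b-w4-g128.pt`) are the ones the Convert step below references:

```bash
# Verify the pre-computed AWQ search results were downloaded.
ls awq_cache/
```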
## Convert

Examples for the supported model families:

```bash
# For llama-7b and llama2 models
python convert.py models/llama-7b/ --awq-path awq_cache/llama-7b-w4-g128.pt --outfile models/llama_7b_fp16.gguf
# For mistral and mpt models
python convert-hf-to-gguf.py models/mpt-7b/ --awq-path awq_cache/mpt-7b-w4-g128.pt --outfile models/mpt_7b_fp16.gguf
```
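A Mistral conversion presumably follows the same pattern. In the sketch below, the cache filename `mistral-7b-w4-g128.pt` and the model directory are assumptions by analogy with the entries above, not confirmed by this README:

```bash
# Hypothetical Mistral conversion; the awq_cache filename is assumed
# by analogy with the llama/mpt entries above, not verified.
python convert-hf-to-gguf.py models/mistral-7b-v0.1/ --awq-path awq_cache/mistral-7b-w4-g128.pt --outfile models/mistral_7b_fp16.gguf
```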
## Quantize

```bash
# We have only benchmarked and confirmed results for the q4_0, q4_1, and q2_k types.
./quantize models/llama_7b_fp16.gguf models/llama_7b_q4_0.gguf q4_0
```
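The other two confirmed types use the same invocation:

```bash
# Same pattern for the other confirmed quantization types.
./quantize models/llama_7b_fp16.gguf models/llama_7b_q4_1.gguf q4_1
./quantize models/llama_7b_fp16.gguf models/llama_7b_q2_k.gguf q2_k
```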
## Test

```bash
# For all models.
./build/bin/main -m models/llama_7b_q4_0.gguf -n 128 --prompt "Once upon a time"
```
## Benchmark

The perplexity measurements in the tables below are done against the `wikitext2` test dataset (https://paperswithcode.com/dataset/wikitext-2), with a context length of 512.

```bash
# For llama, llama2, and mistral models.
./perplexity -m models/llama_7b_q4_0.gguf -f datasets/wikitext-2-raw/wiki.test.raw
```
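If the dataset is not already present, it can be fetched as sketched below; the S3 URL is the dataset's long-standing public mirror, an assumption rather than something specified in this README:

```bash
# Fetch and unpack wikitext-2 (raw) into the layout the perplexity command
# above expects (datasets/wikitext-2-raw/wiki.test.raw). URL is an assumption.
mkdir -p datasets
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip -d datasets
```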
## Results

Results were generated with OpenBLAS (CPU) and cuBLAS (GPU) builds for a fair comparison. We use three llama.cpp quantization types with our version: q4_0, q4_1, and q2_k.

### Llama 7B (Build with OpenBLAS)

| Model        | Measure     |    F16 |   Q4_0 |   Q4_1 |   Q2_K |
|-------------:|-------------|-------:|-------:|-------:|-------:|
| Llama 7B     | perplexity  | 5.9066 | 6.1214 | 6.0643 | 6.5808 |
| Llama 7B     | file size   |  12.9G |   3.5G |   3.9G |   2.7G |
| Llama 7B     | bits/weight |   16.0 |    4.5 |    5.0 |    2.6 |
| AWQ-LLama 7B | perplexity  | 5.9175 | 6.0252 | 5.9987 | 6.3692 |
| AWQ-LLama 7B | file size   |  12.9G |   3.5G |   3.9G |   2.7G |
| AWQ-LLama 7B | bits/weight |   16.0 |    4.5 |    5.0 |    2.6 |

### Llama2 7B (Build with CuBLAS)

| Model         | Measure     |    F16 |   Q4_0 |   Q4_1 |   Q2_K |
|--------------:|-------------|-------:|-------:|-------:|-------:|
| Llama2 7B     | perplexity  | 5.8664 | 6.0260 | 6.0656 | 6.4496 |
| Llama2 7B     | file size   |  12.9G |   3.5G |   3.9G |   2.7G |
| Llama2 7B     | bits/weight |   16.0 |    4.5 |    5.0 |    2.6 |
| AWQ-LLama2 7B | perplexity  | 5.8801 | 6.0054 | 5.9849 | 6.3650 |
| AWQ-LLama2 7B | file size   |  12.9G |   3.5G |   3.9G |   2.7G |
| AWQ-LLama2 7B | bits/weight |   16.0 |    4.5 |    5.0 |    2.6 |

### Mistral 7B v0.1 (Build with CuBLAS)

| Model          | Measure     |    F16 |   Q4_0 |   Q4_1 |   Q2_K |
|---------------:|-------------|-------:|-------:|-------:|-------:|
| Mistral 7B     | perplexity  | 5.6931 | 5.8202 | 5.8268 | 6.1645 |
| Mistral 7B     | file size   |  14.5G |   4.1G |   4.5G |   3.1G |
| Mistral 7B     | bits/weight |   16.0 |    4.5 |    5.0 |    2.6 |
| AWQ-Mistral 7B | perplexity  | 5.6934 | 5.8020 | 5.7691 | 6.0426 |
| AWQ-Mistral 7B | file size   |  14.5G |   4.1G |   4.5G |   3.1G |
| AWQ-Mistral 7B | bits/weight |   16.0 |    4.5 |    5.0 |    2.6 |

### MPT 7B (Build with OpenBLAS)

| Model      | Measure     |    F16 |   Q4_0 |   Q4_1 |    Q2_K |
|-----------:|-------------|-------:|-------:|-------:|--------:|
| MPT 7B     | perplexity  | 8.4369 | 8.7956 | 8.6265 | 11.4913 |
| MPT 7B     | file size   |  13.7G |   3.9G |   4.3G |    2.8G |
| MPT 7B     | bits/weight |   16.0 |    4.5 |    5.0 |     2.6 |
| AWQ-MPT 7B | perplexity  | 8.4944 | 8.7053 | 8.6750 | 10.2873 |
| AWQ-MPT 7B | file size   |  13.7G |   3.9G |   4.3G |    2.8G |
| AWQ-MPT 7B | bits/weight |   16.0 |    4.5 |    5.0 |     2.6 |