# Docs update #837

Open · wants to merge 6 commits into `master`
**README.md** (6 changes: 3 additions & 3 deletions)

````diff
@@ -92,7 +92,7 @@ Mistal.rs supports several model categories:
 - [Details](docs/QUANTS.md)
 - GGML: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit, with ISQ support.
 - GPTQ: 2-bit, 3-bit, 4-bit and 8-bit
-- HQQ: 4-bit and 8 bit, with ISQ support
+- HQQ: 4-bit and 8-bit, with ISQ support
 
 **Powerful**:
 - LoRA support with weight merging
@@ -569,7 +569,7 @@ Mistral.rs will attempt to automatically load a chat template and tokenizer. Thi
 
 ## Contributing
 
-Thank you for contributing! If you have any problems or want to contribute something, please raise an issue or pull request.
+Thank you for contributing! If you have any problems or want to contribute something, please raise an issue or pull a request.
 If you want to add a new model, please contact us via an issue and we can coordinate how to do this.
 
 ## FAQ
@@ -582,7 +582,7 @@ If you want to add a new model, please contact us via an issue and we can coordi
 - Error: `recompile with -fPIE`:
   - Some Linux distributions require compiling with `-fPIE`.
   - Set the `CUDA_NVCC_FLAGS` environment variable to `-fPIE` during build: `CUDA_NVCC_FLAGS=-fPIE`
-- Error `CUDA_ERROR_NOT_FOUND` or symbol not found when using a normal or vison model:
+- Error `CUDA_ERROR_NOT_FOUND` or symbol not found when using a normal or vision model:
  - For non-quantized models, you can specify the data type to load and run in. This must be one of `f32`, `f16`, `bf16` or `auto` to choose based on the device.
 
 ## Credits
````
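For context on the `-fPIE` fix in the FAQ hunk above, the environment variable is set when invoking the build. A minimal sketch, assuming a CUDA-enabled cargo build; the `--features cuda` flag is an assumption, not taken from the README:

```bash
# Sketch only: "--features cuda" is assumed; check the project's build docs
# for the exact feature flags your setup needs.
CUDA_NVCC_FLAGS=-fPIE cargo build --release --features cuda
```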
**docs/ADAPTER_MODELS.md** (4 changes: 2 additions & 2 deletions)

````diff
@@ -59,7 +59,7 @@ An ordering JSON file for LoRA contains 2 major parts:
   - Specifies the adapter name and the model ID to find them, which may be a local path.
 
 ### Preparing the ordering file (LoRA or X-LoRA cases)
-There are 2 scripts to prepare the ordering file and which work for both X-LoRA and LoRA. The ordering file is specific to each architecture and set of target modules. Therefore, if either are changed, it is necessary to create a new ordering file using the first option. If only the adapter order or adapters changed, then it the second option should be used.
+There are 2 scripts to prepare the ordering file and which work for both X-LoRA and LoRA. The ordering file is specific to each architecture and set of target modules. Therefore, if either is changed, it is necessary to create a new ordering file using the first option. If only the adapter order or adapters changed, then it the second option should be used.
 
 1) From scratch: No ordering file for the architecture and target modules
 
@@ -102,4 +102,4 @@ To use this feature, you should add a `preload_adapters` key to your ordering fi
 
 This allows mistral.rs to preload the adapter and enable runtime activation.
 
-We also provide a script to add this key to your existing order file: [`load_add_preload_adapters.py`](../scripts/lora_add_preload_adapters.py).
\ No newline at end of file
+We also provide a script to add this key to your existing order file: [`load_add_preload_adapters.py`](../scripts/lora_add_preload_adapters.py).
````
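As a sketch of the `preload_adapters` key referenced in the hunk above: the ordering file gains an extra array naming each adapter and the model ID to find it. The field names below are illustrative assumptions, not the confirmed schema; the linked script defines the real format:

```json
{
  "preload_adapters": [
    {
      "name": "adapter_2",
      "adapter_model_id": "hub-id-or-local-path-for-the-adapter"
    }
  ]
}
```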
**docs/ANYMOE.md** (6 changes: 3 additions & 3 deletions)

````diff
@@ -12,7 +12,7 @@ Paper: https://arxiv.org/abs/2405.19076
 
 https://github.com/EricLBuehler/mistral.rs/assets/65165915/33593903-d907-4c08-a0ac-d349d7bf33de
 
-> Note: By default, this has the capability to create an csv loss image. When building from source (for Python or CLI), you may use `--no-default-features` command line to disable this. This may be necessary if networking is unavailable.
+> Note: By default, this has the capability to create a csv loss image. When building from source (for Python or CLI), you may use `--no-default-features` command line to disable this. This may be necessary if networking is unavailable.
 
 ## Dataset
 Currently, AnyMoE expects a JSON dataset with one top-level key `row`, which is an array of objects with keys `prompt` (string), `expert` (integer), and `image_urls` (optional array of strings). For example:
````
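The file's inline example is collapsed in this diff. A minimal dataset matching the schema just described, with illustrative values, might look like:

```json
{
  "row": [
    { "prompt": "Explain borrow checking in Rust.", "expert": 0 },
    {
      "prompt": "Describe this image.",
      "expert": 1,
      "image_urls": ["https://example.com/cat.png"]
    }
  ]
}
```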
````diff
@@ -35,7 +35,7 @@ Currently, AnyMoE expects a JSON dataset with one top-level key `row`, which is
 For a vision model, `image_urls` may contain an array of image URLs/local paths or Base64 encoded images.
 
 ## Experts
-AnyMoE experts can be either fine-tuned models or LoRA adapter models. Only the mlp layers will be loaded from each. The experts must be homogeneous: they must be all fine-tuned or all adapter. Additionally, certain layers can be specified to apply AnyMoE.
+AnyMoE experts can be either fine-tuned models or LoRA adapter models. Only the mlp layers will be loaded from each. The experts must be homogeneous: they must be all fine-tuned or all adapters. Additionally, certain layers can be specified to apply AnyMoE.
 
 > Note: When using LoRA adapter experts, it may not be necessary to set the layers where AnyMoE will be applied due to the lower memory usage.
 
@@ -185,7 +185,7 @@ async fn main() -> Result<()> {
     let messages = TextMessages::new()
         .add_message(
             TextMessageRole::System,
-            "You are an AI agent with a specialty in programming.",
+            "You are an AI agent with a speciality in programming.",
         )
         .add_message(
             TextMessageRole::User,
````
**docs/IDEFICS2.md** (4 changes: 2 additions & 2 deletions)

````diff
@@ -2,7 +2,7 @@
 
 The Idefics 2 Model has support in the Rust, Python, and HTTP APIs. The Idefics 2 Model also supports ISQ for increased performance.
 
-> Note: Some of examples use our [Cephalo model series](https://huggingface.co/collections/lamm-mit/cephalo-664f3342267c4890d2f46b33) but could be used with any model ID.
+> Note: Some of the examples use our [Cephalo model series](https://huggingface.co/collections/lamm-mit/cephalo-664f3342267c4890d2f46b33) but could be used with any model ID.
 
 The Python and HTTP APIs support sending images as:
 - URL
@@ -183,4 +183,4 @@ print(res.usage)
 ```
 
 - You can find an example of encoding the [image via base64 here](../examples/python/phi3v_base64.py).
-- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
\ No newline at end of file
+- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
````
**docs/LLaVA.md** (6 changes: 3 additions & 3 deletions)

````diff
@@ -8,7 +8,7 @@ This implementation supports both LLaVA and LLaVANext(which adds multi resolutio
 * llava-hf/llava-1.5-7b-hf
 
 
-The LLaVA and LLaVANext Model has support in the Rust, Python, and HTTP APIs. The LLaVA and LLaVANext Model also supports ISQ for increased performance.
+The LLaVA and LLaVANext Model have support in the Rust, Python, and HTTP APIs. The LLaVA and LLaVANext Models also support ISQ for increased performance.
 
 The Python and HTTP APIs support sending images as:
 - URL
@@ -101,7 +101,7 @@ print(resp)
 ## Rust
 You can find this example [here](../mistralrs/examples/llava_next/main.rs).
 
-This is a minimal example of running the LLaVA and LLaVANext model with a dummy image.
+This is a minimal example of running the LLaVA and LLaVANext models with a dummy image.
 
 ```rust
 use anyhow::Result;
@@ -192,4 +192,4 @@ print(res.usage)
 ```
 
 - You can find an example of encoding the [image via base64 here](../examples/python/phi3v_base64.py).
-- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
\ No newline at end of file
+- You can find an example of loading an [image locally here](../examples/python/phi3v_local_img.py).
````
**docs/LORA_XLORA.md** (4 changes: 2 additions & 2 deletions)

````diff
@@ -2,7 +2,7 @@
 
 - X-LoRA with no quantization
 
-To start an X-LoRA server with the exactly as presented in [the paper](https://arxiv.org/abs/2402.07148):
+To start an X-LoRA server exactly as presented in [the paper](https://arxiv.org/abs/2402.07148):
 
 ```bash
 ./mistralrs-server --port 1234 x-lora-plain -o orderings/xlora-paper-ordering.json -x lamm-mit/x-lora
@@ -15,4 +15,4 @@ To start an LoRA server with adapters from the X-LoRA paper (you should modify t
 ./mistralrs-server --port 1234 lora-gguf -o orderings/xlora-paper-ordering.json -m TheBloke/zephyr-7B-beta-GGUF -f zephyr-7b-beta.Q8_0.gguf -a lamm-mit/x-lora
 ```
 
-Normally with a LoRA model you would use a custom ordering file. However, for this example we use the ordering from the X-LoRA paper because we are using the adapters from the X-LoRA paper.
+Normally with a LoRA model, you would use a custom ordering file. However, for this example, we use the ordering from the X-LoRA paper because we are using the adapters from the X-LoRA paper.
````