Commit
address PR comments
jorgeantonio21 committed May 2, 2024
1 parent 3b08580 commit 53225ad
Showing 1 changed file with 7 additions and 4 deletions.
11 changes: 7 additions & 4 deletions README.md
@@ -32,15 +32,18 @@ models = [MODEL_CONFIG] # Specifications for each model the user wants to operat
tracing = TRACING # bool value, allows for tracing
```

-4. In the above point, the `MODEL_CONFIG` refers to a set of supported model configurations, see below, as follows
+4. In the above paragraph, the `MODEL_CONFIG` refers to a set of supported model configurations, see below, as follows

```
[DEVICE, PRECISION, MODEL_TYPE, USE_FLASH_ATTENTION]
```

-where `DEVICE` is an integer referring to the device index of GPU operating (if your machine only supports one single GPU cards, device should be 0, or if using cpu or metal devices). `PRECISION` refers to the model inference precision, supported values are
-`"f32"`, `"bf16"`, `"f16"` (if you host quantized models, this field is not relevant). `MODEL_TYPE` is a string referring to the name
-of the model to be hosted (on the given device), a full list of model names can be found here. `USE_FLASH_ATTENTION` is a boolean value which allows to run inference with the optimized flash attention algorithm.
+- `DEVICE` is an integer referring to the device index of GPU operating (if your machine only supports one single GPU cards, device should be 0, or if using cpu or metal devices).
+- `PRECISION` refers to the model inference precision, supported values are
+`"f32"`, `"bf16"`, `"f16"` (if you host quantized models, this field is not relevant).
+- `MODEL_TYPE` is a string referring to the name
+of the model to be hosted (on the given device), a full list of model names can be found here.
+- `USE_FLASH_ATTENTION` is a boolean value which allows to run inference with the optimized flash attention algorithm.
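
For illustration only (this fragment is not part of the commit), the four fields described above might be filled in roughly as follows. It assumes each `MODEL_CONFIG` placeholder is written literally as a TOML array, which requires a parser that accepts mixed-type arrays (TOML 1.0), and the model name `example_model` is a hypothetical placeholder, not a name taken from the project's supported list.

```toml
# Hypothetical values: one model on device 0, bf16 precision, flash attention enabled.
# "example_model" is a placeholder; substitute a name from the project's model list.
models = [[0, "bf16", "example_model", true]]  # [DEVICE, PRECISION, MODEL_TYPE, USE_FLASH_ATTENTION]
tracing = true                                  # bool value, enables tracing
```

On a CPU or Metal machine, the text above suggests `DEVICE` would likewise be `0`.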

5. The event subscriber service configuration file is specified as (in toml format):
