API to run a language model in a confidential VM
By default, all dependencies are installed with the app. If you don't need CUDA support, you can save space by installing the CPU-only build of PyTorch:

```sh
pip install --extra-index-url https://download.pytorch.org/whl/cpu -r requirements.txt
```
To use the Intel AVX/AMX extensions:

```sh
pip install intel-extension-for-pytorch
```
Build and install the package:

```sh
python -m build
pip install dist/*.whl
```
Run the tests:

```sh
CONFIG_PATH="./tests/config.json" python tests/test.py
```
Be sure to have `~/.local/bin` in your `PATH`, then run the app:

```sh
CONFIG_PATH="./tests/config.json" cosmian-ai-runner
```
The config is written as a JSON file with several sections:
Auth (optional): information about the identity providers. If no auth information is present in the config file, authentication is disabled. This section should contain one entry per identity provider, with the following fields:

- `jwks_uri`: the identity provider's JSON Web Key Set
- `client_id`: ID of the client calling this application, set by the identity provider
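As an illustration, here is a minimal sketch of how a server could read this section and decide whether authentication is enabled. The function name and validation logic are hypothetical, not the runner's actual code; only the config shape comes from this document:

```python
import json

def load_auth_configs(config: dict):
    """Return the list of identity-provider configs, or None if auth is disabled."""
    auth = config.get("auth")
    if auth is None:
        return None  # no "auth" section in the file: authentication is disabled
    providers = auth.get("openid_configs", [])
    for provider in providers:
        # each provider entry must supply both fields described above
        if "jwks_uri" not in provider or "client_id" not in provider:
            raise ValueError("identity provider needs 'jwks_uri' and 'client_id'")
    return providers

config = json.loads(
    '{"auth": {"openid_configs": [{"client_id": "XXXX", "jwks_uri": "XXXX"}]}}'
)
print(load_auth_configs(config))  # one provider entry
print(load_auth_configs({}))      # None: auth disabled
```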
Summary: information about the summarization models to use and their generation parameters. Different models can be used depending on the language of the document to summarize. It is mandatory to have at least a `default` model entry. We recommend using `facebook/bart-large-cnn` (400M parameters).

You can specify a custom `generation_config`; the default one for your model can be found on HuggingFace in its `generation_config.json`. You can find more information about text generation in the HuggingFace documentation.
Translation: information about the translation model to use and its generation parameters. We recommend using `facebook/nllb-200-distilled-600M` (600M parameters).
```json
{
  "auth": {
    "openid_configs": [
      {
        "client_id": "XXXX",
        "jwks_uri": "XXXX"
      }
    ]
  },
  "summary": {
    "default": {
      "model_name": "facebook/bart-large-cnn",
      "generation_config": {
        "max_length": 140,
        "min_length": 30
      }
    }
  },
  "translation": {
    "model_name": "facebook/nllb-200-distilled-600M",
    "generation_config": {
      "max_length": 200
    }
  }
}
```
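A config file of this shape can be sanity-checked with plain Python. This is just a sketch of the constraints stated above (a mandatory `default` summary entry, plain generation parameters), not the runner's own validation:

```python
import json

config_text = """
{
  "summary": {
    "default": {
      "model_name": "facebook/bart-large-cnn",
      "generation_config": {"max_length": 140, "min_length": 30}
    }
  },
  "translation": {
    "model_name": "facebook/nllb-200-distilled-600M",
    "generation_config": {"max_length": 200}
  }
}
"""
config = json.loads(config_text)

# the "summary" section must always contain a "default" model entry
assert "default" in config["summary"]
# generation parameters are plain values forwarded to the model
assert config["summary"]["default"]["generation_config"]["min_length"] == 30
# "auth" is optional: when absent, authentication is disabled
assert "auth" not in config
print("config OK")
```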