diff --git a/README.md b/README.md index de72d601..2e9230d1 100644 --- a/README.md +++ b/README.md @@ -82,47 +82,47 @@ The following parameters are optional and can be set in the `.env` file: Check out the [Budget Manual](https://github.com/n3d1117/chatgpt-telegram-bot/discussions/184) for possible budget configurations. #### Additional optional configuration options -| Parameter | Description | Default value | -|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------| -| `ENABLE_QUOTING` | Whether to enable message quoting in private chats | `true` | -| `ENABLE_IMAGE_GENERATION` | Whether to enable image generation via the `/image` command | `true` | -| `ENABLE_TRANSCRIPTION` | Whether to enable transcriptions of audio and video messages | `true` | -| `ENABLE_TTS_GENERATION` | Whether to enable text to speech generation via the `/tts` | `true` | -| `ENABLE_VISION` | Whether to enable vision capabilities in supported models | `true` | -| `PROXY` | Proxy to be used for OpenAI and Telegram bot (e.g. `http://localhost:8080`) | - | -| `OPENAI_PROXY` | Proxy to be used only for OpenAI (e.g. `http://localhost:8080`) | - | -| `TELEGRAM_PROXY` | Proxy to be used only for Telegram bot (e.g. `http://localhost:8080`) | - | -| `OPENAI_MODEL` | The OpenAI model to use for generating responses. You can find all available models [here](https://platform.openai.com/docs/models/) | `gpt-3.5-turbo` | -| `OPENAI_BASE_URL` | Endpoint URL for unofficial OpenAI-compatible APIs (e.g., LocalAI or text-generation-webui) | Default OpenAI API URL | -| `ASSISTANT_PROMPT` | A system message that sets the tone and controls the behavior of the assistant | `You are a helpful assistant.` | -| `SHOW_USAGE` | Whether to show OpenAI token usage information after each response | `false` | -| `STREAM` | Whether to stream responses. **Note**: incompatible, if enabled, with `N_CHOICES` higher than 1 | `true` | -| `MAX_TOKENS` | Upper bound on how many tokens the ChatGPT API will return | `1200` for GPT-3, `2400` for GPT-4 | -| `VISION_MAX_TOKENS` | Upper bound on how many tokens vision models will return | `300` for gpt-4-vision-preview | -| `VISION_MODEL` | The Vision to Speech model to use. Allowed values: `gpt-4-vision-preview` | `gpt-4-vision-preview` | -| `ENABLE_VISION_FOLLOW_UP_QUESTIONS` | If true, once you send an image to the bot, it uses the configured VISION_MODEL until the conversation ends. Otherwise, it uses the OPENAI_MODEL to follow the conversation. Allowed values: `true` or `false` | `true` | -| `MAX_HISTORY_SIZE` | Max number of messages to keep in memory, after which the conversation will be summarised to avoid excessive token usage | `15` | -| `MAX_CONVERSATION_AGE_MINUTES` | Maximum number of minutes a conversation should live since the last message, after which the conversation will be reset | `180` | -| `VOICE_REPLY_WITH_TRANSCRIPT_ONLY` | Whether to answer to voice messages with the transcript only or with a ChatGPT response of the transcript | `false` | -| `VOICE_REPLY_PROMPTS` | A semicolon separated list of phrases (i.e. `Hi bot;Hello chat`). If the transcript starts with any of them, it will be treated as a prompt even if `VOICE_REPLY_WITH_TRANSCRIPT_ONLY` is set to `true` | - | -| `VISION_PROMPT` | A phrase (i.e. `What is in this image`). The vision models use it as prompt to interpret a given image. If there is caption in the image sent to the bot, that supersedes this parameter | `What is in this image` | -| `N_CHOICES` | Number of answers to generate for each input message. **Note**: setting this to a number higher than 1 will not work properly if `STREAM` is enabled | `1` | -| `TEMPERATURE` | Number between 0 and 2. Higher values will make the output more random | `1.0` | -| `PRESENCE_PENALTY` | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far | `0.0` | -| `FREQUENCY_PENALTY` | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far | `0.0` | -| `IMAGE_FORMAT` | The Telegram image receive mode. Allowed values: `document` or `photo` | `photo` | -| `IMAGE_MODEL` | The DALL·E model to be used. Available models: `dall-e-2` and `dall-e-3`, find current available models [here](https://platform.openai.com/docs/models/dall-e) | `dall-e-2` | -| `IMAGE_QUALITY` | Quality of DALL·E images, only available for `dall-e-3`-model. Possible options: `standard` or `hd`, beware of [pricing differences](https://openai.com/pricing#image-models). | `standard` | -| `IMAGE_STYLE` | Style for DALL·E image generation, only available for `dall-e-3`-model. Possible options: `vivid` or `natural`. Check availbe styles [here](https://platform.openai.com/docs/api-reference/images/create). | `vivid` | -| `IMAGE_SIZE` | The DALL·E generated image size. Must be `256x256`, `512x512`, or `1024x1024` for dall-e-2. Must be `1024x1024` for dall-e-3 models. | `512x512` | -| `VISION_DETAIL` | The detail parameter for vision models, explained [Vision Guide](https://platform.openai.com/docs/guides/vision). Allowed values: `low` or `high` | `auto` | -| `GROUP_TRIGGER_KEYWORD` | If set, the bot in group chats will only respond to messages that start with this keyword | - | -| `IGNORE_GROUP_TRANSCRIPTIONS` | If set to true, the bot will not process transcriptions in group chats | `true` | -| `IGNORE_GROUP_VISION` | If set to true, the bot will not process vision queries in group chats | `true` | +| Parameter | Description | Default value | +|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------| +| `ENABLE_QUOTING` | Whether to enable message quoting in private chats | `true` | +| `ENABLE_IMAGE_GENERATION` | Whether to enable image generation via the `/image` command | `true` | +| `ENABLE_TRANSCRIPTION` | Whether to enable transcriptions of audio and video messages | `true` | +| `ENABLE_TTS_GENERATION` | Whether to enable text to speech generation via the `/tts` | `true` | +| `ENABLE_VISION` | Whether to enable vision capabilities in supported models | `true` | +| `PROXY` | Proxy to be used for OpenAI and Telegram bot (e.g. `http://localhost:8080`) | - | +| `OPENAI_PROXY` | Proxy to be used only for OpenAI (e.g. `http://localhost:8080`) | - | +| `TELEGRAM_PROXY` | Proxy to be used only for Telegram bot (e.g. `http://localhost:8080`) | - | +| `OPENAI_MODEL` | The OpenAI model to use for generating responses. You can find all available models [here](https://platform.openai.com/docs/models/) | `gpt-3.5-turbo` | +| `OPENAI_BASE_URL` | Endpoint URL for unofficial OpenAI-compatible APIs (e.g., LocalAI or text-generation-webui) | Default OpenAI API URL | +| `ASSISTANT_PROMPT` | A system message that sets the tone and controls the behavior of the assistant | `You are a helpful assistant.` | +| `SHOW_USAGE` | Whether to show OpenAI token usage information after each response | `false` | +| `STREAM` | Whether to stream responses. **Note**: incompatible, if enabled, with `N_CHOICES` higher than 1 | `true` | +| `MAX_TOKENS` | Upper bound on how many tokens the ChatGPT API will return | `1200` for GPT-3, `2400` for GPT-4 | +| `VISION_MAX_TOKENS` | Upper bound on how many tokens vision models will return | `300` for gpt-4-vision-preview | +| `VISION_MODEL` | The Vision to Speech model to use. Allowed values: `gpt-4-vision-preview` | `gpt-4-vision-preview` | +| `ENABLE_VISION_FOLLOW_UP_QUESTIONS` | If true, once you send an image to the bot, it uses the configured VISION_MODEL until the conversation ends. Otherwise, it uses the OPENAI_MODEL to follow the conversation. Allowed values: `true` or `false` | `true` | +| `MAX_HISTORY_SIZE` | Max number of messages to keep in memory, after which the conversation will be summarised to avoid excessive token usage | `15` | +| `MAX_CONVERSATION_AGE_MINUTES` | Maximum number of minutes a conversation should live since the last message, after which the conversation will be reset | `180` | +| `VOICE_REPLY_WITH_TRANSCRIPT_ONLY` | Whether to answer to voice messages with the transcript only or with a ChatGPT response of the transcript | `false` | +| `VOICE_REPLY_PROMPTS` | A semicolon separated list of phrases (i.e. `Hi bot;Hello chat`). If the transcript starts with any of them, it will be treated as a prompt even if `VOICE_REPLY_WITH_TRANSCRIPT_ONLY` is set to `true` | - | +| `VISION_PROMPT` | A phrase (i.e. `What is in this image`). The vision models use it as prompt to interpret a given image. If there is caption in the image sent to the bot, that supersedes this parameter | `What is in this image` | +| `N_CHOICES` | Number of answers to generate for each input message. **Note**: setting this to a number higher than 1 will not work properly if `STREAM` is enabled | `1` | +| `TEMPERATURE` | Number between 0 and 2. Higher values will make the output more random | `1.0` | +| `PRESENCE_PENALTY` | Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far | `0.0` | +| `FREQUENCY_PENALTY` | Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far | `0.0` | +| `IMAGE_FORMAT` | The Telegram image receive mode. Allowed values: `document` or `photo` | `photo` | +| `IMAGE_MODEL` | The DALL·E model to be used. Available models: `dall-e-2` and `dall-e-3`, find current available models [here](https://platform.openai.com/docs/models/dall-e) | `dall-e-2` | +| `IMAGE_QUALITY` | Quality of DALL·E images, only available for `dall-e-3`-model. Possible options: `standard` or `hd`, beware of [pricing differences](https://openai.com/pricing#image-models). | `standard` | +| `IMAGE_STYLE` | Style for DALL·E image generation, only available for `dall-e-3`-model. Possible options: `vivid` or `natural`. Check availbe styles [here](https://platform.openai.com/docs/api-reference/images/create). | `vivid` | +| `IMAGE_SIZE` | The DALL·E generated image size. Must be `256x256`, `512x512`, or `1024x1024` for dall-e-2. Must be `1024x1024` for dall-e-3 models. | `512x512` | +| `VISION_DETAIL` | The detail parameter for vision models, explained [Vision Guide](https://platform.openai.com/docs/guides/vision). Allowed values: `low` or `high` | `auto` | +| `GROUP_TRIGGER_KEYWORD` | If set, the bot in group chats will only respond to messages that start with this keyword | - | +| `IGNORE_GROUP_TRANSCRIPTIONS` | If set to true, the bot will not process transcriptions in group chats | `true` | +| `IGNORE_GROUP_VISION` | If set to true, the bot will not process vision queries in group chats | `true` | | `BOT_LANGUAGE` | Language of general bot messages. Currently available: `en`, `de`, `ru`, `tr`, `it`, `fi`, `es`, `id`, `nl`, `zh-cn`, `zh-tw`, `vi`, `fa`, `pt-br`, `uk`, `ms`, `uz`, `ar`. [Contribute with additional translations](https://github.com/n3d1117/chatgpt-telegram-bot/discussions/219) | `en` | -| `WHISPER_PROMPT` | To improve the accuracy of Whisper's transcription service, especially for specific names or terms, you can set up a custom message. [Speech to text - Prompting](https://platform.openai.com/docs/guides/speech-to-text/prompting) | `-` | -| `TTS_VOICE` | The Text to Speech voice to use. Allowed values: `alloy`, `echo`, `fable`, `onyx`, `nova`, or `shimmer` | `alloy` | -| `TTS_MODEL` | The Text to Speech model to use. Allowed values: `tts-1` or `tts-1-hd` | `tts-1` | +| `WHISPER_PROMPT` | To improve the accuracy of Whisper's transcription service, especially for specific names or terms, you can set up a custom message. [Speech to text - Prompting](https://platform.openai.com/docs/guides/speech-to-text/prompting) | `-` | +| `TTS_VOICE` | The Text to Speech voice to use. Allowed values: `alloy`, `echo`, `fable`, `onyx`, `nova`, or `shimmer` | `alloy` | +| `TTS_MODEL` | The Text to Speech model to use. Allowed values: `tts-1` or `tts-1-hd` | `tts-1` | Check out the [official API reference](https://platform.openai.com/docs/api-reference/chat) for more details. @@ -151,6 +151,7 @@ Check out the [official API reference](https://platform.openai.com/docs/api-refe | `gtts_text_to_speech` | Text to speech (powered by Google Translate APIs) | - | `gtts` | | `whois` | Query the whois domain database - by [@jnaskali](https://github.com/jnaskali) | - | `whois` | | `webshot` | Screenshot a website from a given url or domain name - by [@noriellecruz](https://github.com/noriellecruz) | - | | +| `auto_tts` | Text to speech using OpenAI APIs - by [@Jipok](https://github.com/Jipok) | - | | #### Environment variables | Variable | Description | Default value |