diff --git a/docs/en/Core Concepts/Memory/intro.md b/docs/en/Core Concepts/Memory/intro.md deleted file mode 100644 index f8bfe38..0000000 --- a/docs/en/Core Concepts/Memory/intro.md +++ /dev/null @@ -1,3 +0,0 @@ -# Memory - -What is memory \ No newline at end of file diff --git a/docs/en/Core Concepts/Tool/intro.md b/docs/en/Core Concepts/Tool/intro.md deleted file mode 100644 index 6a1b21d..0000000 --- a/docs/en/Core Concepts/Tool/intro.md +++ /dev/null @@ -1,3 +0,0 @@ -# Tool - -What is tool \ No newline at end of file diff --git a/docs/en/Getting Started/1_quick_start.md b/docs/en/Getting Started/1_quick_start.md deleted file mode 100644 index d1a9cdb..0000000 --- a/docs/en/Getting Started/1_quick_start.md +++ /dev/null @@ -1,3 +0,0 @@ -# Quick Start - -How to install, set up, and run a simple chatbot. \ No newline at end of file diff --git a/docs/en/Getting Started/3_visual.md b/docs/en/Getting Started/3_visual.md deleted file mode 100644 index c907898..0000000 --- a/docs/en/Getting Started/3_visual.md +++ /dev/null @@ -1,3 +0,0 @@ -# Visualization and analysis - -Navigate to the Conductor documentation. \ No newline at end of file diff --git a/docs/en/Guides/outfit_with_loop.md b/docs/en/Guides/outfit_with_loop.md new file mode 100644 index 0000000..c483138 --- /dev/null +++ b/docs/en/Guides/outfit_with_loop.md @@ -0,0 +1,112 @@ +# Outfit Recommendation with Loop Example + +This example demonstrates how to use the framework for outfit recommendation tasks with loop functionality. The example code can be found in the `examples/step3_outfit_with_loop` directory. +```bash + cd examples/step3_outfit_with_loop +``` + +## Overview + +This example implements an interactive outfit recommendation workflow that uses a loop-based approach to refine recommendations based on user feedback. The workflow consists of the following key components: + +1. 
**Initial Image Input** + - OutfitImageInput: Handles the upload and processing of the initial clothing item image + - Serves as the starting point for the recommendation process + +2. **Interactive QA Loop with Weather Integration** + - OutfitQA: Conducts an interactive Q&A session to gather context and preferences + - Uses web search tool to fetch real-time weather data for the specified location + - OutfitDecider: Evaluates if sufficient information has been collected based on: + - User preferences + - Current weather conditions + - Uses DoWhileTask to continue the loop until adequate information is gathered + - Loop terminates when OutfitDecider returns decision=true + +3. **Final Recommendation** + - OutfitRecommendation: Generates the final outfit suggestions based on: + - The initial uploaded image + - Information collected during the Q&A loop + - Current weather conditions from web search + - Other context (occasion, preferences, etc.) + +4. **Workflow Flow** + ``` + Start -> Image Input -> OutfitQA Loop (QA + Weather Search + Decision) -> Final Recommendation -> End + ``` + +The workflow leverages Redis for state management and the Conductor server for workflow orchestration. 
This architecture enables: +- Image-based outfit recommendations +- Weather-aware outfit suggestions using real-time data +- Interactive refinement through structured Q&A +- Context-aware suggestions incorporating multiple factors +- Persistent state management across the workflow + + +## Prerequisites + +- Python 3.10+ +- Required packages installed (see requirements.txt) +- Access to OpenAI API or compatible endpoint +- Access to Bing API key for web search functionality to search real-time weather information for outfit recommendations (see configs/tools/websearch.yml) +- Redis server running locally or remotely +- Conductor server running locally or remotely + +## Configuration + +The container.yaml file is a configuration file that manages dependencies and settings for different components of the system, including Conductor connections, Redis connections, and other service configurations. To set up your configuration: + +1. Generate the container.yaml file: + ```bash + python compile_container.py + ``` + This will create a container.yaml file with default settings under `examples/step3_outfit_with_loop`. + +2. Configure your LLM settings in `configs/llms/gpt.yml` and `configs/llms/text_res.yml`: + - Set your OpenAI API key or compatible endpoint through environment variable or by directly modifying the yml file + ```bash + export custom_openai_key="your_openai_api_key" + export custom_openai_endpoint="your_openai_endpoint" + ``` + - Configure other model settings like temperature as needed through environment variable or by directly modifying the yml file + +3. Configure your Bing Search API key in `configs/tools/websearch.yml`: + - Set your Bing API key through environment variable or by directly modifying the yml file + ```bash + export bing_api_key="your_bing_api_key" + ``` + +4. 
Update settings in the generated `container.yaml`: + - Modify Redis connection settings: + - Set the host, port and credentials for your Redis instance + - Configure both `redis_stream_client` and `redis_stm_client` sections + - Update the Conductor server URL under conductor_config section + - Adjust any other component settings as needed + +## Running the Example + +1. Run the outfit recommendation workflow: + + For terminal/CLI usage: + ```bash + python run_cli.py + ``` + + For app/GUI usage: + ```bash + python run_app.py + ``` + + +## Troubleshooting + +If you encounter issues: +- Verify Redis is running and accessible +- Check your OpenAI API key and Bing API key are valid +- Ensure all dependencies are installed correctly +- Review logs for any error messages +- Confirm Conductor server is running and accessible +- Check Redis Stream client and Redis STM client configuration + +## Building the Example + +Coming soon! This section will provide detailed instructions for building the step3_outfit_with_loop example step by step. diff --git a/docs/en/Guides/outfit_with_ltm.md b/docs/en/Guides/outfit_with_ltm.md new file mode 100644 index 0000000..5f25dff --- /dev/null +++ b/docs/en/Guides/outfit_with_ltm.md @@ -0,0 +1,149 @@ +# Outfit Recommendation with Long-Term Memory Example + +This example demonstrates how to use the framework for outfit recommendation tasks with long-term memory functionality. The example code can be found in the `examples/step4_outfit_with_ltm` directory. +```bash + cd examples/step4_outfit_with_ltm +``` + + +## Overview + +This example implements an outfit recommendation system with long-term memory capabilities through two main workflows: + +1. 
**Image Storage Workflow** + - ImageIndexListener: Monitors and captures new clothing images + - OutfitImagePreprocessor: Processes and prepares images for storage + - Stores processed images in Milvus long-term memory (LTM) for future retrieval + - Workflow sequence: Image Listening -> Preprocessing -> LTM Storage + +2. **Outfit Recommendation Workflow** + - OutfitQA: Conducts interactive Q&A to understand user preferences + - OutfitDecider: Determines if sufficient information is collected + - Uses DoWhileTask for iterative refinement until decision is positive + - OutfitGeneration: Generates outfit recommendations using stored image data + - OutfitConclusion: Presents final recommendations with explanations + +The system leverages both short-term memory (Redis STM) and long-term memory (Milvus LTM) for: +- Efficient image storage and retrieval +- Persistent clothing item database +- Context-aware outfit recommendations +- Interactive preference refinement +- Stateful conversation management + +3. **Workflow Architecture** + ``` + Image Storage: Listen -> Preprocess -> Store in LTM + Recommendation: QA Loop (QA + Decision) -> Generation -> Conclusion + ``` + +The system uses Redis for state management, Milvus for long-term image storage, and Conductor for workflow orchestration. 
This architecture enables:
+- Scalable image database management
+- Intelligent outfit recommendations based on stored items
+- Interactive preference gathering
+- Persistent clothing knowledge base
+- Efficient retrieval of relevant items
+
+## Prerequisites
+
+- Python 3.10+
+- Required packages installed (see requirements.txt)
+- Access to OpenAI API or compatible endpoint (see configs/llms/gpt.yml)
+- A Bing API key for the web search tool, which fetches real-time weather information for outfit recommendations (see configs/tools/websearch.yml)
+- Redis server running locally or remotely
+- Conductor server running locally or remotely
+- Milvus vector database (will be started automatically when workflow runs)
+- Sufficient storage space for image database
+- Git LFS: install it with `git lfs install`, then pull the sample images with `git lfs pull`
+
+## Configuration
+
+The container.yaml file manages dependencies and settings for the different components of the system, including Conductor connections, Redis connections, Milvus connections, and other service configurations. To set up your configuration:
+
+1. Generate the container.yaml files:
+   ```bash
+   # For image storage workflow
+   python image_storage/compile_container.py
+
+   # For outfit recommendation workflow
+   python outfit_from_storage/compile_container.py
+   ```
+   This will create two container.yaml files with default settings under the `image_storage` and `outfit_from_storage` directories:
+   - `image_storage/container.yaml`: Configuration for the image storage workflow
+   - `outfit_from_storage/container.yaml`: Configuration for the outfit recommendation workflow
+
+2. 
Configure your LLM settings in `configs/llms/gpt.yml` and `configs/llms/text_res.yml` in the two workflow directories:
+   - Set your OpenAI API key or compatible endpoint through an environment variable or by directly modifying the yml file
+   ```bash
+   export custom_openai_key="your_openai_api_key"
+   export custom_openai_endpoint="your_openai_endpoint"
+   ```
+   - Configure other model settings like temperature as needed through environment variables or by directly modifying the yml file
+
+3. Configure your Bing Search API key in `configs/tools/websearch.yml` in the two workflow directories:
+   - Set your Bing API key through an environment variable or by directly modifying the yml file
+   ```bash
+   export bing_api_key="your_bing_api_key"
+   ```
+4. Configure your text encoder settings in `configs/llms/text_encoder.yml` in the two workflow directories:
+   - Set your OpenAI text encoder endpoint and API key through environment variables or by directly modifying the yml file
+   ```bash
+   export custom_openai_text_encoder_key="your_openai_text_encoder_key"
+   export custom_openai_text_encoder_endpoint="your_openai_endpoint"
+   ```
+   - The default text encoder configuration uses OpenAI text embedding v3 with 3072 dimensions. Make sure the `dim` value of `MilvusLTM` in `container.yaml` matches the dimension of the encoder you use
+   - Adjust the embedding dimension and other settings as needed through environment variables or by directly modifying the yml file
+
+5. Update settings in each generated `container.yaml`:
+   - Modify Redis connection settings:
+     - Set the host, port and credentials for your Redis instance
+     - Configure both `redis_stream_client` and `redis_stm_client` sections
+   - Update the Conductor server URL under conductor_config section
+   - Configure MilvusLTM in `components` section:
+     - Set the `storage_name` and `dim` for MilvusLTM
+     - Adjust other settings as needed
+   - Adjust any other component settings as needed
+
+## Running the Example
+
+1. 
Run the image storage workflow first: + + For terminal/CLI usage: + ```bash + python image_storage/run_image_storage_cli.py + ``` + For app usage: + ```bash + python image_storage/run_image_storage_app.py + ``` + + This workflow will store outfit images in the Milvus database. + +2. Run the outfit recommendation workflow in a separate terminal: + + For terminal/CLI usage: + ```bash + python outfit_from_storage/run_outfit_recommendation_cli.py + ``` + + For app/GUI usage: + ```bash + python outfit_from_storage/run_outfit_recommendation_app.py + ``` + + This workflow will retrieve outfit recommendations from the stored images. + + +## Troubleshooting + +If you encounter issues: +- Verify Redis is running and accessible +- Check your OpenAI API key and Bing API key are valid +- Ensure all dependencies are installed correctly +- Review logs for any error messages +- Confirm Conductor server is running and accessible +- Check Redis Stream client and Redis STM client configuration + +## Building the Example + +Coming soon! This section will provide detailed instructions for building the step4_outfit_with_ltm example step by step. + diff --git a/docs/en/Guides/outfit_with_switch.md b/docs/en/Guides/outfit_with_switch.md new file mode 100644 index 0000000..3eb3762 --- /dev/null +++ b/docs/en/Guides/outfit_with_switch.md @@ -0,0 +1,112 @@ +# Outfit Recommendation with Switch Example + +This example demonstrates how to use the framework for outfit recommendation tasks with switch_case functionality. The example code can be found in the `examples/step2_outfit_with_switch` directory. + +```bash + cd examples/step2_outfit_with_switch +``` + +## Overview + +This example implements an outfit recommendation workflow that uses switch-case functionality to conditionally include weather information in the recommendation process. The workflow consists of the following key components: + +1. 
**Input Interface** + - Handles user input containing clothing requests and image data + - Processes and caches any uploaded images + - Extracts the user's outfit request instructions + +2. **Weather Decision Logic** + - WeatherDecider: Analyzes the user's request to determine if weather information is needed + - Makes a binary decision (0 or 1) based on context in the user's request + - Controls whether weather data should be fetched + +3. **Conditional Weather Search** + - WeatherSearcher: Only executes if WeatherDecider returns 0 (weather info needed) + - Uses web search functionality to fetch current weather conditions + - Integrates weather data into the recommendation context + +4. **Outfit Recommendation** + - Generates final clothing suggestions based on: + - User's original request + - Weather information (if available) + - Any provided image context + - Provides complete outfit recommendations + +The workflow follows this sequence: + + +## Prerequisites + +- Python 3.10+ +- Required packages installed (see requirements.txt) +- Access to OpenAI API or compatible endpoint (see configs/llms/gpt.yml) +- Access to Bing API key for web search functionality to search real-time weather information for outfit recommendations (see configs/tools/websearch.yml) +- Redis server running locally or remotely +- Conductor server running locally or remotely + +## Configuration + +The container.yaml file is a configuration file that manages dependencies and settings for different components of the system, including Conductor connections, Redis connections, and other service configurations. To set up your configuration: + +1. Generate the container.yaml file: + ```bash + python compile_container.py + ``` + This will create a container.yaml file with default settings under `examples/step2_outfit_with_switch`. + + + +2. 
Configure your LLM settings in `configs/llms/gpt.yml` and `configs/llms/text_res.yml`:
+
+   - Set your OpenAI API key or compatible endpoint through an environment variable or by directly modifying the yml file
+   ```bash
+   export custom_openai_key="your_openai_api_key"
+   export custom_openai_endpoint="your_openai_endpoint"
+   ```
+   - Configure other model settings like temperature as needed through environment variables or by directly modifying the yml file
+
+3. Configure your Bing Search API key in `configs/tools/websearch.yml`:
+   - Set your Bing API key through an environment variable or by directly modifying the yml file
+   ```bash
+   export bing_api_key="your_bing_api_key"
+   ```
+
+4. Update settings in the generated `container.yaml`:
+   - Modify Redis connection settings:
+     - Set the host, port and credentials for your Redis instance
+     - Configure both `redis_stream_client` and `redis_stm_client` sections
+   - Update the Conductor server URL under conductor_config section
+   - Adjust any other component settings as needed
+
+## Running the Example
+
+1. Run the outfit recommendation with switch example:
+
+   For terminal/CLI usage:
+   ```bash
+   python run_cli.py
+   ```
+
+   For app/GUI usage:
+   ```bash
+   python run_app.py
+   ```
+
+## Troubleshooting
+
+If you encounter issues:
+
+- Verify Conductor and Redis are running and accessible
+- Check that your OpenAI API key and Bing API key are valid
+- Check Redis Stream client and Redis STM client configuration
+- Ensure all dependencies are installed correctly
+- Review logs for any error messages
+
+## Building the Example
+
+Coming soon! This section will provide detailed instructions for building the step2_outfit_with_switch example step by step.
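Until those build instructions are available, the conditional flow described in the Overview can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the example's actual implementation: `decide_weather`, `search_weather`, `recommend_outfit`, `run_workflow`, and the keyword list are hypothetical stand-ins (the real WeatherDecider decides from the context of the request, and WeatherSearcher uses the web search tool). The sketch only mirrors the documented convention that the weather branch runs when the decider returns 0.

```python
from typing import Optional

# Hypothetical keyword list; the real WeatherDecider is not keyword-based.
WEATHER_HINTS = ("today", "tomorrow", "outside", "outdoor", "rain", "trip")


def decide_weather(request: str) -> int:
    """Stand-in for WeatherDecider: 0 = weather info needed, 1 = not needed."""
    text = request.lower()
    return 0 if any(hint in text for hint in WEATHER_HINTS) else 1


def search_weather(request: str) -> str:
    """Stand-in for WeatherSearcher; a real worker would call web search."""
    return "weather data for: " + request


def recommend_outfit(request: str, weather: Optional[str]) -> str:
    """Stand-in for the final outfit recommendation step."""
    context = weather if weather is not None else "no weather context"
    return f"Outfit for '{request}' ({context})"


def run_workflow(request: str) -> str:
    """Switch-case flow: run the weather branch only when the decision is 0."""
    decision = decide_weather(request)
    weather = search_weather(request) if decision == 0 else None
    return recommend_outfit(request, weather)


if __name__ == "__main__":
    print(run_workflow("What should I wear for a walk outside tomorrow?"))
    print(run_workflow("Does this shirt match black jeans?"))
```

The point of the switch is that the recommendation step accepts an optional weather input, so the search branch can be skipped without changing the final step.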
+ diff --git a/docs/en/Guides/simple_vqa.md b/docs/en/Guides/simple_vqa.md new file mode 100644 index 0000000..93ca107 --- /dev/null +++ b/docs/en/Guides/simple_vqa.md @@ -0,0 +1,85 @@ +# Simple Visual Question Answering Example + +This example demonstrates how to use the framework for visual question answering (VQA) tasks. The example code can be found in the `examples/step1_simpleVQA` directory. + +```bash + cd examples/step1_simpleVQA +``` + +## Overview + +This example implements a simple Visual Question Answering (VQA) workflow that consists of two main components: + +1. **Input Interface** + - Handles user input containing questions about images + - Processes and manages image data + - Extracts the user's questions/instructions + +2. **Simple VQA Processing** + - Takes the user input and image + - Analyzes the image based on the user's question + - Generates appropriate responses to visual queries + +The workflow follows a straightforward sequence: + +## Prerequisites + +- Python 3.10+ +- Required packages installed (see requirements.txt) +- Access to OpenAI API or compatible endpoint (see configs/llms/gpt.yml) +- Redis server running locally or remotely +- Conductor server running locally or remotely + +## Configuration + +The container.yaml file is a configuration file that manages dependencies and settings for different components of the system, including Conductor connections, Redis connections, and other service configurations. To set up your configuration: + +1. Generate the container.yaml file: + ```bash + python compile_container.py + ``` + This will create a container.yaml file with default settings under `examples/step1_simpleVQA`. + + +2. 
Configure your LLM settings in `configs/llms/gpt.yml`:
+   - Set your OpenAI API key or compatible endpoint through an environment variable or by directly modifying the yml file
+   ```bash
+   export custom_openai_key="your_openai_api_key"
+   export custom_openai_endpoint="your_openai_endpoint"
+   ```
+   - Configure other model settings like temperature as needed through environment variables or by directly modifying the yml file
+
+3. Update settings in the generated `container.yaml`:
+   - Modify Redis connection settings:
+     - Set the host, port and credentials for your Redis instance
+     - Configure both `redis_stream_client` and `redis_stm_client` sections
+   - Update the Conductor server URL under conductor_config section
+   - Adjust any other component settings as needed
+
+## Running the Example
+
+1. Run the simple VQA example:
+
+   For terminal/CLI usage:
+   ```bash
+   python run_cli.py
+   ```
+
+   For app/GUI usage:
+   ```bash
+   python run_app.py
+   ```
+
+## Troubleshooting
+
+If you encounter issues:
+- Verify Redis is running and accessible
+- Check that your OpenAI API key is valid
+- Ensure all dependencies are installed correctly
+- Review logs for any error messages
+
+
+## Building the Example
+
+Coming soon! This section will provide detailed instructions for building and packaging the step1_simpleVQA example step by step.
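Some of the troubleshooting checks above can be scripted. The following is a minimal preflight sketch, assuming Redis and Conductor listen on localhost at their usual default ports (6379 and 8080); substitute the hosts and ports from your `container.yaml`. The helper names `missing_env_vars` and `is_port_open` are illustrative and not part of the framework; the environment variable names are the ones used in the Configuration section.

```python
import os
import socket

# Variable names taken from the Configuration section of this guide.
REQUIRED_ENV_VARS = ("custom_openai_key", "custom_openai_endpoint")


def missing_env_vars(names=REQUIRED_ENV_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed defaults; adjust to match your container.yaml.
    services = {"Redis": ("localhost", 6379), "Conductor": ("localhost", 8080)}
    for name, (host, port) in services.items():
        status = "reachable" if is_port_open(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port} is {status}")
    for var in missing_env_vars():
        print(f"Environment variable {var} is not set")
```

A successful TCP connection only shows that something is listening on the port; it does not validate credentials or API keys, so the logs remain the place to look for authentication errors.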
+ diff --git a/docs/en/assets/images/OmAgent.png b/docs/en/assets/images/OmAgent.png new file mode 100644 index 0000000..cd32624 Binary files /dev/null and b/docs/en/assets/images/OmAgent.png differ diff --git a/docs/en/assets/images/app_album.png b/docs/en/assets/images/app_album.png new file mode 100644 index 0000000..0ee2cae Binary files /dev/null and b/docs/en/assets/images/app_album.png differ diff --git a/docs/en/assets/images/app_album_img.png b/docs/en/assets/images/app_album_img.png new file mode 100644 index 0000000..5777d25 Binary files /dev/null and b/docs/en/assets/images/app_album_img.png differ diff --git a/docs/en/assets/images/app_allow.png b/docs/en/assets/images/app_allow.png new file mode 100644 index 0000000..22d2828 Binary files /dev/null and b/docs/en/assets/images/app_allow.png differ diff --git a/docs/en/assets/images/app_back_button.png b/docs/en/assets/images/app_back_button.png new file mode 100644 index 0000000..1b7f1df Binary files /dev/null and b/docs/en/assets/images/app_back_button.png differ diff --git a/docs/en/assets/images/app_boot_page.png b/docs/en/assets/images/app_boot_page.png new file mode 100644 index 0000000..e11b9aa Binary files /dev/null and b/docs/en/assets/images/app_boot_page.png differ diff --git a/docs/en/assets/images/app_brush_button1.png b/docs/en/assets/images/app_brush_button1.png new file mode 100644 index 0000000..7611c58 Binary files /dev/null and b/docs/en/assets/images/app_brush_button1.png differ diff --git a/docs/en/assets/images/app_brush_button2.png b/docs/en/assets/images/app_brush_button2.png new file mode 100644 index 0000000..031971c Binary files /dev/null and b/docs/en/assets/images/app_brush_button2.png differ diff --git a/docs/en/assets/images/app_brush_button3.png b/docs/en/assets/images/app_brush_button3.png new file mode 100644 index 0000000..91e55b9 Binary files /dev/null and b/docs/en/assets/images/app_brush_button3.png differ diff --git a/docs/en/assets/images/app_brush_button4.png 
b/docs/en/assets/images/app_brush_button4.png new file mode 100644 index 0000000..d99c527 Binary files /dev/null and b/docs/en/assets/images/app_brush_button4.png differ diff --git a/docs/en/assets/images/app_camera_button.png b/docs/en/assets/images/app_camera_button.png new file mode 100644 index 0000000..725202d Binary files /dev/null and b/docs/en/assets/images/app_camera_button.png differ diff --git a/docs/en/assets/images/app_cancel_button.png b/docs/en/assets/images/app_cancel_button.png new file mode 100644 index 0000000..98ff477 Binary files /dev/null and b/docs/en/assets/images/app_cancel_button.png differ diff --git a/docs/en/assets/images/app_chat_history.png b/docs/en/assets/images/app_chat_history.png new file mode 100644 index 0000000..f656ec5 Binary files /dev/null and b/docs/en/assets/images/app_chat_history.png differ diff --git a/docs/en/assets/images/app_chat_history2.png b/docs/en/assets/images/app_chat_history2.png new file mode 100644 index 0000000..1d30d01 Binary files /dev/null and b/docs/en/assets/images/app_chat_history2.png differ diff --git a/docs/en/assets/images/app_chat_history3.png b/docs/en/assets/images/app_chat_history3.png new file mode 100644 index 0000000..329888e Binary files /dev/null and b/docs/en/assets/images/app_chat_history3.png differ diff --git a/docs/en/assets/images/app_check_button.png b/docs/en/assets/images/app_check_button.png new file mode 100644 index 0000000..337bfcb Binary files /dev/null and b/docs/en/assets/images/app_check_button.png differ diff --git a/docs/en/assets/images/app_close_button.png b/docs/en/assets/images/app_close_button.png new file mode 100644 index 0000000..24dfe3b Binary files /dev/null and b/docs/en/assets/images/app_close_button.png differ diff --git a/docs/en/assets/images/app_collapse_button.png b/docs/en/assets/images/app_collapse_button.png new file mode 100644 index 0000000..4e53758 Binary files /dev/null and b/docs/en/assets/images/app_collapse_button.png differ diff --git 
a/docs/en/assets/images/app_connection_settings.png b/docs/en/assets/images/app_connection_settings.png new file mode 100644 index 0000000..3b213ec Binary files /dev/null and b/docs/en/assets/images/app_connection_settings.png differ diff --git a/docs/en/assets/images/app_conversation.png b/docs/en/assets/images/app_conversation.png new file mode 100644 index 0000000..3d8363d Binary files /dev/null and b/docs/en/assets/images/app_conversation.png differ diff --git a/docs/en/assets/images/app_expand_button.png b/docs/en/assets/images/app_expand_button.png new file mode 100644 index 0000000..99a39ad Binary files /dev/null and b/docs/en/assets/images/app_expand_button.png differ diff --git a/docs/en/assets/images/app_home.png b/docs/en/assets/images/app_home.png new file mode 100644 index 0000000..429428b Binary files /dev/null and b/docs/en/assets/images/app_home.png differ diff --git a/docs/en/assets/images/app_home_button.png b/docs/en/assets/images/app_home_button.png new file mode 100644 index 0000000..017774b Binary files /dev/null and b/docs/en/assets/images/app_home_button.png differ diff --git a/docs/en/assets/images/app_mobile.png b/docs/en/assets/images/app_mobile.png new file mode 100644 index 0000000..91569c7 Binary files /dev/null and b/docs/en/assets/images/app_mobile.png differ diff --git a/docs/en/assets/images/app_multi-turn_setting.png b/docs/en/assets/images/app_multi-turn_setting.png new file mode 100644 index 0000000..dc06a8c Binary files /dev/null and b/docs/en/assets/images/app_multi-turn_setting.png differ diff --git a/docs/en/assets/images/app_parameter_setting.png b/docs/en/assets/images/app_parameter_setting.png new file mode 100644 index 0000000..015477a Binary files /dev/null and b/docs/en/assets/images/app_parameter_setting.png differ diff --git a/docs/en/assets/images/app_parameter_setting2.png b/docs/en/assets/images/app_parameter_setting2.png new file mode 100644 index 0000000..08f2193 Binary files /dev/null and 
b/docs/en/assets/images/app_parameter_setting2.png differ diff --git a/docs/en/assets/images/app_play_voice.png b/docs/en/assets/images/app_play_voice.png new file mode 100644 index 0000000..481b781 Binary files /dev/null and b/docs/en/assets/images/app_play_voice.png differ diff --git a/docs/en/assets/images/app_qrcode.png b/docs/en/assets/images/app_qrcode.png new file mode 100644 index 0000000..486ebac Binary files /dev/null and b/docs/en/assets/images/app_qrcode.png differ diff --git a/docs/en/assets/images/app_refresh_button.png b/docs/en/assets/images/app_refresh_button.png new file mode 100644 index 0000000..d2ab93c Binary files /dev/null and b/docs/en/assets/images/app_refresh_button.png differ diff --git a/docs/en/assets/images/app_setting.png b/docs/en/assets/images/app_setting.png new file mode 100644 index 0000000..3e5b338 Binary files /dev/null and b/docs/en/assets/images/app_setting.png differ diff --git a/docs/en/assets/images/app_setting_button.png b/docs/en/assets/images/app_setting_button.png new file mode 100644 index 0000000..3576e4f Binary files /dev/null and b/docs/en/assets/images/app_setting_button.png differ diff --git a/docs/en/assets/images/app_setting_window.png b/docs/en/assets/images/app_setting_window.png new file mode 100644 index 0000000..91569c7 Binary files /dev/null and b/docs/en/assets/images/app_setting_window.png differ diff --git a/docs/en/assets/images/app_stop_generating.png b/docs/en/assets/images/app_stop_generating.png new file mode 100644 index 0000000..d71495a Binary files /dev/null and b/docs/en/assets/images/app_stop_generating.png differ diff --git a/docs/en/assets/images/app_stop_voice.png b/docs/en/assets/images/app_stop_voice.png new file mode 100644 index 0000000..ade43eb Binary files /dev/null and b/docs/en/assets/images/app_stop_voice.png differ diff --git a/docs/en/assets/images/app_time_button.png b/docs/en/assets/images/app_time_button.png new file mode 100644 index 0000000..cb9b344 Binary files /dev/null 
and b/docs/en/assets/images/app_time_button.png differ diff --git a/docs/en/assets/images/app_time_button2.png b/docs/en/assets/images/app_time_button2.png new file mode 100644 index 0000000..4849a89 Binary files /dev/null and b/docs/en/assets/images/app_time_button2.png differ diff --git a/docs/en/assets/images/app_voice_button.png b/docs/en/assets/images/app_voice_button.png new file mode 100644 index 0000000..3b25d7d Binary files /dev/null and b/docs/en/assets/images/app_voice_button.png differ diff --git a/docs/en/assets/images/app_voice_dailog2.png b/docs/en/assets/images/app_voice_dailog2.png new file mode 100644 index 0000000..6852dfb Binary files /dev/null and b/docs/en/assets/images/app_voice_dailog2.png differ diff --git a/docs/en/assets/images/app_voice_dialog.png b/docs/en/assets/images/app_voice_dialog.png new file mode 100644 index 0000000..3d8363d Binary files /dev/null and b/docs/en/assets/images/app_voice_dialog.png differ diff --git a/docs/en/assets/images/app_workflow_button.png b/docs/en/assets/images/app_workflow_button.png new file mode 100644 index 0000000..603d6fa Binary files /dev/null and b/docs/en/assets/images/app_workflow_button.png differ diff --git a/docs/en/assets/images/app_workflow_button2.png b/docs/en/assets/images/app_workflow_button2.png new file mode 100644 index 0000000..5a25e7a Binary files /dev/null and b/docs/en/assets/images/app_workflow_button2.png differ diff --git a/docs/en/assets/images/app_workflow_list.png b/docs/en/assets/images/app_workflow_list.png new file mode 100644 index 0000000..c31b3fc Binary files /dev/null and b/docs/en/assets/images/app_workflow_list.png differ diff --git a/docs/en/assets/images/app_workflow_name.png b/docs/en/assets/images/app_workflow_name.png new file mode 100644 index 0000000..5ba691b Binary files /dev/null and b/docs/en/assets/images/app_workflow_name.png differ diff --git a/docs/en/assets/images/app_workflow_progress.png b/docs/en/assets/images/app_workflow_progress.png new file 
mode 100644 index 0000000..a133493 Binary files /dev/null and b/docs/en/assets/images/app_workflow_progress.png differ diff --git a/docs/en/assets/images/app_workflow_setting.png b/docs/en/assets/images/app_workflow_setting.png new file mode 100644 index 0000000..adad441 Binary files /dev/null and b/docs/en/assets/images/app_workflow_setting.png differ diff --git a/docs/en/assets/images/architecture.jpg b/docs/en/assets/images/architecture.jpg new file mode 100644 index 0000000..4f9f9a0 Binary files /dev/null and b/docs/en/assets/images/architecture.jpg differ diff --git a/docs/en/assets/images/fork_task.png b/docs/en/assets/images/fork_task.png new file mode 100644 index 0000000..89ca1f2 Binary files /dev/null and b/docs/en/assets/images/fork_task.png differ diff --git a/docs/en/assets/images/icon.png b/docs/en/assets/images/icon.png new file mode 100644 index 0000000..32340a4 Binary files /dev/null and b/docs/en/assets/images/icon.png differ diff --git a/docs/en/assets/images/intro.png b/docs/en/assets/images/intro.png new file mode 100644 index 0000000..05a5e0a Binary files /dev/null and b/docs/en/assets/images/intro.png differ diff --git a/docs/en/assets/images/logo.png b/docs/en/assets/images/logo.png new file mode 100644 index 0000000..2aebf08 Binary files /dev/null and b/docs/en/assets/images/logo.png differ diff --git a/docs/en/assets/images/switch_task.png b/docs/en/assets/images/switch_task.png new file mode 100644 index 0000000..98ca56f Binary files /dev/null and b/docs/en/assets/images/switch_task.png differ diff --git a/docs/en/core_concepts/Clients/app.md b/docs/en/core_concepts/Clients/app.md new file mode 100644 index 0000000..74b0394 --- /dev/null +++ b/docs/en/core_concepts/Clients/app.md @@ -0,0 +1,182 @@ +# App + +OmAgent App is an app for developers to visualize and edit business scenario content for large model capabilities. 
It interacts with multimodal large models using data from the phone's camera, audio stream, and other sources, combined with memory, tool invocation, and other capabilities. For the business scenarios that users develop, it produces the corresponding output by communicating with the Agent service, and so serves as a demonstration app for developing agent scenarios with multimodal input and output. The sections below introduce the functions of OmAgent App step by step.
+
+## App Installation
+The QR code for downloading the app is as follows:
+
+
+ +
+ +- Currently, only Android phones are supported for download and use, but iOS support is coming soon. + +## App Usage + +### 1. APP Home Page +After opening the APP, the guide page will be displayed as shown in the figure below: + ++ +
+ +The APP home page includes Mobile, Glasses, and Connection Settings as shown below: ++ +
+ +#### 1.1 Connection Settings +The app automatically searches for and connects to the environment IP running on the local network. If the connection is successful, a toast message will display: "Connection Successful". If the connection fails, a toast message will display: "Service connection failure". +- Click the "Connection Settings" button at the bottom of the homepage to enter the app configuration page, as shown in the figure below; ++ +
+ +- Enter the correct IP in the IP input box and click the "Connection" button at the bottom. After a successful connection, return to the homepage and click the "Mobile" section to enter the Mobile page. +- Note: The IP input box displays the last successfully connected IP address by default. + +### 2. Mobile +"Mobile" mainly includes settings, voice input, camera, multimodal, brush functions, etc. + +#### 2.1 Return to Home Page +Click the top left return to home button ![](../../assets/images/app_home_button.png) to return to the home page. + +#### 2.2 Settings +Click the top right settings button ![](../../assets/images/app_setting_button.png), the page will pop up a settings window, as shown below. Click the ![](../../assets/images/app_close_button.png) button or outside the window to close the window. + ++ +
+ +##### 2.2.1 Album +Click Album to enter the gallery page, loading 80 images at a time, as shown below: + ++ +
+ +- Click Reindex, if indexing is successful, the page will prompt success, indicating that all images in the gallery have been successfully indexed. If it fails, it will prompt: failure. After selecting images, click Reindex to index only the selected images. + +- Click Select to trigger batch selection, all images on the page will display a checkmark icon at the bottom right; the page selection button changes to deselect, the Upload button changes to Delete, allowing images to be selected for deletion; click Cancel to return to the unselected state. + +- Click Upload to call the album, select images to upload, the gallery dynamically displays, uploads are displayed immediately, up to 20 images can be selected at a time. During the upload process, click Cancel to stop uploading images (the page only displays successfully uploaded images). + +- Click on an image to enlarge and preview it, as shown below; supports left and right sliding and deletion. The top displays the image page number, which can be clicked to return to the gallery. + ++ +
+ +- Click the top left return button on the gallery page ![](../../assets/images/app_back_button.png), the return button returns to the Mobile page, and the settings window is still displayed. + +##### 2.2.2 Chat History +Click Chat history to enter the history dialogue page, displaying all dialogue content, as shown below; click Delete all to confirm the operation and clear the history dialogue. + ++ +
+ +History dialogue display rules: +- Different Workflow dialogues are displayed separated by time, supporting up and down sliding. + +- A single Workflow content dialogue includes text and images, with multiple image scenarios. Click on an image to preview it, and multiple image previews support sliding. + +- The progress menu within a single Workflow supports clicking to expand, as shown below: + ++ +
+ +- Share: Click the share button under the Workflow dialogue, the share function is displayed at the bottom of the page, as shown below: sharing can be done as needed. + ++ +
+ +##### 2.2.3 Multi-turn Dialogue +Click Multi-turn dialogue to display the multi-turn dialogue dropdown menu, default is 1 turn, up to 10 turns can be set, as shown below: + ++ +
+ +##### 2.2.4 Workflow Settings +Click Workflow Settings to enter the Workflow list selection page, as shown below: + ++ +
+ +In the list, you can click to select the required Workflow, only single selection is supported. After selection, a checkmark ![](../../assets/images/app_check_button.png) icon is displayed. Click the refresh button ![](../../assets/images/app_refresh_button.png) to refresh the list. + +##### 2.2.5 Parameter Settings +Click Parameter Settings to enter the custom parameter settings page, as shown below: + ++ +
+ +Click +Add parameter to display the Parameter name and value input boxes, as shown below. Input boxes can be deleted, and up to 20 can be added; once 20 have been added, the +Add parameter button is no longer shown. Click SAVE to save the parameters (saving succeeds even if the input boxes are empty). + +
+ +#### 2.3 Global Voice +![](../../assets/images/app_voice_button.png) Voice auto-play is on by default; click the button to toggle it. Switching it on shows the prompt "Voice auto-play is on", and switching it off shows "Voice auto-play is off". + +#### 2.4 Camera +![](../../assets/images/app_camera_button.png) The rear camera is used by default; click to switch to the front camera. The page previews the camera feed in real time and supports two gestures: ① tap to focus, ② pinch to zoom. After zooming, a 1X button is displayed; click it to return to 1X zoom, after which the button disappears. + +#### 2.5 Voice Dialogue +Long press the voice button to trigger voice recognition: hold to speak, then release. The page then displays the dialogue section, as shown below: + +
+ +Note: When the model reply content has images, click the image to enlarge and preview. + +##### 2.5.1 Expand +![](../../assets/images/app_expand_button.png) Click the expand button to display the entire dialogue section, as shown below: + ++ +
+ +Click the collapse button at the top of the image ![](../../assets/images/app_collapse_button.png) to collapse the dialogue. + +##### 2.5.2 Workflow +![](../../assets/images/app_workflow_name.png) is the Workflow name of the current dialogue. Click the button on the right ![](../../assets/images/app_workflow_button.png) to display the Workflow progress, as shown below: + ++ +
+ +The progress menu supports expanding and collapsing. + +##### 2.5.3 Regenerate +![](../../assets/images/app_workflow_button2.png) The regenerate button is displayed under the nearest model reply content. Click to regenerate and overwrite the previous reply content. + +##### 2.5.4 Voice Broadcast +![](../../assets/images/app_play_voice.png) The small voice button is displayed under the nearest model reply content. When voice is playing, it is displayed as ![](../../assets/images/app_stop_voice.png). After the nearest model reply ends or is manually clicked, it is displayed as ![](../../assets/images/app_play_voice.png). After the nearest model reply ends, click this button to replay the nearest model reply content; +- When the global voice button is on, voice broadcast is on by default. After turning off the global voice, there is no voice broadcast. At this time, click the small voice button to replay the nearest model reply content. + +##### 2.5.5 Stop Generating +![](../../assets/images/app_stop_generating.png) Stop generating, in the content output state, you can manually click to stop generating and interrupt the model reply. + +#### 2.6 Voice - Timing Mode +Long press the voice button to speak, do not release, slide up to timing ![](../../assets/images/app_time_button.png) and then release, the page enters the timing mode frame extraction 5-second countdown ![](../../assets/images/app_time_button2.png), timing frame extraction mode, voice input cannot be clicked, the button is grayed out; after the countdown ends, the dialogue module is displayed to show the dialogue content; (during the 5-second countdown, click the countdown to end the timing frame extraction early). + +#### 2.7 Cancel Sending +Long press the voice button to speak, do not release, slide up to ![](../../assets/images/app_cancel_button.png) and then release, cancel sending. 
+ +#### 2.8 Brush +![](../../assets/images/app_brush_button1.png) Default is grayed out, click to enable the brush function, the page displays brush color and eraser buttons as shown below: + ++ +
+ +Supports selecting different colors for brush annotation on the real-time camera preview page; click the eraser ![](../../assets/images/app_brush_button3.png) to remove all brush annotations; click the brush button ![](../../assets/images/app_brush_button4.png) to turn off the brush function (brush marks are displayed on the dialogue images using the brush in the history dialogue page). + diff --git a/docs/en/Core Concepts/Clients/build.md b/docs/en/core_concepts/Clients/build.md similarity index 100% rename from docs/en/Core Concepts/Clients/build.md rename to docs/en/core_concepts/Clients/build.md diff --git a/docs/en/core_concepts/Clients/client.md b/docs/en/core_concepts/Clients/client.md new file mode 100644 index 0000000..5607773 --- /dev/null +++ b/docs/en/core_concepts/Clients/client.md @@ -0,0 +1,53 @@ +# Client + +Currently, there are three clients: `DefaultClient`, `AppClient`, and `WebpageClient`. + +`DefaultClient` is the default client used for interacting with users via the command line. +- The parameters of `DefaultClient` include `interactor`, `processor`, `config_path`, `workers`, and `input_prompt`. +- Among them, either `interactor` or `processor` must be chosen to be passed in. `interactor` is the workflow used for interaction, and `processor` is the workflow used for image processing. +- At least one of `config_path` and `workers` must be passed in, or both can be passed. `config_path` is the path to the worker configuration file, and `workers` is a list of `Worker` instances. +- `input_prompt` is the prompt message for user input, which defaults to None. If you need to provide a prompt message after startup, you need to pass it in. Alternatively, you can set `input_prompt` in the `self.input.read_input()` method of your first worker node. + +`AppClient` is used for interacting with users within an app. +- The parameters of `AppClient` include `interactor`, `processor`, `config_path`, and `workers`. 
+- Among them, either `interactor` or `processor` must be chosen to be passed in. `interactor` is the workflow used for interaction, and `processor` is the workflow used for image processing. +- At least one of `config_path` and `workers` must be passed in, or both can be passed. `config_path` is the path to the worker configuration file, and `workers` is a list of `Worker` instances. + +`WebpageClient` is a web page chat window implemented with gradio, which can be used for interaction. +- The parameters of `WebpageClient` include `interactor`, `processor`, `config_path`, and `workers`. +- Among them, either `interactor` or `processor` must be chosen to be passed in. +- `interactor` is the workflow used for interaction, with a default port of **7860** after startup, and the access address is `http://127.0.0.1:7860`. +- `processor` is the workflow used for image processing, with a default port of **7861** after startup, and the access address is `http://127.0.0.1:7861`. +- At least one of `config_path` and `workers` must be passed in, or both can be passed. `config_path` is the path to the worker configuration file, and `workers` is a list of `Worker` instances. + + +The input for `DefaultClient` uses `AppInput`, and the output uses `DefaultCallback`. The input for `AppClient` uses `AppInput`, and the output uses `AppCallback`. The input for `WebpageClient` uses `AppInput`, and the output uses `AppCallback`. + +When writing an agent worker, you don't need to worry about which one to use. Simply call `self.input.read_input()` and `self.callback.send_xxx()`. Depending on whether `DefaultClient` or `AppClient` or `WebpageClient` is instantiated, different input and output logic will be followed. + +The input has only one method: +- `read_input(workflow_instance_id: str, input_prompt = "")` + - `workflow_instance_id` is the ID of the workflow instance. + - `input_prompt` is the information prompting the user on what to input, which can be empty. 
+ +The callback has five methods: +- `send_incomplete(agent_id, msg, took=0, msg_type=MessageType.TEXT.value, prompt_tokens=0, output_tokens=0, filter_special_symbols=True)` +- `send_block(agent_id, msg, took=0, msg_type=MessageType.TEXT.value, interaction_type=InteractionType.DEFAULT.value, prompt_tokens=0, output_tokens=0, filter_special_symbols=True)` +- `send_answer(agent_id, msg, took=0, msg_type=MessageType.TEXT.value, prompt_tokens=0, output_tokens=0, filter_special_symbols=True)` + + - `send_incomplete` (the conversation content is not yet complete), `send_block` (a single conversation has ended, but the overall result is not finished), `send_answer` (the overall return is complete). + - The required parameters for these three methods are `agent_id` and `msg`. `agent_id` is the ID of the workflow instance, and `msg` is the message content. + - `took`, `msg_type`, `interaction_type`, `prompt_tokens`, and `output_tokens` are optional parameters, chosen based on the actual situation. + - `took` is the time consumed by the program, in seconds. + - `msg_type` is the message type, with three options: `MessageType.TEXT.value`, `MessageType.IMAGE_URL.value`, `MessageType.IMAGE_BASE64.value`. The default is `MessageType.TEXT.value`. + - `interaction_type` is the interaction type, with two options: `InteractionType.DEFAULT.value`, `InteractionType.INPUT.value`. The default is `InteractionType.DEFAULT.value`, which means doing nothing. `InteractionType.INPUT.value` means that after this message is output, user input is required. + - `prompt_tokens` is the number of input tokens, and `output_tokens` is the number of output tokens. + - `filter_special_symbols` is a boolean parameter, in `AppClient` it defaults to `True`, and special symbols such as `*`, `#`, `-` will be filtered out from the message content when the message type is `MessageType.TEXT.value`. + - `send_incomplete` must be followed by a `send_block`. + - The last message must be `send_answer`. 
+ +- `info(agent_id, progress, message)` + - The required parameters for the `info` method are `agent_id`, `progress`, and `message`. `agent_id` is the ID of the workflow instance, `progress` is the program name, and `message` is the progress information. + +- `error(agent_id, error_code, error_info, **kwargs)` + - The required parameters for the `error` method are `agent_id`, `error_code`, and `error_info`. `agent_id` is the ID of the workflow instance, `error_code` is the error code, and `error_info` is the error information. diff --git a/docs/en/Core Concepts/Clients/config.md b/docs/en/core_concepts/Clients/config.md similarity index 100% rename from docs/en/Core Concepts/Clients/config.md rename to docs/en/core_concepts/Clients/config.md diff --git a/docs/en/Core Concepts/Clients/intro.md b/docs/en/core_concepts/Clients/intro.md similarity index 100% rename from docs/en/Core Concepts/Clients/intro.md rename to docs/en/core_concepts/Clients/intro.md diff --git a/docs/en/Core Concepts/Memory/build.md b/docs/en/core_concepts/Memory/build.md similarity index 100% rename from docs/en/Core Concepts/Memory/build.md rename to docs/en/core_concepts/Memory/build.md diff --git a/docs/en/Core Concepts/Memory/config.md b/docs/en/core_concepts/Memory/config.md similarity index 100% rename from docs/en/Core Concepts/Memory/config.md rename to docs/en/core_concepts/Memory/config.md diff --git a/docs/en/core_concepts/Memory/intro.md b/docs/en/core_concepts/Memory/intro.md new file mode 100644 index 0000000..32ffc3e --- /dev/null +++ b/docs/en/core_concepts/Memory/intro.md @@ -0,0 +1,16 @@ +# Memory + +OmAgent implements two types of memory systems: + +1. **Short-Term Memory (STM)** + - Temporary storage for workflow-specific data + - Implemented using Redis by default + - Useful for storing session/workflow state + - Data is volatile and workflow-instance specific + +2. 
**Long-Term Memory (LTM)** + - Persistent storage for long-term data + - Implemented using vector database + - Supports vector storage and similarity search + - Data persists across different workflow instances + diff --git a/docs/en/Core Concepts/Models/build.md b/docs/en/core_concepts/Models/build.md similarity index 100% rename from docs/en/Core Concepts/Models/build.md rename to docs/en/core_concepts/Models/build.md diff --git a/docs/en/Core Concepts/Models/config.md b/docs/en/core_concepts/Models/config.md similarity index 100% rename from docs/en/Core Concepts/Models/config.md rename to docs/en/core_concepts/Models/config.md diff --git a/docs/en/Core Concepts/Models/intro.md b/docs/en/core_concepts/Models/intro.md similarity index 100% rename from docs/en/Core Concepts/Models/intro.md rename to docs/en/core_concepts/Models/intro.md diff --git a/docs/en/core_concepts/Models/llms.md b/docs/en/core_concepts/Models/llms.md new file mode 100644 index 0000000..e4c27fe --- /dev/null +++ b/docs/en/core_concepts/Models/llms.md @@ -0,0 +1,58 @@ +# LLMs +LLMs are the core components of Omagent. They are responsible for generating text via Large Language Models. + +It is constructed by following parts: +- ```BaseLLM```: The base class for all LLMs, it defines the basic properties and methods for all LLMs. +- ```BaseLLMBackend```: The enhanced class for better using LLMs, you can assemble specific LLMs with different prompt templates and output parsers. +- ```BasePromptTemplate```: The base class for all prompt templates, it defines the input variables and output parser for a prompt template. +- ```BaseOutputParser```: The base class for all output parsers, it defines how to parse the output of an LLM result. + +## Prompt Template +This is a simple way to define a prompt template. 
+```python +from omagent_core.models.llms.prompt.prompt import PromptTemplate + +# Define a system prompt template +system_prompt = PromptTemplate.from_template("You are a helpful assistant.", role="system") +# Define a user prompt template +user_prompt = PromptTemplate.from_template("Tell me a joke about {{topic}}", role="user") +``` +`topic` is a variable in the user prompt template; it is replaced by the actual input value. + +## Output Parser +This is a simple way to define an output parser. +```python +from omagent_core.models.llms.prompt.parser import StrParser + +output_parser = StrParser() +``` +`StrParser` is a simple output parser that returns the output as a string. + +## Get LLM Result +This is a simple way to define an LLM request and get its result. +1. The worker class should inherit from `BaseWorker` and `BaseLLMBackend`, and define the LLM model in the `prompts` and `llm` fields. `OutputParser` is optional; if it is not defined, the default `StrParser` is used. +2. Override the `_run` method to define the workflow logic. +```python +def _run(self, *args, **kwargs): + payload = { + "topic": "weather" + } + # 1. use the `infer` method to get the LLM result + chat_complete_res = self.infer(input_list=[payload])[0]["choices"][0]["message"].get("content") + # 2. use the `simple_infer` method to get the LLM result, it's a shortcut for the `infer` method + simple_infer_res = self.simple_infer(topic="weather")["choices"][0]["message"].get("content") + # `chat_complete_res` already holds the extracted message text, so use it directly + content = chat_complete_res + print(content) + return {'output': content} +``` + +For Multi-Modal LLMs, it's also simple and intuitive. +```python +def _run(self, *args, **kwargs): + payload = { + "topic": ["this image", PIL.Image.Image object, ...] 
+ } + chat_complete_res = self.infer(input_list=[payload])[0]["choices"][0]["message"].get("content") + return {'output': chat_complete_res} +``` +The order of prompts given to the LLM is consistent with the order of elements in the list of variables, resulting in an alternating pattern of text and images. \ No newline at end of file diff --git a/docs/en/Core Concepts/Node/build.md b/docs/en/core_concepts/Node/build.md similarity index 100% rename from docs/en/Core Concepts/Node/build.md rename to docs/en/core_concepts/Node/build.md diff --git a/docs/en/Core Concepts/Node/config.md b/docs/en/core_concepts/Node/config.md similarity index 100% rename from docs/en/Core Concepts/Node/config.md rename to docs/en/core_concepts/Node/config.md diff --git a/docs/en/Core Concepts/Node/intro.md b/docs/en/core_concepts/Node/intro.md similarity index 100% rename from docs/en/Core Concepts/Node/intro.md rename to docs/en/core_concepts/Node/intro.md diff --git a/docs/en/Core Concepts/Tool/build.md b/docs/en/core_concepts/Tool/build.md similarity index 100% rename from docs/en/Core Concepts/Tool/build.md rename to docs/en/core_concepts/Tool/build.md diff --git a/docs/en/Core Concepts/Tool/config.md b/docs/en/core_concepts/Tool/config.md similarity index 100% rename from docs/en/Core Concepts/Tool/config.md rename to docs/en/core_concepts/Tool/config.md diff --git a/docs/en/core_concepts/Tool/intro.md b/docs/en/core_concepts/Tool/intro.md new file mode 100644 index 0000000..78ad7fe --- /dev/null +++ b/docs/en/core_concepts/Tool/intro.md @@ -0,0 +1,94 @@ +# Tool + +OmAgent's tool system is a robust and flexible framework that allows developers to create, configure, register, and invoke various tools seamlessly. Tools in OmAgent are modular components that perform specific tasks, enabling the intelligent agent to interact with different services and perform complex operations. 
+ +## Building a Tool + +### Key Components of a Tool +To create a new tool in OmAgent, you need to define a class that inherits from `BaseTool` or one of its subclasses. This class should implement the necessary methods to perform its intended functionality. + +- **Description**: A string that describes what the tool does. +- **Arguments Schema (`ArgSchema`)**: Defines the input parameters required by the tool. +- **Execution Methods**: + - `_run`: Synchronous execution. + - `_arun`: Asynchronous execution. + +### Input Parameters Schema + +Configuration involves defining the input parameters that the tool requires and any additional settings it might need. This is typically defined alongside your tool class in JSON format. +Each argument in the `ArgSchema` has four attributes: +- `description`: A string that describes the argument. +- `type`: The type of the argument. Supported types are `string`, `integer`, `number`, and `boolean`. +- `enum`: A list of allowed values for the argument. +- `required`: A boolean that indicates whether the argument is required. + +Here is an example of the `ArgSchema` for a tool that performs web search: +```python +ARGSCHEMA = { + "search_query": {"type": "string", "description": "The search query."}, + "goals_to_browse": { + "type": "string", + "description": "What's you want to find on the website returned by search. If you need more details, request it in here. Examples: 'What is latest news about deepmind?', 'What is the main idea of this article?'", + }, + "region": { + "type": "string", + "description": "The region code of the search, default to `en-US`. 
Available regions: `en-US`, `zh-CN`, `ja-JP`, `de-DE`, `fr-FR`, `en-GB`.", + "required": True, + }, + "num_results": { + "type": "integer", + "description": "The page number of results to return, default is 1, maximum is 3.", + "required": True, + }, +} +``` + +### Registering a Tool +Use the `registry.register_tool()` decorator to register your tool so that it can be instantiated when building a worker. See [registry](../registry.md) for more details about the registry system. + +## Tool Manager + +The `ToolManager` class is responsible for managing and executing tools. It handles tool initialization, execution, and schema generation. + +### Initialization +You can initialize the `ToolManager` in multiple ways: +- Initialize with a list of tool class names, instances, or configurations. + ```python + tool_manager = ToolManager(tools=["Calculator"]) + tool_manager = ToolManager(tools=[Calculator()]) + tool_manager = ToolManager(tools=[{"name": "Calculator", "description": "Calculator tool."}]) + ``` +- Initialize with a dictionary of key-value pairs, where the key is the tool name and the value is the tool instance or configuration. +```python +tool_manager = ToolManager(tools={"my_calculator": Calculator()}) +tool_manager = ToolManager(tools={"my_calculator": {"name": "Calculator", "description": "Calculator tool."}}) +``` +You can also initialize the `ToolManager` from a YAML file. The `ToolManager` is instantiated when the worker is built. +```yaml +tools: + - Calculator + - CodeInterpreter + - ReadFileContent + - WriteFileContent + - ShellTool + - name: WebSearch + bing_api_key: ${env|bing_api_key, microsoft_bing_api_key} + llm: ${sub|text_res} + +``` +If you want the `ToolManager` to decide which tool to use and generate the corresponding inputs, you should also provide an LLM with prompts to the `ToolManager`. + +### Execution +Tools can be invoked using the `ToolManager`. 
Here's how to execute a tool with a given tool name and arguments: + +```python +tool_manager = ToolManager() +result = tool_manager.execute("Calculator", {"code": "print(2 + 3)"}) +print(result) +``` +The ```ToolManager``` will retrieve the corresponding tool, validate the input arguments and execute the tool. +Another way to execute a tool is use the `execute_task` method. You can provide a task and let the ToolManager decide which tool to use and generate the corresponding inputs. +```python +tool_manager = ToolManager() +result = tool_manager.execute_task("Calculate the result of 2 + 3.") +``` \ No newline at end of file diff --git a/docs/en/Core Concepts/Workflow/build.md b/docs/en/core_concepts/Workflow/build.md similarity index 100% rename from docs/en/Core Concepts/Workflow/build.md rename to docs/en/core_concepts/Workflow/build.md diff --git a/docs/en/Core Concepts/Workflow/config.md b/docs/en/core_concepts/Workflow/config.md similarity index 100% rename from docs/en/Core Concepts/Workflow/config.md rename to docs/en/core_concepts/Workflow/config.md diff --git a/docs/en/Core Concepts/Workflow/intro.md b/docs/en/core_concepts/Workflow/intro.md similarity index 100% rename from docs/en/Core Concepts/Workflow/intro.md rename to docs/en/core_concepts/Workflow/intro.md diff --git a/docs/en/core_concepts/Workflow/task.md b/docs/en/core_concepts/Workflow/task.md new file mode 100644 index 0000000..ed22f9a --- /dev/null +++ b/docs/en/core_concepts/Workflow/task.md @@ -0,0 +1,157 @@ +# Task + +Task is the basic unit of building workflow. There are two types of tasks: simple task and operator. + +## Simple Task +The functionality of simple task is defined by binding it to a [worker](./worker.md). 
+Here is an example of how to define a simple task: +```python +from omagent_core.engine.worker.base import BaseWorker +from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow +from omagent_core.engine.workflow.task.simple_task import simple_task +from omagent_core.utils.registry import registry + +# Define a worker +@registry.register_worker() +class SimpleWorker(BaseWorker): + def _run(self, my_name: str): + return {} + +# Define a workflow +workflow = ConductorWorkflow(name='my_exp') + +# Define a simple task +task = simple_task(task_def_name='SimpleWorker', task_reference_name='ref_name', inputs={'my_name': workflow.input('my_name')}) + +workflow >> task +``` +Specify the task definition name (```task_def_name```) and the task reference name (```task_reference_name```). The task definition name should be the name of the corresponding worker class. The task reference name is used to identify the task in the workflow. +Specify the inputs of the task. Inputs may be either values or references to a workflow's initial inputs or the outputs of preceding tasks. +See [workflow](./workflow.md) for workflow details. + +## Operators +Operators are the built-in tasks provided by the workflow engine. They handle the workflow control logic. +### 1. Switch Task +A switch task is used to make a decision based on the value of a given field. 
+```python +from omagent_core.engine.workflow.task.switch_task import SwitchTask +from omagent_core.engine.worker.base import BaseWorker +from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow +from omagent_core.engine.workflow.task.simple_task import simple_task +from omagent_core.utils.registry import registry + +@registry.register_worker() +class SimpleWorker1(BaseWorker): + def _run(self): + print('worker1') + return {} + +@registry.register_worker() +class SimpleWorker2(BaseWorker): + def _run(self): + print('worker2') + return {} + +@registry.register_worker() +class SimpleWorker3(BaseWorker): + def _run(self): + print('worker3') + return {} + +workflow = ConductorWorkflow(name='switch_test') + +# Create some example tasks (replace with your actual tasks) +task1 = simple_task(task_def_name='SimpleWorker1', task_reference_name='ref_name1') +task2 = simple_task(task_def_name='SimpleWorker2', task_reference_name='ref_name2') +task3 = simple_task(task_def_name='SimpleWorker3', task_reference_name='ref_name3') + +# 1. Create a switch task with a value-based condition +switch = SwitchTask( + task_ref_name="my_switch", + case_expression=workflow.input('switch_case_value'), # This will evaluate the switch_case_value from workflow input +) + +# 2. Add cases +switch.switch_case("w1", [task1]) +switch.switch_case("w2", [task2]) + +# 3. Add default case (optional) +switch.default_case([task3]) + +workflow >> switch + +workflow.register(overwrite=True) +``` +This will create a basic workflow with a switch task shown below. (You can check the workflow definition at Conductor UI default at http://localhost:5001/workflowDefs). ++ +
+You can also chain the switch cases as follows: + +```python +switch.switch_case("w1", [task1]).switch_case("w2", [task2]).default_case([task3]) +``` + +### 2. Fork-Join Task +The fork-join task is used to execute multiple tasks in parallel. +```python +from omagent_core.engine.workflow.task.fork_task import ForkTask +from omagent_core.engine.worker.base import BaseWorker +from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow +from omagent_core.engine.workflow.task.simple_task import simple_task +from omagent_core.utils.registry import registry + + +@registry.register_worker() +class SimpleWorker1(BaseWorker): + def _run(self): + print("worker1") + return {} + + +@registry.register_worker() +class SimpleWorker2(BaseWorker): + def _run(self): + print("worker2") + return {} + + +@registry.register_worker() +class SimpleWorker3(BaseWorker): + def _run(self): + print("worker3") + return {} + + +# Create the main workflow +workflow = ConductorWorkflow(name="fork_join_test") + +# Create tasks for parallel execution +task1 = simple_task(task_def_name="SimpleWorker1", task_reference_name="parallel_task1") +task2 = simple_task(task_def_name="SimpleWorker2", task_reference_name="parallel_task2") +task3 = simple_task(task_def_name="SimpleWorker3", task_reference_name="parallel_task3") + +# Create parallel execution paths +path1 = [task1] # First parallel path +path2 = [task2] # Second parallel path +path3 = [task3] # Third parallel path + +# Create the fork task with multiple parallel paths +fork_task = ForkTask( + task_ref_name="parallel_execution", + forked_tasks=[path1, path2, path3], + # The join will wait for the last task in each path + join_on=["parallel_task1", "parallel_task2", "parallel_task3"] +) + +# Add the fork task to the workflow +workflow.add(fork_task) + +workflow.register(overwrite=True) +``` +This will create a basic workflow with a fork-join task shown below. +
+ +### 3. Do-While Task diff --git a/docs/en/core_concepts/Workflow/worker.md b/docs/en/core_concepts/Workflow/worker.md new file mode 100644 index 0000000..cdceb87 --- /dev/null +++ b/docs/en/core_concepts/Workflow/worker.md @@ -0,0 +1,172 @@ +# Worker +Worker is the basic unit of computation in OmAgent. It is responsible for executing tasks and generating outputs. +## How to define a worker +The most basic worker can be created like this: +```python +from omagent_core.engine.worker.base import BaseWorker +from omagent_core.utils.registry import registry + +@registry.register_worker() +class MyWorker(BaseWorker): + def _run(self, *args, **kwargs): + # Implement your business logic here + return {"result": "some_value"} +``` +By inheriting from `BaseWorker`, you can define your own worker. The worker will be registered with the name of the class. Normally, use `@registry.register_worker()` to register the worker so that it can be built from configurations. See [registry](./registry.md) for more details. + +### 1. Parameter Handling +You can define typed parameters that are JSON-serializable, and return results in key-value format: +```python +@registry.register_worker() +class ParameterWorker(BaseWorker): + def _run(self, name: str, age: int): + # Parameters will be automatically extracted + return { + "message": f"Hello {name}, you are {age} years old" + } +``` + +### 2. Integration +You can integrate workers with other libraries to extend the functionality. The most common case is integrating with LLMs. Here is an example: +```python +from typing import List + +from pydantic import Field + +from omagent_core.engine.worker.base import BaseWorker +from omagent_core.models.llms.prompt.parser import StrParser +from omagent_core.models.llms.prompt.prompt import PromptTemplate +from omagent_core.utils.registry import registry + +# BaseLLMBackend and OpenaiGPTLLM also need to be imported from omagent_core. +@registry.register_worker() +class LLMWorker(BaseLLMBackend, BaseWorker): + llm: OpenaiGPTLLM + output_parser: StrParser + prompts: List[PromptTemplate] = Field( + default=[ + PromptTemplate.from_template(template="Your prompt here") + ] + ) + + def _run(self, *args, **kwargs): + return self.simple_infer() +``` + +### 3. 
Configuration Fields +You can configure worker behavior using Pydantic Fields to set default values: +```python +@registry.register_worker() +class ConfigurableWorker(BaseWorker): + poll_interval: float = Field(default=100) # Polling interval in milliseconds + domain: str = Field(default=None) # Workflow domain + concurrency: int = Field(default=5) # Concurrency level +``` +Note: do not use ```alias``` in the field definition. + +### 4. Async Support +Workers can be asynchronous: +```python +import asyncio + +@registry.register_worker() +class AsyncWorker(BaseWorker): + async def _run(self, *args, **kwargs): + async def count_task(i): + await asyncio.sleep(1) + print(f'Task {i} completed!') + return i + + tasks = [count_task(i) for i in range(10)] + results = await asyncio.gather(*tasks) + return {"result": "async operation completed"} +``` + +## Configuration and build +Workers can be configured and built from YAML or JSON configuration files. You can set not only the worker's parameters but also its recursive dependencies. +### 1. Worker Configuration Structure +Here's the basic structure: +```yaml +name: LLMWorker +llm: + name: OpenaiGPTLLM + model_id: gpt-4o + api_key: sk-proj-... + endpoint: https://api.openai.com/v1 + temperature: 0 + vision: true +output_parser: + name: StrParser +``` +### 2. Submodule Substitution +You can use ${sub|**module_name**} to substitute submodules. This is useful when you want to reuse the same submodule in different workers while keeping the configuration clean. The **module_name** should be the name of the submodule configuration file. +For example, you can define llm_worker.yaml as follows: +```yaml +name: LLMWorker +llm: ${sub|gpt} +output_parser: + name: StrParser +``` +And define gpt.yaml as follows: +```yaml +name: OpenaiGPTLLM +model_id: gpt-4o +api_key: sk-proj-... +endpoint: https://api.openai.com/v1 +temperature: 0 +vision: true +``` +This is equivalent to the previous LLMWorker example. 
+Note: Do not use ```alias``` in the field definition. Do not create circular references. + +### 3. Environment Variables +You can use ${env|**env_name**, **default_value**} to substitute environment variables. This is useful when you want to set the parameters dynamically. The **env_name** should be the name of the environment variable. **default_value** is optional, and will be used when the environment variable is not set. +For example, you can define gpt.yaml as follows: +```yaml +name: OpenaiGPTLLM +model_id: gpt-4o +api_key: ${env| CUSTOM_OPENAI_KEY} +endpoint: ${env| CUSTOM_OPENAI_ENDPOINT, https://api.openai.com/v1} +temperature: 0 +vision: true +``` +The environment variable name is case-sensitive. + +### 4. Default Configuration Fields +Workers have several default configuration fields that can be set: +- **component_stm**: The STM component for the worker. Use any registered component name. Default is the one registered with `register_stm`. Access it via `self.stm`. See [container](./container.md) and [memory](./memory.md) for more details. +- **component_ltm**: The LTM component for the worker. Use any registered component name. Default is the one registered with `register_ltm`. Access it via `self.ltm`. See [container](./container.md) and [memory](./memory.md) for more details. +- **component_callback**: The callback component for the worker. Use any registered component name. Default is the one registered with `register_callback`. Access it via `self.callback`. See [container](./container.md) and [client](./client.md) for more details. +- **component_input**: The input component for the worker. Use any registered component name. Default is the one registered with `register_input`. Access it via `self.input`. See [container](./container.md) and [client](./client.md) for more details. +- **poll_interval**: The poll interval for the worker. Default is 100 milliseconds. +- **domain**: The domain of the workflow. Default is None. 
+- **concurrency**: The concurrency of the worker. Default is 5. + +### 5. Build from Configurations +The worker instances can be built from configurations by using the ```build_from_file``` function from omagent_core.utils.build. Here's how it works: +```python +from omagent_core.utils.build import build_from_file + +# Load worker configs from a directory +worker_config = build_from_file('path/to/config/directory') +``` +Note: You must provide a ```workers``` directory in the configuration path which contains all configurations for the workers. + +## Run workers +OmAgent provides a TaskHandler class to manage worker instance creation and management. Here's how to use TaskHandler: +```python +from omagent_core.engine.automator.task_handler import TaskHandler + +task_handler = TaskHandler(worker_config=worker_config, workers=[MyWorker()]) +task_handler.start_processes() +task_handler.stop_processes() +``` +The `worker_config` parameter accepts a set of worker configurations and launches the corresponding number of processes based on each worker's concurrency attribute value. + +You can also use the `workers` parameter to directly pass in instantiated worker objects. Instances of these workers are deepcopied based on the concurrency setting. If your worker instances contain objects that cannot be deepcopied, set the instance's concurrency property to 1 and actively expand the concurrency count in the workers list. + +Then, use `start_processes` to start all workers and `stop_processes` to stop all workers. 
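The deepcopy behavior described above can be illustrated with a toy model. This is a minimal sketch in plain Python, not TaskHandler's actual implementation: `ToyWorker` and `expand_by_concurrency` are hypothetical names, and the assumption that the original instance fills the first slot while only the extra slots are deepcopied is ours, not something the framework documents.

```python
import copy
import threading


class ToyWorker:
    """Hypothetical stand-in for a worker instance (not omagent_core's BaseWorker)."""

    def __init__(self, name, concurrency=1, resource=None):
        self.name = name
        self.concurrency = concurrency
        self.resource = resource  # e.g. a lock or a client handle


def expand_by_concurrency(workers):
    """Assumed behavior: keep each instance and deepcopy it for every extra slot."""
    expanded = []
    for worker in workers:
        expanded.append(worker)
        for _ in range(worker.concurrency - 1):
            expanded.append(copy.deepcopy(worker))
    return expanded


# A plain worker with concurrency=3 yields three independent instances.
plain = expand_by_concurrency([ToyWorker("plain", concurrency=3)])

# threading.Lock cannot be deepcopied, so a worker holding one fails to
# expand when its concurrency is greater than 1.
try:
    expand_by_concurrency(
        [ToyWorker("locked", concurrency=2, resource=threading.Lock())]
    )
    deepcopy_failed = False
except TypeError:
    deepcopy_failed = True

# Workaround from the text: concurrency=1 per instance, and several
# separately constructed instances in the workers list.
manual = expand_by_concurrency(
    [ToyWorker("locked", concurrency=1, resource=threading.Lock()) for _ in range(2)]
)
```

In this sketch the manual list reaches the same effective concurrency without copying the lock, which is the reason the text recommends building the instances yourself when they hold non-copyable objects.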
+ +## Important Notes +- Always use the @registry.register_worker() decorator to register the worker +- The ```_run``` method is mandatory and contains your core logic +- Return values should be a dictionary with serializable values +- Worker behavior can be configured using Fields +- Both synchronous and asynchronous operations are supported +- The ```self.workflow_instance_id``` is automatically available in the worker context + +## Best Practices +- Keep workers focused on a single responsibility +- Use proper type hints for better code clarity +- Implement proper error handling +- Document your worker's expected inputs and outputs +- Use configuration fields for flexible behavior +- Consider using async operations for I/O-bound tasks diff --git a/docs/en/core_concepts/Workflow/workflow.md b/docs/en/core_concepts/Workflow/workflow.md new file mode 100644 index 0000000..e2a5734 --- /dev/null +++ b/docs/en/core_concepts/Workflow/workflow.md @@ -0,0 +1,59 @@ +# Workflow +Workflow is the top-level object in OmAgent. It contains a list of tasks and the dependencies between them. + +## Creating a Workflow +You can create a workflow by instantiating the `ConductorWorkflow` class. +```python +from omagent_core.engine.workflow.conductor_workflow import ConductorWorkflow + +workflow = ConductorWorkflow(name='test_workflow') +``` + +## Adding Tasks to a Workflow +You can add tasks to a workflow by using the ```add``` method. (See [task](./task.md) for more details about tasks.) +```python +workflow.add(task) +``` +There is a shortcut operator `>>` for this method. +```python +workflow >> task +``` +You can also chain tasks as follows: +```python +workflow >> task1 >> task2 >> task3 +``` +There is a simple way to create fork-join tasks. +```python +workflow >> task1 >> [task2, task3, task4] >> task5 +``` +There is also a simple way to define a switch task. 
+```python +workflow >> switch_task >> {'case1': task1, 'case2': task2, 'default': task3} # default is used when the result does not match any specified case +``` +Note that the switch_task **MUST** output ```switch_case_value``` as the indicator for branching. + +You can use a workflow as a task in another workflow. +```python +sub_workflow >> task1 >> task2 +workflow >> task3 >> sub_workflow >> task4 +``` + +## Registering a Workflow +You can register a workflow by using the ```register``` method. +```python +workflow.register(overwrite=True) +``` +After registering, you can see the workflow in the Conductor UI (default at http://localhost:5001/workflowDefs). + +## Running a Workflow +You can start a workflow instance and send input to it by using the ```start_workflow_with_input``` method. +```python +workflow_execution_id = workflow.start_workflow_with_input(workflow_input={'name': 'Lu'}) +``` + +## Getting Workflow Status and Result +Since the workflow runs asynchronously, you can get its status and result by using the ```get_workflow``` method. +```python +status = workflow.get_workflow(workflow_id=workflow_execution_id).status +result = workflow.get_workflow(workflow_id=workflow_execution_id).output +``` \ No newline at end of file diff --git a/docs/en/core_concepts/container.md b/docs/en/core_concepts/container.md new file mode 100644 index 0000000..230be19 --- /dev/null +++ b/docs/en/core_concepts/container.md @@ -0,0 +1,92 @@ +# Container + +The Container module is a dependency injection and service container implementation that manages components and their dependencies in the OmAgent core system. It follows the IoC (Inversion of Control) pattern to handle component registration, configuration, and retrieval. + +## Key Features +### 1. Component Management +- Registers and manages different types of components (connections, memories, callbacks, etc.) 
+- Handles component dependencies automatically +- Provides type-safe access to registered components + +### 2. Connector Management +- Manages service connectors that components might depend on +- Automatically injects required connectors into components + +### 3. Special Component Types +- STM (Short-term Memory) +- LTM (Long-term Memory) +- Callback handlers +- Input handlers + +### 4. Configuration Management +- Can compile configurations to YAML +- Loads configurations from YAML files +- Supports environment variables and descriptions in configs + + +## Register +Examples of registering: +```python +from omagent_core.utils.container import container +from omagent_core.services.handlers.redis_stream_handler import RedisStreamHandler + +# Register a connector using component name +container.register_connector(RedisConnector, name="redis_client") + +# Register a component using component class +container.register_component(RedisStreamHandler) + +# Register STM component +container.register_stm("RedisSTM") + +# Register LTM component +container.register_ltm("MilvusLTM") + +# Register callback and input handlers +container.register_callback("AppCallback") +container.register_input("AppInput") +``` + +## Configuration Management + +1. **Compile Configuration**: Container can automatically generate YAML configuration template files. You can change the parameter values in the template files, and the changes take effect when the configuration is loaded. The ```env_var``` field indicates the environment variable name for each parameter; do not change it, because it is just for demonstration. + ```python + from pathlib import Path + container.compile_config(Path('./config_dir')) + ``` + + + +2. **Load Configuration**: Load settings from YAML files. This will update the container with the settings in the YAML file. 
+ ```python + container.from_config('container.yaml') + ``` + +## Component Retrieval + +Access registered components: +```python +# Get a connector +redis_client = container.get_connector("redis_client") + +# Get STM component +stm = container.stm + +# Get LTM component +ltm = container.ltm + +# Get callback handler +callback = container.callback + +# Get input handler +input_handler = container.input +``` + + +## Best Practices + +1. **Early Registration**: Register all components at application startup +2. **Configuration Files**: Use YAML configuration files for better maintainability +3. **Compile Configuration**: Prepare a separate script to compile the container configuration before application startup. +4. **Update Container**: Update the container with the settings in the project entry file. Register the default special components (STM, LTM, Callback, Input) before updating. +5. **Single Instance**: Use the global container instance provided by the framework \ No newline at end of file diff --git a/docs/en/core_concepts/debug.md b/docs/en/core_concepts/debug.md new file mode 100644 index 0000000..2955029 --- /dev/null +++ b/docs/en/core_concepts/debug.md @@ -0,0 +1,5 @@ +Debug Mode: +Set `debug: true` in the `conductor_config` section within `container.yaml` to enable debug mode. The debug mode has the following features: +1. Outputs more debug information. +2. After starting, it will stop all workflows with the same name on the conductor and restart a new workflow. +3. There will be no retries after failure; each task will only be executed once. \ No newline at end of file diff --git a/docs/en/core_concepts/registry.md b/docs/en/core_concepts/registry.md new file mode 100644 index 0000000..f9c829a --- /dev/null +++ b/docs/en/core_concepts/registry.md @@ -0,0 +1,53 @@ +# Register + +The Registry module is a powerful tool for managing and organizing different types of modules in your application. 
It supports registration and retrieval of various categories like prompts, LLMs, workers, tools, encoders, connectors, and components. + +## Registration +You can register classes using either decorators or direct registration: +```python +from omagent_core.utils.registry import registry + +# Using decorator (recommended) +@registry.register_node() +class MyNode: + name = "MyNode" + +# Or with a custom name +@registry.register_tool(name="custom_tool_name") +class MyTool: + pass + +# Direct registration +class MyLLM: + pass +registry.register("llm", "my_llm")(MyLLM) +``` + + +## Retrieval +Retrieve registered modules using the get methods: +```python +# Get registered modules +my_node = registry.get_node("MyNode") +my_tool = registry.get_tool("custom_tool_name") +my_llm = registry.get("llm", "my_llm") +``` + + +## Auto-Import Feature +The registry can automatically import modules from specified paths: +```python +# Import from default paths +registry.import_module() + +# Import from custom project path +registry.import_module("path/to/your/modules") + +# Import from multiple paths +registry.import_module([ + "path/to/modules1", + "path/to/modules2" +]) +``` +Note: Call ```registry.import_module()``` in the main function of your script so that the modules are registered in the Python environment before they are used. + diff --git a/docs/en/Getting Started/2_devices.md b/docs/en/getting_started/2_devices.md similarity index 100% rename from docs/en/Getting Started/2_devices.md rename to docs/en/getting_started/2_devices.md diff --git a/docs/en/getting_started/install.md b/docs/en/getting_started/install.md new file mode 100644 index 0000000..5b4dca4 --- /dev/null +++ b/docs/en/getting_started/install.md @@ -0,0 +1,44 @@ +# Installation + +### 1. Deploy the Workflow Orchestration Engine +OmAgent utilizes [Conductor](https://github.com/conductor-oss/conductor) as its workflow orchestration engine. 
Conductor is an open-source, distributed, and scalable workflow engine that supports a variety of programming languages and frameworks. By default, it uses Redis for persistence and Elasticsearch (7.x) as the indexing backend. +It is recommended to deploy Conductor using Docker: +```bash +docker-compose -f docker/conductor/docker-compose.yml up -d +``` +- Once deployed, you can access the Conductor UI at `http://localhost:5001`. (Note: Mac system will occupy port 5000 by default, so we use 5001 here. You can specify other ports when deploying Conductor.) +- The Conductor API can be accessed via `http://localhost:8080`. +- More details about the deployment can be found [here](docker/README.md). + +### 2. Install OmAgent +- **Python Version**: Ensure Python 3.10 or higher is installed. +- **Install `omagent_core`**: + ```bash + pip install -e omagent-core + ``` +- **Install dependencies for the sample project**: + ```bash + pip install -r requirements.txt + ``` + +- **Install Optional Components**: + - Install Milvus VectorDB for enhanced support of long-term memory. +OmAgent uses Milvus Lite as the default vector database for storing vector data related to long-term memory. To utilize the full Milvus service, you may deploy the [Milvus vector database](https://milvus.io/docs/install_standalone-docker.md) via Docker. + - Pull git lfs files. +We provide sample image files for our examples in the `examples/step4_outfit_with_ltm/wardrobe_images` directory. To use them, ensure Git LFS is installed. You can install it with the following command: + ```bash + git lfs install + ``` + Then, pull the files by executing: + ```bash + git lfs pull + ``` + + + +### 3. Connect Devices +If you wish to use smart devices to access your agents, we provide a smartphone app and corresponding backend, allowing you to focus on agent functionality without worrying about complex device connection issues. 
+- **Deploy the app backend** + The app backend comprises the backend program, along with two middleware components: the MySQL database and MinIO object storage. For installation and deployment instructions, please refer to [this link](https://github.com/om-ai-lab/OmAgent/blob/main/docker/README.md). +- **Download, install, and debug the smartphone app** + At present, we offer an Android app available for download and testing. For detailed instructions on acquiring and using it, please refer to [here](../core_concepts/Clients/app.md). The iOS version is currently under development and will be available soon. \ No newline at end of file diff --git a/docs/en/getting_started/intro.md b/docs/en/getting_started/intro.md new file mode 100644 index 0000000..098120f --- /dev/null +++ b/docs/en/getting_started/intro.md @@ -0,0 +1,44 @@ +# Introduction +OmAgent is an open-source agent framework designed to streamline the development of on-device multimodal agents. Our goal is to enable agents that can empower various hardware devices, ranging from smartphones and smart wearables (e.g. glasses) to IP cameras and futuristic robots. To that end, OmAgent creates an abstraction over various types of devices and simplifies the process of connecting these devices to state-of-the-art multimodal foundation models and agent algorithms, so that everyone can build the most interesting on-device agents. Moreover, OmAgent focuses on optimizing the end-to-end computing pipeline in order to provide a real-time user interaction experience out of the box. + +In summary, key features of OmAgent include: + +- **Easy Connection to Diverse Devices**: we make it really simple to connect to physical devices, e.g. phones, glasses and more, so that agent/model developers can build applications that run on devices rather than on web pages. We welcome contributions to support more devices! 
+ +- **Speed-optimized SOTA Multimodal Models**: OmAgent integrates SOTA commercial and open-source foundation models to give application developers the most powerful intelligence. Moreover, OmAgent streamlines audio/video processing and computation to enable natural and fluid interaction between the device and the users. + +- **SOTA Multimodal Agent Algorithms**: OmAgent provides an easy workflow orchestration interface for researchers and developers to implement the latest agent algorithms, e.g. ReAct, DnC and more. We welcome contributions of any new agent algorithm to enable more complex problem-solving abilities. + +- **Scalability and Flexibility**: OmAgent provides an intuitive interface for building scalable agents, enabling developers to construct agents tailored to specific roles and highly adaptive to various applications. + +## Architecture +The design architecture of OmAgent adheres to three fundamental principles: +1. Graph-based workflow orchestration; +2. Native multimodality; +3. Device-centricity. + +With OmAgent, one has the opportunity to craft a bespoke intelligent agent program. + +For a deeper comprehension of OmAgent, let us elucidate key terms: ++ +
+ +- **Devices**: Central to OmAgent's vision is the empowerment of intelligent hardware devices through artificial intelligence agents, rendering devices a pivotal component of OmAgent's essence. By leveraging the downloadable mobile application we have generously provided, your mobile device can become the inaugural foundational node linked to OmAgent. Devices serve to intake environmental stimuli, such as images and sounds, potentially offering responsive feedback. We have evolved a streamlined backend process to manage the app-centric business logic, thereby enabling developers to concentrate on constructing the intelligence agent's logical framework. See [client](../core_concepts/Clients/client.md) for more details. + +- **Workflow**: Within the OmAgent Framework, the architectural structure of intelligent agents is articulated through graphs. Developers possess the liberty to innovate, configure, and sequence node functionalities at will. Presently, we have opted for Conductor as the workflow orchestration engine, lending support to intricate operations like switch-case, fork-join, and do-while. See [workflow](../core_concepts/Workflow/workflow.md) for more details. + +- **Task and Worker**: Throughout the OmAgent workflow development journey, Task and Worker stand as pivotal concepts. Worker embodies the actual operational logic of workflow nodes, whereas Task oversees the orchestration of the workflow's logic. Tasks are categorized into Operators, managing workflow logic (e.g., looping, branching), and Simple Tasks, representing nodes customized by developers. Each Simple Task is correlated with a Worker; when the workflow progresses to a given Simple Task, the task is dispatched to the corresponding worker for execution. See [task](../core_concepts/Workflow/task.md) and [worker](../core_concepts/Workflow/worker.md) for more details. 
+ + +## Basic Principles of Building an Agent +- **Modularity**: Break down the agent's functionality into discrete workers, each responsible for a specific task. + +- **Reusability**: Design workers to be reusable across different workflows and agents. + +- **Scalability**: Use workflows to scale the agent's capabilities by adding more workers or adjusting the workflow sequence. + +- **Interoperability**: Workers can interact with various backends, such as LLMs, databases, or APIs, allowing agents to perform complex operations. + +- **Asynchronous Execution**: The workflow engine and task handler manage the execution asynchronously, enabling efficient resource utilization. \ No newline at end of file diff --git a/docs/en/getting_started/quick_start.md b/docs/en/getting_started/quick_start.md new file mode 100644 index 0000000..c93276e --- /dev/null +++ b/docs/en/getting_started/quick_start.md @@ -0,0 +1,49 @@ +# Quick Start + +## 1、Configuration + +The container.yaml file is a configuration file that manages dependencies and settings for different components of the system. To set up your configuration: + +1. Generate the container.yaml file: + ```bash + cd examples/step2_outfit_with_switch + python compile_container.py + ``` + This will create a container.yaml file with default settings under `examples/step2_outfit_with_switch`. + + + +2. Configure your LLM settings in `configs/llms/gpt.yml` and `configs/llms/text_res.yml`: + + - Set your OpenAI API key or compatible endpoint through environment variable or by directly modifying the yml file + ```bash + export custom_openai_key="your_openai_api_key" + export custom_openai_endpoint="your_openai_endpoint" + ``` + +3. Update settings in the generated `container.yaml`: + - Configure Redis connection settings, including host, port, credentials, and both `redis_stream_client` and `redis_stm_client` sections. 
+ - Update the Conductor server URL under the `conductor_config` section + - Adjust any other component settings as needed + +4. Web search uses DuckDuckGo by default. For better results, it is recommended to configure [Bing Search](https://www.microsoft.com/en-us/bing/apis/pricing) by modifying the `configs/tools/websearch.yml` file and setting the `bing_api_key`. + +For more information about the container.yaml configuration, please refer to the [container module](../core_concepts/container.md). + +## 2、Running the Example + +1. Run the outfit with switch example: + + For terminal/CLI usage: Input and output are in the terminal window + ```bash + cd examples/step2_outfit_with_switch + python run_cli.py + ``` + + For app/GUI usage: Input and output are in the app + ```bash + cd examples/step2_outfit_with_switch + python run_app.py + ``` + For app backend deployment, please refer to [here](https://github.com/om-ai-lab/OmAgent/blob/main/docker/README.md) + For the connection and usage of the OmAgent app, please check [app usage documentation](../core_concepts/Clients/app.md) \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index ddef988..77e6ce5 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -32,80 +32,94 @@ plugins: nav: - Home: index.md - Getting Started: - - Quick Start: Getting Started/1_quick_start.md - - Connect to device: Getting Started/2_devices.md - - Visualization and analysis: Getting Started/3_visual.md + - Introduction: getting_started/intro.md + - Installation: getting_started/install.md + - Quick Start: getting_started/quick_start.md + # - Visualization and analysis: Getting Started/3_visual.md - Guides: - - Let the agent think: - - More complex workflow: Guides/Let the agent think/1_workflow.md - - Tools: Guides/Let the agent think/2_tool.md - - Let the agent remember: - - Memory: Guides/Let the agent remember/1_memory.md + # - Let the agent think: + # - More complex workflow: Guides/Let the agent think/1_workflow.md + # - 
Tools: Guides/Let the agent think/2_tool.md + # - Let the agent remember: + # - Memory: Guides/Let the agent remember/1_memory.md + - A Simple VQA Agent: Guides/simple_vqa.md + - Let Agent Decide: Guides/outfit_with_switch.md + - How To Loop: Guides/outfit_with_loop.md + - Let Agent Remember: Guides/outfit_with_ltm.md - Core Concepts: - - Node: - - Introduction: Core Concepts/Node/intro.md - - Configuration: Core Concepts/Node/config.md - - Build: Core Concepts/Node/build.md + # - Node: + # - Introduction: core_concepts/Node/intro.md + # - Configuration: core_concepts/Node/config.md + # - Build: core_concepts/Node/build.md - Workflow: - - Introduction: Core Concepts/Workflow/intro.md - - Configuration: Core Concepts/Workflow/config.md - - Build: Core Concepts/Workflow/build.md + # - Introduction: core_concepts/Workflow/intro.md + # - Configuration: core_concepts/Workflow/config.md + # - Build: core_concepts/Workflow/build.md + - Workflow: core_concepts/Workflow/workflow.md + - Task: core_concepts/Workflow/task.md + - Worker: core_concepts/Workflow/worker.md - Models: - - Introduction: Core Concepts/Models/intro.md - - Configuration: Core Concepts/Models/config.md - - Build: Core Concepts/Models/build.md + # - Introduction: core_concepts/Models/intro.md + # - Configuration: core_concepts/Models/config.md + # - Build: core_concepts/Models/build.md + - Large Languages Models: core_concepts/Models/llms.md - Tool: - - Introduction: Core Concepts/Tool/intro.md - - Configuration: Core Concepts/Tool/config.md - - Build: Core Concepts/Tool/build.md + - Introduction: core_concepts/Tool/intro.md + # - Configuration: core_concepts/Tool/config.md + # - Build: core_concepts/Tool/build.md - Memory: - - Introduction: Core Concepts/Memory/intro.md - - Configuration: Core Concepts/Memory/config.md - - Build: Core Concepts/Memory/build.md + - Introduction: core_concepts/Memory/intro.md + # - Configuration: core_concepts/Memory/config.md + # - Build: core_concepts/Memory/build.md - 
Clients: - - Introduction: Core Concepts/Clients/intro.md - - Configuration: Core Concepts/Clients/config.md - - Build: Core Concepts/Clients/build.md - - locale: zh - name: 中文 - build: true - nav: - - 首页: index.md - - 开始: - - 快速开始: Getting Started/1_quick_start.md - - 连接设备: Getting Started/2_devices.md - - 可视化与分析: Getting Started/3_visual.md - - 指南: - - 让Agent思考: - - 更复杂的流程: Guides/Let the agent think/1_workflow.md - - 使用工具: Guides/Let the agent think/2_tool.md - - 让Agent记住: - - 记忆: Guides/Let the agent remember/1_memory.md - - 核心概念: - - Node: - - 介绍: Core Concepts/Node/intro.md - - 配置: Core Concepts/Node/config.md - - 构建: Core Concepts/Node/build.md - - Workflow: - - 介绍: Core Concepts/Workflow/intro.md - - 配置: Core Concepts/Workflow/config.md - - 构建: Core Concepts/Workflow/build.md - - Models: - - 介绍: Core Concepts/Models/intro.md - - 配置: Core Concepts/Models/config.md - - 构建: Core Concepts/Models/build.md - - Tool: - - 介绍: Core Concepts/Tool/intro.md - - 配置: Core Concepts/Tool/config.md - - 构建: Core Concepts/Tool/build.md - - Memory: - - 介绍: Core Concepts/Memory/intro.md - - 配置: Core Concepts/Memory/config.md - - 构建: Core Concepts/Memory/build.md - - Clients: - - 介绍: Core Concepts/Clients/intro.md - - 配置: Core Concepts/Clients/config.md - - 构建: Core Concepts/Clients/build.md + # - Introduction: core_concepts/Clients/intro.md + # - Configuration: core_concepts/Clients/config.md + # - Build: core_concepts/Clients/build.md + - Client Types: core_concepts/Clients/client.md + - How to use app: core_concepts/Clients/app.md + - Container: core_concepts/container.md + - Register Modules: core_concepts/registry.md + - Running In Debug Mode: core_concepts/debug.md + # - locale: zh + # name: 中文 + # build: true + # nav: + # - 首页: index.md + # - 开始: + # - 快速开始: Getting Started/1_quick_start.md + # - 连接设备: Getting Started/2_devices.md + # - 可视化与分析: Getting Started/3_visual.md + # - 指南: + # - 让Agent思考: + # - 更复杂的流程: Guides/Let the agent think/1_workflow.md + # - 使用工具: 
Guides/Let the agent think/2_tool.md + # - 让Agent记住: + # - 记忆: Guides/Let the agent remember/1_memory.md + # - 核心概念: + # - Node: + # - 介绍: core_concepts/Node/intro.md + # - 配置: core_concepts/Node/config.md + # - 构建: core_concepts/Node/build.md + # - Workflow: + # - 介绍: core_concepts/Workflow/intro.md + # - 配置: core_concepts/Workflow/config.md + # - 构建: core_concepts/Workflow/build.md + # - Models: + # - 介绍: core_concepts/Models/intro.md + # - 配置: core_concepts/Models/config.md + # - 构建: core_concepts/Models/build.md + # - Tool: + # - 介绍: core_concepts/Tool/intro.md + # - 配置: core_concepts/Tool/config.md + # - 构建: core_concepts/Tool/build.md + # - Memory: + # - 介绍: core_concepts/Memory/intro.md + # - 配置: core_concepts/Memory/config.md + # - 构建: core_concepts/Memory/build.md + # - Clients: + # - 介绍: core_concepts/Clients/intro.md + # - 配置: core_concepts/Clients/config.md + # - 构建: core_concepts/Clients/build.md - search: # 搜索 lang: - en