Add vision capability #2025
@rickyloynd-microsoft @sonichi @ekzhu @afourney For instance, a bounding box algorithm can be incorporated here.
Looks great!
@BeibinLi @WaelKarkoub @rickyloynd-microsoft Shall we create a multimodal topic under https://microsoft.github.io/autogen/docs/topics?
* Add vision capability
* Configurate: description_prompt
* Print warning instead of raising issues for type
* Skip vision capability test if dependencies not installed
* Append "vision" to agent's system message when enabled VisionCapability
* GPT-4V notebook update with ConversableAgent
* Clean GPT-4V notebook
* Add vision capability test to workflow
* Lint import
* Update system message for vision capability
* Add a `custom_caption_func` to VisionCapability
* Add custom function example for vision capability
* Skip test Vision capability custom func
* GPT-4V notebook metadata to website
* Remove redundant files
* The custom caption function takes more inputs now
* Add a more complex example of custom caption func
* Remove trailing space

---------

Co-authored-by: Chi Wang <[email protected]>
* DOC FIX - Formatted Docstrings for the retrieve_user_proxy_agent.py and Added first single line for the class RetrieveUserProxyAgent. * DOC FIX - Formatted Docstrings for theinitiate_chats functiion of ChatResult class in autogen/agentchat/chat.py * Add vision capability (#2025) * Add vision capability * Configurate: description_prompt * Print warning instead of raising issues for type * Skip vision capability test if dependencies not installed * Append "vision" to agent's system message when enabled VisionCapability * GPT-4V notebook update with ConversableAgent * Clean GPT-4V notebook * Add vision capability test to workflow * Lint import * Update system message for vision capability * Add a `custom_caption_func` to VisionCapability * Add custom function example for vision capability * Skip test Vision capability custom func * GPT-4V notebook metadata to website * Remove redundant files * The custom caption function takes more inputs now * Add a more complex example of custom caption func * Remove trailing space --------- Co-authored-by: Chi Wang <[email protected]> * Native tool call support for Mistral AI API and topic notebook. (#2135) * Support for Mistral AI API and topic notebook. 
* formatting * formatting * New conversational chess notebook using nested chats and tool use (#2137) * add chess notebook * update * update * Update notebook with figure * Add example link * redirect * Clean up example format * address gagan's comments * update references * fix links * add webarena in samples (#2114) * add webarena in samples/tools * Update samples/tools/webarena/README.md Co-authored-by: gagb <[email protected]> * Update samples/tools/webarena/README.md Co-authored-by: gagb <[email protected]> * Update samples/tools/webarena/README.md Co-authored-by: gagb <[email protected]> * update installation instructions * black formatting * Update README.md --------- Co-authored-by: gagb <[email protected]> Co-authored-by: Eric Zhu <[email protected]> * context to kwargs (#2064) * context to kwargs * add tag * add test * text to kwargs --------- Co-authored-by: Eric Zhu <[email protected]> Co-authored-by: Chi Wang <[email protected]> * Bump webpack-dev-middleware from 5.3.3 to 5.3.4 in /website (#2131) Bumps [webpack-dev-middleware](https://github.com/webpack/webpack-dev-middleware) from 5.3.3 to 5.3.4. - [Release notes](https://github.com/webpack/webpack-dev-middleware/releases) - [Changelog](https://github.com/webpack/webpack-dev-middleware/blob/v5.3.4/CHANGELOG.md) - [Commits](webpack/webpack-dev-middleware@v5.3.3...v5.3.4) --- updated-dependencies: - dependency-name: webpack-dev-middleware dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Eric Zhu <[email protected]> * Parse Any HTML-esh Style Tags (#2046) * tried implementing my own regex * improves tests * finally works * removes prints * fixed test * adds start and end * delete unused imports * refactored to use new tool * significantly improved algo * tag content -> tag attr * fix tests + adds new field * return full match * return remove start and end * update docstrings * update docstrings * update docstrings --------- Co-authored-by: Beibin Li <[email protected]> Co-authored-by: Chi Wang <[email protected]> * Integrate AgentOptimizer (#1767) * draft agent optimizer * refactor * remove * change openai config interface * notebook * update blog * add test * clean up * redir * update * update interface * change model name * move to contrib * Update autogen/agentchat/contrib/agent_optimizer.py Co-authored-by: Jack Gerrits <[email protected]> --------- Co-authored-by: “skzhang1” <“[email protected]”> Co-authored-by: Beibin Li <[email protected]> Co-authored-by: Jieyu Zhang <[email protected]> Co-authored-by: Jack Gerrits <[email protected]> * Introducing IOStream protocol and adding support for websockets (#1551) * Introducing IOStream * bug fixing * polishing * refactoring * refactoring * refactoring * wip: async tests * websockets added * wip * merge with main * notebook added * FastAPI example added * wip * merge * getter/setter to iostream added * website/blog/2024-03-03-AutoGen-Update/img/dalle_gpt4v.png: convert to Git LFS * website/blog/2024-03-03-AutoGen-Update/img/gaia.png: convert to Git LFS * website/blog/2024-03-03-AutoGen-Update/img/teach.png: convert to Git LFS * add SSL support * wip * wip * exception handling added to on_connect() * refactoring: default iostream is being set in a context manager * test fix * polishing * polishing * polishing * fixed bug with new thread * polishing * a 
bit of refactoring and docs added * notebook added to docs * type checking added to CI * CI fix * CI fix * CI fix * polishing * obsolete todo comment removed * fixed precommit error --------- Co-authored-by: Eric Zhu <[email protected]> * [CAP] [Feature] Get list of actors from directory service. (#2073) * Search directory for list of actors using regex '.*' gets all actors * docs changes * pre-commit fixes * Use ActorInfo from protobuf * pre-commit * Added zmq tests to work on removing sleeps * minor refactor of zmq tests * 1) Change DirSvr to user Broker. 2) Add req-router to broker 3) In ActorConnector use handshake and req/resp to remove sleep * 1) Change DirSvr to user Broker. 2) Add req-router to broker 3) In ActorConnector use handshake and req/resp to remove sleep * move socket creation to thread with recv * move socket creation to thread with recv * Better logging for DirectorySvc * better logging for directory svc * Use logging config * Start removing sleeps * pre-commit * Cleanup monitor socket * Mark cache as a protocol and update type hints to reflect (#2168) * Mark cache as a protocl and update type hints to reflect * int * undo init change modified: autogen/agentchat/chat.py * fix(): fix word spelling errors (#2171) * Implement User Defined Functions for Local CLI Executor (#2102) * Implement user defined functions feature for local cli exec, add docs * add tests, update docs * fixes * fix test * add pandas test dep * install test * provide template as func * formatting * undo change * address comments * add test deps * formatting * test only in 1 env * formatting * remove test for local only --------- Co-authored-by: Eric Zhu <[email protected]> * simplify getting-started; update news (#2175) * simplify getting-started; update news * bug fix * update (#2178) Co-authored-by: AnonymousRepoSub <“[email protected]” > * Fix formatting of admonitions in udf docs (#2188) * Fix iostream on new thread (#2181) * fixed get_stream in new thread by introducing a 
global default * fixed get_stream in new thread by introducing a global default --------- Co-authored-by: Chi Wang <[email protected]> * Add link for rendering notebooks docs on website (#2191) * Transform Messages Capability (#1923) * wip * Adds docstrings * fixed spellings * wip * fixed errors * better class names * adds tests * added tests to workflow * improved token counting * improved notebook * improved token counting in test * improved docstrings * fix inconsistencies * changed by mistake * fixed docstring * fixed details * improves tests + adds openai contrib test * fix spelling oai contrib test * clearer docstrings * remove repeated docstr * improved notebook * adds metadata to notebook * Improve outline and description (#2125) * better dir structure * clip max tokens to allowed tokens * more accurate comments/docstrs * add deperecation warning * fix front matter desc * add deperecation warning notebook * undo local notebook settings changes * format notebook * format workflow --------- Co-authored-by: gagb <[email protected]> * Bump express from 4.18.2 to 4.19.2 in /website (#2157) Bumps [express](https://github.com/expressjs/express) from 4.18.2 to 4.19.2. - [Release notes](https://github.com/expressjs/express/releases) - [Changelog](https://github.com/expressjs/express/blob/master/History.md) - [Commits](expressjs/express@4.18.2...4.19.2) --- updated-dependencies: - dependency-name: express dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * add clarity analytics (#2201) * Docstring formatting fix: Standardize docstrings to adhere to Google style guide, ensuring consistency and clarity. and also fixed the broken link for autogen/agentchat/chat.py * Docstring fix: Reformattted docstrings to adhere to Google style guide, nsuring consistency and clarity. 
For agentchat/contrib/retrieve_user_proxy_agent.py file * Fixed Pre-Commit Error, Trailing spaces on agentchat/chat.py * Fixed Pre-Commit Error, Trailing spaces on agentchat/chat.py --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Li Jiang <[email protected]> Co-authored-by: Beibin Li <[email protected]> Co-authored-by: Chi Wang <[email protected]> Co-authored-by: Eric Zhu <[email protected]> Co-authored-by: olgavrou <[email protected]> Co-authored-by: gagb <[email protected]> Co-authored-by: Qingyun Wu <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Wael Karkoub <[email protected]> Co-authored-by: Shaokun Zhang <[email protected]> Co-authored-by: “skzhang1” <“[email protected]”> Co-authored-by: Jieyu Zhang <[email protected]> Co-authored-by: Jack Gerrits <[email protected]> Co-authored-by: Davor Runje <[email protected]> Co-authored-by: Rajan <[email protected]> Co-authored-by: calm <[email protected]> Co-authored-by: AnonymousRepoSub <“[email protected]” >
We want to have a "vision capability" so that it can be added to conversable agents even if these agents are not connected to multimodal models.
See the feature overview in Issue #1975. Closing #1926 and moving it here for testing purposes.
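To make the idea concrete, here is a minimal, self-contained sketch of how a vision capability could be attached to an agent that only talks to a text model. It does not use the AutoGen API: `TextOnlyAgent`, `VisionCapabilitySketch`, `default_caption`, and the `<img URL>` tag syntax are all hypothetical stand-ins chosen for illustration. The caption function plays the role of the `custom_caption_func` mentioned in the commits; in a real setup it would call a multimodal model or another algorithm.

```python
import re
from typing import Callable, List, Optional

# Assumed tag syntax for embedding an image reference in a text message.
IMG_TAG = re.compile(r"<img\s+([^>]+)>")


def default_caption(image_url: str) -> str:
    """Placeholder captioner; a real one would query a multimodal model."""
    return f"an image located at {image_url}"


class TextOnlyAgent:
    """Hypothetical stand-in for a conversable agent backed by a text-only model."""

    def __init__(self, name: str, system_message: str = "") -> None:
        self.name = name
        self.system_message = system_message
        self._hooks: List[Callable[[str], str]] = []

    def register_hook(self, hook: Callable[[str], str]) -> None:
        self._hooks.append(hook)

    def receive(self, message: str) -> str:
        # Run every registered hook over the incoming message, in order.
        for hook in self._hooks:
            message = hook(message)
        return message


class VisionCapabilitySketch:
    """Adds image captions to messages so a text-only agent can reason about images."""

    def __init__(self, caption_func: Optional[Callable[[str], str]] = None) -> None:
        self._caption = caption_func or default_caption

    def add_to_agent(self, agent: TextOnlyAgent) -> None:
        agent.register_hook(self.process_message)
        # Tell the agent about its new ability via the system message.
        agent.system_message += "\nYou've been given the ability to interpret images."

    def process_message(self, message: str) -> str:
        def annotate(match: "re.Match[str]") -> str:
            url = match.group(1).strip()
            # Keep the original tag and append the caption as plain text.
            return f"{match.group(0)} (caption: {self._caption(url)})"

        return IMG_TAG.sub(annotate, message)


agent = TextOnlyAgent("assistant", "You are helpful.")
VisionCapabilitySketch(caption_func=lambda url: "a red square").add_to_agent(agent)
print(agent.receive("What is in <img http://example.com/a.png>?"))
```

Because the captioner is just a callable, swapping in a different strategy (for example, one that also reports bounding boxes) would not require changing the agent itself.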