Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to pass images as input to CodeAgent? #298

Open
DarshanDeshpande opened this issue Jan 21, 2025 · 1 comment
Open

How to pass images as input to CodeAgent? #298

DarshanDeshpande opened this issue Jan 21, 2025 · 1 comment

Comments

@DarshanDeshpande
Copy link

Hello,

I want to pass an input image along with the prompt to CodeAgent.run. I see that there is an additional_args argument but when I pass the image as {"image": "path/to/image.png"}, the agent ends up loading the image via pytesseract to read the contents of the image instead of passing it to OpenAI/Anthropic directly. Is there any way that I can ensure that the image is passed along with the prompt so that the model can infer information from it instead of using external libraries to load the image when using the LiteLLM integration?

My code for reference:

agent = CodeAgent(
    tools=[],
    model=LiteLLMModel(
        model_id="openai/gpt-4o",
        api_key=os.environ.get('OPENAI_API_KEY'),
        temperature=1,
        top_p=0.95,
    ),
    add_base_tools=True,
    additional_authorized_imports=["sqlite3", "csv", "json", "os", "datetime", "requests", "pandas", "numpy", "sys"],
    max_steps=10,
)

agent.run(prompt, additional_args={"image": "path/to/image.png"})
@NSTiwari
Copy link

@DarshanDeshpande
As far as I know, there's still no support for VLMs yet. An issue has been created for this already (to be added as a possible feature).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants