How to pass images as input to CodeAgent? #298

DarshanDeshpande · 2025-01-21T17:14:27Z

Hello,

I want to pass an input image along with the prompt to CodeAgent.run. I see that there is an additional_args argument but when I pass the image as {"image": "path/to/image.png"}, the agent ends up loading the image via pytesseract to read the contents of the image instead of passing it to OpenAI/Anthropic directly. Is there any way that I can ensure that the image is passed along with the prompt so that the model can infer information from it instead of using external libraries to load the image when using the LiteLLM integration?

My code for reference:

agent = CodeAgent(
    tools=[],
    model=LiteLLMModel(
        model_id="openai/gpt-4o",
        api_key=os.environ.get('OPENAI_API_KEY'),
        temperature=1,
        top_p=0.95,
    ),
    add_base_tools=True,
    additional_authorized_imports=["sqlite3", "csv", "json", "os", "datetime", "requests", "pandas", "numpy", "sys"],
    max_steps=10,
)

agent.run(prompt, additional_args={"image": "path/to/image.png"})

The text was updated successfully, but these errors were encountered:

NSTiwari · 2025-01-21T18:50:36Z

@DarshanDeshpande
As far as I know, there's still no support for VLMs yet. An issue has been created for this already (to be added as a possible feature).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to pass images as input to CodeAgent? #298

How to pass images as input to CodeAgent? #298

DarshanDeshpande commented Jan 21, 2025

NSTiwari commented Jan 21, 2025

How to pass images as input to CodeAgent? #298

How to pass images as input to CodeAgent? #298

Comments

DarshanDeshpande commented Jan 21, 2025

NSTiwari commented Jan 21, 2025