
✨ Request call Bedrock API or increased Ollama compute f #6601

Open
tpoconnor-14 opened this issue Jan 24, 2025 · 4 comments

@tpoconnor-14

Describe the feature request.

I am working on a piece of critical analysis which requires making around 350,000 queries to an LLM (each query being a chunk of input text plus a prompt), on either Ollama or Bedrock.

Input words: 148,946,466 × 2 × 1.3 (number of words, × 2 for the prompt sent with each query, × 1.3 to convert words to tokens)
Output words: 148,946,466 × 1.3 (an overestimate).

The pricing for Bedrock is below:

| Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch) |
| --- | --- | --- | --- | --- |
| Claude 3 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075 |
| Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625 |

Claude 3 Sonnet total cost: (((148,946,466 × 2 × 1.3) / 1,000) × $0.003) + (((148,946,466 × 1.3) / 1,000) × $0.015) ≈ $4,066 (the input side includes the ×2 prompt multiplier from the estimate above)
Claude 3 Haiku total cost: (((148,946,466 × 2 × 1.3) / 1,000) × $0.00025) + (((148,946,466 × 1.3) / 1,000) × $0.00125) ≈ $338.85
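The arithmetic above can be sketched in a few lines (assuming, per the input estimate, that the ×2 prompt multiplier applies to input tokens for both models; the constants are taken directly from the pricing table):

```python
WORDS = 148_946_466
TOKENS_PER_WORD = 1.3  # rough words-to-tokens conversion used above

input_tokens = WORDS * 2 * TOKENS_PER_WORD   # x2 for the prompt in each query
output_tokens = WORDS * TOKENS_PER_WORD      # overestimate

def cost(in_tok: float, out_tok: float, in_price: float, out_price: float) -> float:
    """Total cost given prices per 1,000 tokens, as in the Bedrock pricing table."""
    return in_tok / 1000 * in_price + out_tok / 1000 * out_price

sonnet = cost(input_tokens, output_tokens, 0.003, 0.015)
haiku = cost(input_tokens, output_tokens, 0.00025, 0.00125)
print(f"Claude 3 Sonnet: ${sonnet:,.2f}  Claude 3 Haiku: ${haiku:,.2f}")
```

Batch pricing would halve both figures, since every batch price in the table is half the on-demand price.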

Describe the context.

No response

Value / Purpose

This analysis is critical for the delivery of a report.

User Types

No response

@simon-pope

simon-pope commented Jan 27, 2025

@jhpyke @julialawrence
2025-01-27 15:11:25,875 - ERROR - Individual processing failed for text 581: An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again. You have sent too many requests. Wait before trying again.

@simon-pope

2025-01-27 14:58:31,362 - ERROR - Error raised by bedrock service: An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again. You have sent too many requests. Wait before trying again.
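One way to ride out these ThrottlingExceptions is exponential backoff with jitter around each invocation. A minimal sketch, not taken from the project's code (the `invoke_with_backoff` helper is illustrative; in real boto3 code you would catch `botocore.exceptions.ClientError` and inspect its error code rather than matching on the exception name):

```python
import random
import time

def invoke_with_backoff(call, max_retries=8, base_delay=1.0, cap=30.0):
    """Retry a zero-argument `call` with exponential backoff and jitter
    whenever it raises a throttling-style exception."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            # Illustrative check; with boto3, inspect
            # exc.response["Error"]["Code"] == "ThrottlingException" instead.
            if "Throttling" not in type(exc).__name__ and "Throttling" not in str(exc):
                raise
            delay = min(cap, base_delay * 2 ** attempt) * random.uniform(0.5, 1.0)
            time.sleep(delay)
    raise RuntimeError(f"still throttled after {max_retries} retries")
```

Usage would be something like `invoke_with_backoff(lambda: client.invoke_model(...))`. Backoff alone does not raise the quota, but it turns hard failures into waits.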

@jhpyke
Contributor

jhpyke commented Jan 27, 2025

Actions:
I have updated the basic limits on the Claude Haiku models. AWS has some limits we are unable to alter, namely the on-demand InvokeModel requests per minute for Anthropic Claude 3.5 Haiku, which is capped at a maximum of 400 requests per minute with no option to adjust. As such, it may be worth exploring whether cross-region inference is appropriate here, as that has a limit of 2,000 requests per minute. This comes with the limitation that your data could exist anywhere in eu-west rather than just eu-west-1 (Ireland), but it is effectively a 5× increase in maximum requests.
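Whichever quota applies (400 requests/minute on-demand, or 2,000 with cross-region inference), a client-side limiter can keep the request rate under the cap rather than relying on retries after the fact. A minimal sliding-window sketch (the `RateLimiter` class is illustrative, not part of any AWS SDK):

```python
import time
from collections import deque

class RateLimiter:
    """Block so that at most `max_calls` acquires happen per `period` seconds."""
    def __init__(self, max_calls: int, period: float = 60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls: deque = deque()  # monotonic timestamps of recent acquires

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            time.sleep(self.period - (now - self.calls[0]))
            self.calls.popleft()
        self.calls.append(time.monotonic())

# For the 400 requests/minute quota, one might use:
# limiter = RateLimiter(max_calls=400, period=60.0)
# and call limiter.acquire() before each invoke_model request.
```

For a multi-process batch job this per-process limiter would need its budget divided across workers, since the AWS quota is account-wide.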

Further to this, the AP team will be investigating any bottlenecks in the existing bedrock setup to understand if these limits being hit are points we have control over or AWS mandated.

@tpoconnor-14
Author

Hello Jake, thank you for letting me know. A 5× increase would be great; however, the DPIA we have in place for this project restricts processing to Ireland or London.

Thanks for following up with the AP team.

@simon-pope simon-pope moved this from 👀 TODO to 🚀 In Progress in Analytical Platform Jan 29, 2025