
✨ Request call Bedrock API or increased Ollama compute f #6601

Open
tpoconnor-14 opened this issue Jan 24, 2025 · 4 comments

@tpoconnor-14

Describe the feature request.

I am working on a piece of critical analysis which requires making around 350,000 queries to an LLM (each query being a chunk of input text plus a prompt), on either Ollama or Bedrock.

Input words: 148,946,466 × 2 × 1.3 (number of words, × 2 for the prompt sent with each query, × 1.3 to convert words to tokens)
Output words: 148,946,466 × 1.3 (an overestimate).

The pricing for Bedrock is below:

| Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens | Price per 1,000 input tokens (batch) | Price per 1,000 output tokens (batch) |
| --- | --- | --- | --- | --- |
| Claude 3 Sonnet | $0.003 | $0.015 | $0.0015 | $0.0075 |
| Claude 3 Haiku | $0.00025 | $0.00125 | $0.000125 | $0.000625 |

Claude 3 Sonnet total cost: (((148,946,466 × 2 × 1.3) / 1,000) × $0.003) + (((148,946,466 × 1.3) / 1,000) × $0.015) ≈ $4,066 (the input side includes the ×2 prompt multiplier from the estimate above)
Claude 3 Haiku total cost: (((148,946,466 × 2 × 1.3) / 1,000) × $0.00025) + (((148,946,466 × 1.3) / 1,000) × $0.00125) ≈ $338.85
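The arithmetic above can be sketched in a few lines (assuming, per the input estimate, that the ×2 prompt multiplier applies to input tokens for both models; the constants are taken directly from the pricing table):

```python
WORDS = 148_946_466
TOKENS_PER_WORD = 1.3  # rough words-to-tokens conversion used above

input_tokens = WORDS * 2 * TOKENS_PER_WORD   # x2 for the prompt in each query
output_tokens = WORDS * TOKENS_PER_WORD      # overestimate

def cost(in_tok: float, out_tok: float, in_price: float, out_price: float) -> float:
    """Total cost given prices per 1,000 tokens, as in the Bedrock pricing table."""
    return in_tok / 1000 * in_price + out_tok / 1000 * out_price

sonnet = cost(input_tokens, output_tokens, 0.003, 0.015)
haiku = cost(input_tokens, output_tokens, 0.00025, 0.00125)
print(f"Claude 3 Sonnet: ${sonnet:,.2f}  Claude 3 Haiku: ${haiku:,.2f}")
```

Batch pricing would halve both figures, since every batch price in the table is half the on-demand price.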

Describe the context.

No response

Value / Purpose

This analysis is critical for the delivery of a report.

User Types

No response

@simon-pope

simon-pope commented Jan 27, 2025

@jhpyke @julialawrence
2025-01-27 15:11:25,875 - ERROR - Individual processing failed for text 581: An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again. You have sent too many requests. Wait before trying again.

@simon-pope

2025-01-27 14:58:31,362 - ERROR - Error raised by bedrock service: An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again. You have sent too many requests. Wait before trying again.
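One way to ride out these ThrottlingExceptions is exponential backoff with jitter around each invocation. A minimal sketch, not taken from the project's code (the `invoke_with_backoff` helper is illustrative; in real boto3 code you would catch `botocore.exceptions.ClientError` and inspect its error code rather than matching on the exception name):

```python
import random
import time

def invoke_with_backoff(call, max_retries=8, base_delay=1.0, cap=30.0):
    """Retry a zero-argument `call` with exponential backoff and jitter
    whenever it raises a throttling-style exception."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            # Illustrative check; with boto3, inspect
            # exc.response["Error"]["Code"] == "ThrottlingException" instead.
            if "Throttling" not in type(exc).__name__ and "Throttling" not in str(exc):
                raise
            delay = min(cap, base_delay * 2 ** attempt) * random.uniform(0.5, 1.0)
            time.sleep(delay)
    raise RuntimeError(f"still throttled after {max_retries} retries")
```

Usage would be something like `invoke_with_backoff(lambda: client.invoke_model(...))`. Backoff alone does not raise the quota, but it turns hard failures into waits.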

@jhpyke
Contributor

jhpyke commented Jan 27, 2025

Actions:
I have updated the basic limits on the Claude Haiku models. AWS has some limits we are unable to alter, namely the on-demand InvokeModel requests per minute for Anthropic Claude 3.5 Haiku, which is capped at a maximum of 400 requests per minute with no option to adjust. As such, it may be worth exploring whether cross-region inference is appropriate here, as that has a limit of 2,000 requests per minute. This comes with the limitation that your data could exist anywhere in eu-west rather than just eu-west-1 (Ireland), but it is effectively a 5× increase in maximum requests.
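Whichever quota applies (400 requests/minute on-demand, or 2,000 with cross-region inference), a client-side limiter can keep the request rate under the cap rather than relying on retries after the fact. A minimal sliding-window sketch (the `RateLimiter` class is illustrative, not part of any AWS SDK):

```python
import time
from collections import deque

class RateLimiter:
    """Block so that at most `max_calls` acquires happen per `period` seconds."""
    def __init__(self, max_calls: int, period: float = 60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls: deque = deque()  # monotonic timestamps of recent acquires

    def acquire(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            time.sleep(self.period - (now - self.calls[0]))
            self.calls.popleft()
        self.calls.append(time.monotonic())

# For the 400 requests/minute quota, one might use:
# limiter = RateLimiter(max_calls=400, period=60.0)
# and call limiter.acquire() before each invoke_model request.
```

For a multi-process batch job this per-process limiter would need its budget divided across workers, since the AWS quota is account-wide.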

Further to this, the AP team will be investigating any bottlenecks in the existing bedrock setup to understand if these limits being hit are points we have control over or AWS mandated.

@tpoconnor-14
Author

Hello Jake, thank you for letting me know. A 5× increase would be great; however, the DPIA we have in place for this project restricts processing to Ireland or London.

Thanks for following up with the AP team.

@simon-pope simon-pope moved this from 👀 TODO to 🚀 In Progress in Analytical Platform Jan 29, 2025