Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: multilingual bedrock #486

Merged
merged 3 commits into from
Dec 13, 2024
Merged

feat: multilingual bedrock #486

merged 3 commits into from
Dec 13, 2024

Conversation

jamescalam
Copy link
Member

@jamescalam jamescalam commented Dec 13, 2024

User description

testing #478


PR Type

Enhancement


Description

  • Added support for multimodal embeddings (TEXT, IMAGE, or both) in the BedrockEncoder class.
  • Introduced a new model_kwargs parameter to allow model-specific inference parameters.
  • Enhanced payload construction to handle multimodal data, including base64-encoded images.
  • Updated logic for handling Amazon and Cohere models, providing additional flexibility for inference parameters.
  • Improved code robustness by cleaning up null values in payloads.

Changes walkthrough 📝

Relevant files
Enhancement
bedrock.py
Support multimodal inputs and model-specific parameters in
BedrockEncoder

semantic_router/encoders/bedrock.py

  • Added support for multimodal inputs (TEXT, IMAGE, or both) in the
    BedrockEncoder class.
  • Introduced model_kwargs parameter for model-specific inference
    parameters.
  • Enhanced payload construction to handle multimodal data and clean up
    null values.
  • Updated logic for handling Amazon and Cohere models with additional
    flexibility for inference parameters.
  • +37/-11 

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🎫 Ticket compliance analysis ✅

    478 - Fully compliant

    Fully compliant requirements:

    • Add support for multimodal embeddings (TEXT, IMAGE, or both) via Amazon Titan for Multimodal Embeddings.
    • Introduce a model_kwargs parameter for model-specific inference parameters.
    • Ensure Titan MM can handle base64-encoded images.
    • Enhance payload construction to support multimodal data.
    • Update logic for Amazon and Cohere models to provide flexibility for inference parameters.
    • Improve code robustness by cleaning up null values in payloads.

    Not compliant requirements:
    None

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Code Smell
    The use of the | operator for merging dictionaries (e.g., embedding_body = embedding_body | model_kwargs) is Python 3.9+ specific. Ensure compatibility with the project's minimum Python version.

    Code Smell
    The logic for handling Amazon and Cohere models is duplicated in parts. Consider refactoring to reduce redundancy and improve maintainability.

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Safeguard against runtime errors by validating model_kwargs before merging

    Handle potential exceptions when merging model_kwargs with embedding_body or chunk
    to avoid runtime errors if model_kwargs is not a dictionary.

    semantic_router/encoders/bedrock.py [186]

    -embedding_body = embedding_body | model_kwargs
    +if isinstance(model_kwargs, dict):
    +    embedding_body = embedding_body | model_kwargs
    Suggestion importance[1-10]: 9

    Why: The suggestion prevents runtime errors by ensuring model_kwargs is a dictionary before merging it with embedding_body. This is a critical safeguard for the code's stability and correctness.

    9
    Validate input text to prevent unexpected or invalid values from causing errors

    Ensure that embedding_body["inputText"] is properly validated to avoid potential
    issues when doc.get("text") returns None or an unexpected type.

    semantic_router/encoders/bedrock.py [178]

    -embedding_body["inputText"] = doc.get("text")
    +embedding_body["inputText"] = doc.get("text") if isinstance(doc.get("text"), str) else ""
    Suggestion importance[1-10]: 8

    Why: The suggestion ensures that embedding_body["inputText"] is assigned a valid string, preventing potential issues when doc.get("text") returns None or an invalid type. This is a critical improvement for robustness.

    8
    Prevent attribute errors by validating the response body before processing

    Ensure that response.get("body") is not None and has a valid read method to avoid
    attribute errors during response processing.

    semantic_router/encoders/bedrock.py [201]

    -response_body = json.loads(response.get("body").read())
    +response_body = json.loads(response.get("body").read()) if response.get("body") and hasattr(response.get("body"), "read") else {}
    Suggestion importance[1-10]: 8

    Why: The suggestion adds a validation step to ensure response.get("body") is not None and has a read method, preventing potential attribute errors during response processing. This is a significant improvement for error handling.

    8
    Validate the input image to ensure it is a valid base64-encoded string

    Add a check to ensure that embedding_body["inputImage"] is a valid base64-encoded
    string before including it in the payload.

    semantic_router/encoders/bedrock.py [179]

    -embedding_body["inputImage"] = doc.get("image")
    +embedding_body["inputImage"] = doc.get("image") if isinstance(doc.get("image"), str) and is_base64(doc.get("image")) else None
    Suggestion importance[1-10]: 7

    Why: Adding validation for embedding_body["inputImage"] ensures that only valid base64-encoded strings are included, reducing the risk of runtime errors or invalid payloads. This is a useful enhancement for data integrity.

    7

    @jamescalam jamescalam merged commit 2c09ff7 into main Dec 13, 2024
    7 checks passed
    @jamescalam jamescalam deleted the pr/478 branch December 13, 2024 13:57
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants