feat: Bedrock Multimodal Embeddings 💬🖼️ #478

Merged on Dec 13, 2024 (2 commits)

@JGalego (contributor) commented on Dec 4, 2024

Adds support for multimodal embeddings (text, image, or both) via Amazon Titan Multimodal Embeddings (Titan MM). It also adds support for model-specific inference parameters via `model_kwargs`. Note that Titan MM expects images as base64-encoded strings.
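For reference, Titan MM takes a JSON request body. That body has an optional `inputText`, an optional `inputImage` (base64 string) and an optional `embeddingConfig`, as described in the Bedrock docs linked in the example below. The following is a minimal sketch of how the three input forms could map onto that body, and how it could be sent with boto3. `build_titan_mm_body` and `invoke_titan_mm` are hypothetical helpers, not `BedrockEncoder`'s actual implementation. The invocation part assumes boto3 is installed and AWS credentials with Bedrock access are configured.

```python
import json
from typing import Any, Dict, List, Optional, Union

# Model ID taken from the example in this PR.
MODEL_ID = "amazon.titan-embed-image-v1"


def build_titan_mm_body(
    item: Union[str, Dict[str, str]],
    model_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: map one encoder input to a Titan MM request body."""
    if isinstance(item, str):
        # A plain string is treated as text-only input.
        body: Dict[str, Any] = {"inputText": item}
    else:
        body = {}
        if item.get("text") is not None:
            body["inputText"] = item["text"]
        if item.get("image") is not None:
            # Titan MM expects the image as a base64-encoded string.
            body["inputImage"] = item["image"]
        # Any other keys in the dict (e.g. labels) are ignored.
    if not body:
        raise ValueError("each input needs text, an image, or both")
    # Model-specific parameters (e.g. embeddingConfig) are merged in as-is.
    body.update(model_kwargs or {})
    return body


def invoke_titan_mm(body: Dict[str, Any], region: str = "us-east-1") -> List[float]:
    """Hypothetical helper: send one request to Bedrock and return the embedding."""
    import boto3  # imported lazily so build_titan_mm_body works without AWS deps

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(body),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream containing JSON with an "embedding" list.
    return json.loads(response["body"].read())["embedding"]
```

For example, `invoke_titan_mm(build_titan_mm_body({"text": "some caption", "image": image_b64}, {"embeddingConfig": {"outputEmbeddingLength": 256}}))` would request a 256-dimensional joint text+image embedding. Here `image_b64` is assumed to be a base64 string.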

Example:

```python
"""
Semantic Router powered by Amazon Titan Multimodal Embeddings
https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-embed-mm.html
"""

# Standard imports
import base64

# Library imports
from datasets import load_dataset
from semantic_router.encoders import BedrockEncoder

# Load the example image/text dataset and preprocess the images
# https://huggingface.co/datasets/aurelio-ai/shrek-detection
data = load_dataset("aurelio-ai/shrek-detection", split="train", trust_remote_code=True)
data = data.to_list()  # list of dicts; example['image']['bytes'] holds the raw image bytes

def process_image(example):
    """Replace the raw image bytes with a base64-encoded string (required by Titan MM)."""
    im_bytes = example['image']['bytes']
    example['image'] = base64.b64encode(im_bytes).decode('utf-8')
    return example

data = [process_image(example) for example in data]

# Initialize the encoder with the Titan MM model ID
encoder = BedrockEncoder("amazon.titan-embed-image-v1", region="us-east-1")

# TEXT + IMAGE: pass the full records (each has 'text' and 'image')
embeddings = encoder(data[:5])

# TEXT only: pass plain strings
embeddings = encoder([example['text'] for example in data[:5]])

# IMAGE only: pass dicts containing just the 'image' key
embeddings = encoder([{'image': example['image']} for example in data[:5]])

# Change the output embedding dimension via model_kwargs
# (Titan MM accepts 256, 384 or 1024; the default is 1024)
embeddings = encoder(data[:5], model_kwargs={'embeddingConfig': {'outputEmbeddingLength': 256}})
```
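The example base64-encodes images that come from a Hugging Face dataset. The same step applies to an image stored on disk. Here is a minimal sketch, where `encode_image_file` is a hypothetical helper that is not part of semantic_router:

```python
import base64
from pathlib import Path


def encode_image_file(path: str) -> str:
    """Hypothetical helper: read an image file and return it base64-encoded.

    This is the same transformation as process_image in the example above,
    applied to the raw bytes of a local file.
    """
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")
```

With the encoder from the example, an image-only call would then look like `encoder([{'image': encode_image_file('my_image.png')}])`. Here `my_image.png` is a placeholder path.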

@JGalego (contributor, author) commented on Dec 13, 2024

@jamescalam pinging you as top contributor for a review

@jamescalam jamescalam self-requested a review December 13, 2024 12:32
@jamescalam jamescalam added the feature New feature request label Dec 13, 2024
@jamescalam jamescalam merged commit e105fe5 into aurelio-labs:main Dec 13, 2024
@jamescalam (member) commented:

all good - thanks for the PR @JGalego!
