-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add sqlalchemy vectorizer helper #265
Conversation
9216152
to
517ce9f
Compare
8742af8
to
36cf4d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Left some comments to address
docs/python-integration.md
Outdated
return False | ||
return True | ||
|
||
context.configure( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is context
coming from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context
is the alembic context used to configure and run migrations programmatically. Alembic autogenerates a fitting env.py for the user. E.g. this is how it looks if you're using async sqlalchemy: https://github.com/sqlalchemy/alembic/blob/main/alembic/templates/async/env.py
I can add the from alembic import context
if you'd like?
self._embedding_class = self.create_embedding_class(objtype) | ||
|
||
# Set up relationship if requested | ||
if self.add_relationship: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of configuring whether or not to add a relationship, why don't we always add it but allow the user to define the lazy
behavior and default that to select
? With select
nothing is ever pre-loaded right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other suggestion, what if we simply forward all **kwargs, *args
to the relationship function?
SQLAlchemy relationships are extremely configurable so I think just exposing anything the user might need while transparently adding the embedding model might be the cleanest interface.
This would basically make the Vectorizer
helper an embedding_relationship
function? We could even rename it as such?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add the relationship, does it mean that related data is prefetched with a join?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's exactly what we are discussing right now. If you set lazy="select" it is only fetched as soon as it is accessed. if lazy is set to "joined" it's eagerly joined. There is also "selectin" for a separate query with an in statement, and a few other options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough about sqlalchemy to provide any useful feedback there.
e342f0b
to
c108a87
Compare
); | ||
``` | ||
|
||
We recommend adding this to a migration script and run it via alembic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add another reference to the Working with alembic section below.
docs/python-integration.md
Outdated
- `dimensions` (int): The size of the embedding vector (required) | ||
- `target_schema` (str, optional): Override the schema for the embeddings table. If not provided, inherits from the parent model's schema | ||
- `target_table` (str, optional): Override the table name for embeddings. Default is `{table_name}_embedding_store` | ||
- `add_relationship` (bool, optional): Whether to automatically create a relationship to the embeddings table (default: False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a reference to the ### 2. Relationship Access
section below. At this point I didn't know what are sqlalchemy relationships, and if I needed/wanted one or not.
3b47afc
to
8fe145e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments but nothing blocking merge. This looks super-useful.
docs/python-integration.md
Outdated
|
||
Additional parameters are simply forwarded to the underlying [SQLAlchemy relationship](https://docs.sqlalchemy.org/en/20/orm/relationships.html) so you can configure it as you desire. | ||
|
||
Think of the `embedding_relationship` as a normal SQLAlchemy relationship, but with a preconfigured model instance under the hood. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also maybe something similar belongs in the intro?
Think of the `embedding_relationship` as a normal SQLAlchemy relationship, but with a preconfigured model instance under the hood. | |
Think of the `embedding_relationship` as a normal SQLAlchemy relationship, but with a preconfigured embedding model instance under the hood. |
8fe145e
to
882f91e
Compare
4dae567
to
22233a0
Compare
This PR adds a helper descriptorVectorizer
that creates an SQLAlchemy Model which can be used to reference vectorizer managed embedding tables without explicitly configuring it yourself.I've done another pass at the interface and now have renamed the
Vectorizer
toembedding_relationship
detailed in this discussion:#265 (comment)