feat: add alembic operations for vectorizer #266
Conversation
I gotta say I'm not convinced by the arguments for using a separate set of models from the ones already in pgai/vectorizer, or at least not without having both sets extend a base model. I think having two sets of models with similar params is really hard to maintain and quite a bit of code duplication. I'd like some more eyes on this though. Can James and/or Alejandro chime in here? In particular I'd like us to consider three designs:
- Simply extending the pydantic model we already have with optional fields that are either present in the stored JSON or needed for the alembic stuff, plus some kind of wrappers to create the config objects in alembic.
- Factoring common data fields into base classes and using those as mixins, kinda like the ApiKeyMixin now (rough sketch after this comment).
- Maybe I'm just being stubborn and we should have separate models, like Jascha has them now.

I'm leaving a few comments in, but I think this is the big issue we need to resolve.
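To make the mixin option concrete, here is a rough, hypothetical sketch; the class names and fields are invented for illustration and are not taken from the codebase. Shared fields live in a base class, and both the runtime model (parsed from the stored JSON) and the alembic-facing model extend it:

```python
# Hypothetical sketch of option 2 (shared base classes / mixins).
# None of these class names are taken from the pgai codebase.
from pydantic import BaseModel


class OpenAIEmbeddingBase(BaseModel):
    """Fields common to the stored JSON config and the alembic operation."""

    model: str
    dimensions: int


class OpenAIEmbedding(OpenAIEmbeddingBase):
    """Runtime model: adds fields that only exist in the stored JSON."""

    api_key_name: str | None = None


class OpenAIEmbeddingOp(OpenAIEmbeddingBase):
    """Alembic-facing model: adds only what the migration helpers need."""

    def to_sql(self) -> str:
        # Render the SQL-side embedding config call.
        return f"ai.embedding_openai('{self.model}', {self.dimensions})"
```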
projects/pgai/tests/vectorizer/extensions/fixtures/migrations/002_create_vectorizer.py.template (outdated review thread, resolved)
LGTM. Two comments that need addressing. I would like @JamesGuthrie to review next, and then we can merge.
@@ -1,3 +1,38 @@
# Creating vectorizers from python
If we are going to add this to the docs, I'd like at least some tests of using this functionality outside of Alembic
LGTM, a few minor bits and pieces.
n.nspname,
p.proargnames,
p.pronargdefaults,
string_to_array(array_to_string(p.proargtypes, ' '), ' ') as argtypes,
I found this amusing. I think that you could replace it with p.proargtypes::oid[]::text[], but I'm not 100% sure on that.
I actually didn't understand this code in detail either; it's something that Claude came up with, but it works, so I didn't worry too much about it.
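For anyone who wants to verify the suggested simplification, a quick equivalence check could look like the sketch below. This is untested; the connection URL is a placeholder, and whether the oidvector-to-oid[] cast behaves identically on all supported Postgres versions would still need confirming.

```python
# Untested sketch: compare the current string round-trip with the suggested
# direct cast p.proargtypes::oid[]::text[]. The connection URL is a placeholder.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg://postgres:postgres@localhost/postgres")

query = text(
    """
    SELECT p.proname,
           string_to_array(array_to_string(p.proargtypes, ' '), ' ') AS via_strings,
           p.proargtypes::oid[]::text[]                              AS via_cast
    FROM pg_proc p
    JOIN pg_namespace n ON n.oid = p.pronamespace
    WHERE n.nspname = 'ai'
    """
)

with engine.connect() as conn:
    for proname, via_strings, via_cast in conn.execute(query):
        # Both expressions should yield the same list of type OIDs as text.
        assert via_strings == via_cast, (proname, via_strings, via_cast)
```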
type_name = type_info[1]  # type: ignore
is_array = type_info[2]  # type: ignore

default = None
Is this correct?
This is actually a bit misleading, thanks for questioning it. I removed the default value from the code base now.
The generated classes don't actually have the correct default values; the to_sql function just leaves any None value out so that the ai.xyz() call uses the SQL default value. So the default is always None in this script (otherwise I'd have to correctly parse the SQL default values into their Python representation, which I'm saving myself this way).
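To make that concrete, the described behaviour amounts to something like this minimal sketch (not the actual pgai implementation): any keyword argument that is None is dropped from the rendered call, so PostgreSQL falls back to the function's own default.

```python
# Minimal sketch (not the actual pgai code) of the behaviour described above:
# None-valued arguments are omitted from the rendered ai.xyz() call, so the
# SQL-side default values apply.
def to_sql_call(function: str, **kwargs: object) -> str:
    parts: list[str] = []
    for name, value in kwargs.items():
        if value is None:
            continue  # omitted -> the SQL default value is used
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"{name} => {rendered}")
    return f"ai.{function}({', '.join(parts)})"


# chunk_size is None, so it does not appear in the generated SQL at all.
print(to_sql_call("chunking_character_text_splitter",
                  chunk_column="body", chunk_size=None))
# -> ai.chunking_character_text_splitter(chunk_column => 'body')
```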
Co-authored-by: James Guthrie <[email protected]>
Signed-off-by: Jascha Beste <[email protected]>
This PR adds native Python operations to Alembic so you don't have to write SQL to create vectorizers.
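For illustration only, a migration using these operations might look roughly like the sketch below. The registration import, the create_vectorizer/drop_vectorizer operation names, and the keyword arguments are assumptions made for this example, not the confirmed API; the PR's docs and test fixtures have the real names.

```python
# Hypothetical Alembic migration using the new vectorizer operations.
# All names below (pgai.alembic, op.create_vectorizer, op.drop_vectorizer,
# and the keyword arguments) are illustrative assumptions.
from alembic import op

import pgai.alembic  # assumed to register the custom operations  # noqa: F401

revision = "0002"
down_revision = "0001"


def upgrade() -> None:
    # Create a vectorizer without hand-writing the ai.create_vectorizer() SQL;
    # embedding/chunking/formatting would be passed as the generated config
    # objects from this PR (class names omitted here on purpose).
    op.create_vectorizer(source_table="blog_posts")


def downgrade() -> None:
    op.drop_vectorizer("blog_posts")
```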