feat: add alembic operations for vectorizer #266

Askir · 2024-12-02T10:03:17Z

This PR adds native Python operations to Alembic so you don't have to write SQL to create vectorizers.

cevian

I gotta say I'm not convinced by the arguments for using models separate from the ones already in pgai/vectorizer, rather than reusing those or at least having both sets of models extend a base model. I think having 2 sets of models with similar params is really hard to maintain and means quite a bit of code duplication. I'd like some more eyes on this tho. Can James and/or Alejandro chime in here? In particular I'd like us to consider three designs:

1. Simply extending the pydantic model we already have with optional fields that are present in either the stored JSON OR needed for the Alembic stuff, plus having some kind of wrappers to create the config objects in Alembic.
2. Factoring common data fields into base classes and using those as mixins (kinda like the ApiKeyMixin now).
3. Maybe I'm just being stubborn and we should have separate models, like Jascha has them now.

Leaving a few comments in, but I think this is the big issue we need to resolve.
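To make the second design concrete, here is a minimal sketch of shared fields factored into a base class that two model sets extend. It uses plain dataclasses rather than pydantic to stay self-contained, and every class and field name in it (`SharedEmbeddingFields`, `RuntimeEmbeddingConfig`, `MigrationEmbeddingConfig`, `implementation`, `describe`) is a hypothetical illustration, not code from this PR or from pgai.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SharedEmbeddingFields:
    """Hypothetical base holding the fields that both model sets need."""

    model: str
    dimensions: int
    api_key_name: Optional[str] = None


@dataclass
class RuntimeEmbeddingConfig(SharedEmbeddingFields):
    """Hypothetical runtime model: adds a field found only in the stored JSON."""

    implementation: str = "example"


@dataclass
class MigrationEmbeddingConfig(SharedEmbeddingFields):
    """Hypothetical migration model: adds behaviour only the migration side needs."""

    def describe(self) -> str:
        return f"{self.model} ({self.dimensions} dims)"


def field_names(cls) -> set:
    """Return the set of dataclass field names declared on or inherited by cls."""
    return {f.name for f in fields(cls)}


# Fields declared once in the base are visible on both subclasses.
shared = field_names(SharedEmbeddingFields)
runtime_only = field_names(RuntimeEmbeddingConfig) - shared
migration_only = field_names(MigrationEmbeddingConfig) - shared
```

With this layout a new shared parameter is declared in one place, while each side keeps whatever is specific to it. The trade-off raised in the comment still applies: the two subclasses have to be kept in step whenever the base changes.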

projects/pgai/pgai/configuration.py

projects/pgai/pgai/alembic/operations.py

projects/pgai/pgai/configuration.py

docs/python-integration.md

projects/pgai/pgai/alembic/operations.py

projects/pgai/tests/vectorizer/extensions/fixtures/migrations/002_create_vectorizer.py.template

cevian

LGTM. Two comments that need addressing. Would like for @JamesGuthrie to review next and then we can merge

docs/adding-embedding-integration.md

cevian · 2025-01-21T19:08:38Z

docs/python-integration.md

@@ -1,3 +1,38 @@
+# Creating vectorizers from python


If we are going to add this to the docs, I'd like at least some tests of using this functionality outside of Alembic

JamesGuthrie

LGTM, a few minor bits and pieces.

docs/adding-embedding-integration.md

docs/python-integration.md

projects/pgai/pgai/vectorizer/generate/README.md

projects/pgai/pgai/vectorizer/generate/generate.py

JamesGuthrie · 2025-01-22T10:58:04Z

projects/pgai/pgai/vectorizer/generate/function_parser.py

+            n.nspname,
+            p.proargnames,
+            p.pronargdefaults,
+            string_to_array(array_to_string(p.proargtypes, ' '), ' ') as argtypes,


I found this amusing. I think that you could replace it with p.proargtypes::oid[]::text[], but I'm not 100% sure on that.

I actually didn't understand this code in detail either; it's something that Claude came up with, but it works, so I didn't worry too much about it.
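For readers puzzling over the same expression, the sketch below models in Python what the two SQL calls do at the text level: join the type OIDs into one space-separated string, then split that string back into an array of strings. The helper names mirror the SQL functions but are plain Python stand-ins, and `cast_to_text_array` only models the intended result of the suggested cast; it does not confirm that PostgreSQL accepts `p.proargtypes::oid[]::text[]`.

```python
from typing import List


def array_to_string(values: List[int], sep: str) -> str:
    """Stand-in for SQL array_to_string: join the elements with a separator."""
    return sep.join(str(v) for v in values)


def string_to_array(text: str, sep: str) -> List[str]:
    """Stand-in for SQL string_to_array: split on the separator.

    An empty input yields an empty list, matching the SQL function's
    behaviour of returning a zero-element array for an empty string.
    """
    return text.split(sep) if text else []


def cast_to_text_array(values: List[int]) -> List[str]:
    """Model of the intended result of casting each OID to text."""
    return [str(v) for v in values]


# Example OIDs, used here only as sample integers.
arg_type_oids = [25, 23, 1009]
round_trip = string_to_array(array_to_string(arg_type_oids, " "), " ")
```

In this model the round trip and the direct element-wise conversion produce the same list of strings, which is why the shorter cast looks like a plausible replacement if the database supports it.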

JamesGuthrie · 2025-01-22T11:01:59Z

projects/pgai/pgai/vectorizer/generate/function_parser.py

+                type_name = type_info[1]  # type: ignore
+                is_array = type_info[2]  # type: ignore
+
+                default = None


Is this correct?

This is actually a bit misleading, thanks for questioning it. I removed the default value from the code base now.

The generated classes don't actually have the correct default values; the to_sql function just leaves any None value out so that the ai.xyz() call uses the SQL default value. So the default is always None in this script (otherwise I'd have to correctly parse the SQL default values into their Python representation, which I am saving on this way).
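The behaviour described here can be sketched as follows: every field defaults to None, and SQL generation skips any field that is still None so that the database-side default applies. This is an illustration under stated assumptions, not the generated code from the PR. `XyzConfig`, its fields, and `sql_literal` are hypothetical, and `ai.xyz` is the placeholder name used in the comment above.

```python
from dataclasses import dataclass, fields
from typing import Optional


def sql_literal(value: object) -> str:
    """Render a Python value as a SQL literal (only the types this sketch needs)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: wrap in single quotes and double any embedded quote.
    return "'" + str(value).replace("'", "''") + "'"


@dataclass
class XyzConfig:
    """Hypothetical generated class for a SQL function called ai.xyz."""

    # Every default is None: the Python side never mirrors the SQL defaults.
    model: Optional[str] = None
    dimensions: Optional[int] = None
    chat_user: Optional[str] = None

    def to_sql(self) -> str:
        # Emit only the arguments that were set explicitly; omitted ones
        # fall back to whatever default the SQL function declares.
        args = [
            f"{f.name} => {sql_literal(getattr(self, f.name))}"
            for f in fields(self)
            if getattr(self, f.name) is not None
        ]
        return f"ai.xyz({', '.join(args)})"


partial_call = XyzConfig(model="example-model", dimensions=768).to_sql()
empty_call = XyzConfig().to_sql()
```

Here `partial_call` names only `model` and `dimensions`, and `empty_call` passes no arguments at all. One consequence of this design is that an explicit None cannot be distinguished from "not set", which is acceptable as long as no SQL parameter needs to be passed as NULL.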

projects/pgai/pgai/vectorizer/generate/function_parser.py

Co-authored-by: James Guthrie <[email protected]> Signed-off-by: Jascha Beste <[email protected]>

Askir force-pushed the jascha/add-alembic-migration-ops branch from c899380 to fd9f1bc Compare December 2, 2024 10:08

Askir mentioned this pull request Dec 2, 2024

feat: SQLAlchemy and alembic integration #208

Closed

Askir force-pushed the jascha/add-alembic-migration-ops branch from fd9f1bc to 6f5ff59 Compare December 3, 2024 13:37

Askir marked this pull request as ready for review December 3, 2024 23:16

Askir requested a review from a team as a code owner December 3, 2024 23:16

Askir force-pushed the jascha/add-vectorizer-field branch from 8742af8 to 36cf4d9 Compare December 4, 2024 13:13

Askir force-pushed the jascha/add-alembic-migration-ops branch from 6f5ff59 to 525ab5b Compare December 4, 2024 13:20

cevian requested changes Dec 4, 2024

projects/pgai/pgai/configuration.py (outdated)

projects/pgai/pgai/alembic/operations.py

projects/pgai/pgai/alembic/operations.py (outdated)

projects/pgai/pgai/configuration.py (outdated)

JamesGuthrie reviewed Dec 5, 2024

docs/python-integration.md (outdated)

Askir force-pushed the jascha/add-vectorizer-field branch 10 times, most recently from 3b47afc to 8fe145e Compare December 12, 2024 13:46

Askir force-pushed the jascha/add-alembic-migration-ops branch from 525ab5b to 5e76cf9 Compare December 12, 2024 16:44

Askir commented Dec 13, 2024

projects/pgai/pgai/alembic/operations.py (outdated)

projects/pgai/pgai/alembic/operations.py

projects/pgai/tests/vectorizer/extensions/fixtures/migrations/002_create_vectorizer.py.template (outdated)

Askir force-pushed the jascha/add-vectorizer-field branch from 8fe145e to 882f91e Compare December 19, 2024 11:40

Base automatically changed from jascha/add-vectorizer-field to main December 19, 2024 12:32

Askir force-pushed the jascha/add-alembic-migration-ops branch 7 times, most recently from c90ae69 to 7b90575 Compare January 7, 2025 13:57

Askir force-pushed the jascha/add-alembic-migration-ops branch 2 times, most recently from 2ea1180 to 786ecfc Compare January 16, 2025 13:39

Askir requested review from JamesGuthrie and cevian January 16, 2025 13:53

Askir added 14 commits January 17, 2025 12:58

feat: add alembic operations for vectorizer

a75c9ba

chore: cleanup set up of operations

9ab36ae

chore: add shared base class

a9ed8f1

docs: update docs

ee4ffd3

chore: unify sql generation

28dcedd

chore: add more test cases

d3760d8

chore: simplify code and tests a bit

fc9b089

chore: use shared base classes, make use of more optional params

4fc64ba

chore: revert dockerfile change

1f0a107

chore: move configuration to alembic package

755127a

feat: add code generation for migration dataclasses

a08865c

chore: downgrade voyageai for tests

eaee43d

chore: rename table_name to target_table_name

fe7975b

feat: expose CreateVectorizer directly and add docs for it

9e98564

Askir force-pushed the jascha/add-alembic-migration-ops branch from 786ecfc to 9e98564 Compare January 17, 2025 11:59

cevian approved these changes Jan 21, 2025

JamesGuthrie approved these changes Jan 22, 2025

Askir and others added 6 commits January 22, 2025 04:26

chore: update docs/python-integration.md

99e207a

Co-authored-by: James Guthrie <[email protected]> Signed-off-by: Jascha Beste <[email protected]>

chore: update projects/pgai/pgai/vectorizer/generate/README.md

bfbcfb4

Co-authored-by: James Guthrie <[email protected]> Signed-off-by: Jascha Beste <[email protected]>

chore: fix link to code gen

e1bbcea

chore: remove default_value from code generation

cd2f492

chore: add some tests for vectorizer creation from python

73d9ef9

chore: upgrade uv to 0.5.20

d13aa82

Askir merged commit b01acfe into main Jan 22, 2025
5 checks passed

Askir deleted the jascha/add-alembic-migration-ops branch January 22, 2025 15:56

github-actions bot mentioned this pull request Jan 22, 2025

chore(main): release pgai 0.5.0 #355

Merged
