docs: update docs with review comments

timescale · Dec 7, 2024 · c108a87 · c108a87
1 parent e8307cd
commit c108a87
Show file tree

Hide file tree

Showing 2 changed files with 417 additions and 53 deletions.
diff --git a/docs/python-integration.md b/docs/python-integration.md
@@ -1,22 +1,22 @@
-# SQLAlchemy Integration with pgAI Vectorizer
+# SQLAlchemy Integration with pgai Vectorizer
 
-The `VectorizerField` is a SQLAlchemy helper type that integrates pgAI's vectorization capabilities directly into your SQLAlchemy models. This allows you to easily query vector embeddings created by pgai using familiar SQLAlchemy patterns.
+The `Vectorizer` is a SQLAlchemy helper type that integrates pgai's vectorization capabilities directly into your SQLAlchemy models. This allows you to easily query vector embeddings created by pgai using familiar SQLAlchemy patterns.
 
 ## Installation
 
-To use the SQLAlchemy integration, install pgAI with the SQLAlchemy extras:
+To use the SQLAlchemy integration, install pgai with the SQLAlchemy extras:
 
 ```bash
 pip install "pgai[sqlalchemy]"
 ```
 
 ## Basic Usage
 
-Here's a basic example of how to use the `VectorizerField`:
+Here's a basic example of how to use the `Vectorizer`:
 
 ```python
 from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
-from pgai.sqlalchemy import VectorizerField, EmbeddingModel
+from pgai.sqlalchemy import Vectorizer, EmbeddingModel
 
 class Base(DeclarativeBase):
     pass
@@ -29,46 +29,66 @@ class BlogPost(Base):
     content: Mapped[str]
 
     # Add vector embeddings for the content field
-    content_embeddings = VectorizerField(
+    content_embeddings = Vectorizer(
         dimensions=768,
         add_relationship=True,
     )
 
     # Optional: Type hint for the relationship
     content_embeddings_relation: Mapped[list[EmbeddingModel["BlogPost"]]]
 ```
+Note if you work with alembics autogenerate functionality for migrations, also check [Working with alembic](#working-with-alembic).
 
-## Configuration
+### Semantic Search
 
-The `VectorizerField` accepts the following parameters:
+You can then perform semantic similarity search on the field using [pgvector-python's](https://github.com/pgvector/pgvector-python) distance functions:
 
-- `dimensions` (int): The size of the embedding vector (required)
-- `target_schema` (str, optional): Override the schema for the embeddings table. If not provided, inherits from the parent model's schema
-- `target_table` (str, optional): Override the table name for embeddings. Default is `{table_name}_{field_name}_store`
-- `add_relationship` (bool): Whether to automatically create a relationship to the embeddings table (default: False)
+```python
+from sqlalchemy import func, text
 
-**Note:** The `VectorizerField` generates a new SQLAlchemy model, that is available under the attribute that you specify. If you are using alembics autogenerate functionality to generate migrations, you may need to exclude these models from the autogenerate process.
-They are tagged with `pgai_managed=True`so you can simply exclude them by adding the following to your `env.py`:
+similar_posts = (
+    session.query(BlogPost.content_embeddings)
+    .order_by(
+        BlogPost.content_embeddings.embedding.cosine_distance(
+            func.ai.openai_embed(
+                "text-embedding-3-small",
+                "search query",
+                text("dimensions => 768")
+            )
+        )
+    )
+    .limit(5)
+    .all()
+)
+```
 
-```python
-def include_object(object, name, type_, reflected, compare_to):
-    if type_ == "table" and name in target_metadata.info.get("pgai_managed_tables", set()):
-        return False
-    return True
+Or if you already have the embeddings in your application:
 
-context.configure(
-      connection=connection,
-      target_metadata=target_metadata,
-      include_object=include_object
-  )
+```python
+similar_posts = (
+    session.query(BlogPost.content_embeddings)
+    .order_by(
+        BlogPost.content_embeddings.embedding.cosine_distance(
+            [3, 1, 2]
+        )
+    )
+    .limit(5)
+    .all()
+)
 ```
 
-The model is only created at runtime, so depending on how your alembic migrations are set up this step could be skipped. Simply see what happens if you run `alembic revision --autogenerate` and if the model is included, add the above code.
+## Configuration
 
+The `Vectorizer` accepts the following parameters:
+
+- `dimensions` (int): The size of the embedding vector (required)
+- `target_schema` (str, optional): Override the schema for the embeddings table. If not provided, inherits from the parent model's schema
+- `target_table` (str, optional): Override the table name for embeddings. Default is `{table_name}_{field_name}_store`
+- `add_relationship` (bool, optional): Whether to automatically create a relationship to the embeddings table (default: False)
 
 ## Setting up the Vectorizer
 
-After defining your model, you need to create the vectorizer using pgAI's SQL functions:
+After defining your model, you need to create the vectorizer using pgai's SQL functions:
 
 ```sql
 SELECT ai.create_vectorizer(
@@ -85,9 +105,10 @@ SELECT ai.create_vectorizer(
 
 We recommend adding this to a migration script and run it via alembic.
 
+
 ## Querying Embeddings
 
-The `VectorizerField` provides several ways to work with embeddings:
+The `Vectorizer` provides several ways to work with embeddings:
 
 ### 1. Direct Access to Embeddings
 
@@ -110,35 +131,13 @@ blog_post = session.query(BlogPost).first()
 for embedding in blog_post.content_embeddings_relation:  # Note: uses _relation suffix
     print(embedding.chunk)
 ```
-
-### 3. Semantic Search
-
-You can perform semantic similarity searches using [pgvector-pythons](https://github.com/pgvector/pgvector-python) distance functions:
-
+Access the original posts through the parent relationship
 ```python
-from sqlalchemy import func, text
-
-similar_posts = (
-    session.query(BlogPost.content_embeddings)
-    .order_by(
-        BlogPost.content_embeddings.embedding.cosine_distance(
-            func.ai.openai_embed(
-                "text-embedding-3-small",
-                "search query",
-                text("dimensions => 768")
-            )
-        )
-    )
-    .limit(5)
-    .all()
-)
-
-# Access the original posts through the parent relationship
 for embedding in similar_posts:
     print(embedding.parent.title)
 ```
 
-### 4. Join Queries
+### 3. Join Queries
 
 You can combine embedding queries with regular SQL queries using the relationship:
 
@@ -153,4 +152,25 @@ results = (
 for post, embedding in results:
     print(f"Title: {post.title}")
     print(f"Chunk: {embedding.chunk}")
-```
+```
+
+## Working with alembic 
+
+
+The `Vectorizer` generates a new SQLAlchemy model, that is available under the attribute that you specify. If you are using alembic's autogenerate functionality to generate migrations, you will need to exclude these models from the autogenerate process.
+These are added to a list in your metadata called `pgai_managed_tables` and you can exclude them by adding the following to your `env.py`:
+
+```python
+def include_object(object, name, type_, reflected, compare_to):
+    if type_ == "table" and name in target_metadata.info.get("pgai_managed_tables", set()):
+        return False
+    return True
+
+context.configure(
+      connection=connection,
+      target_metadata=target_metadata,
+      include_object=include_object
+  )
+```
+
+This should now prevent alembic from generating tables for these models when you run `alembic revision --autogenerate`.