From 1d79c68e155f36e55dea63dec850b6bf8b4a7968 Mon Sep 17 00:00:00 2001
From: IrisWan <150207222+WanYixian@users.noreply.github.com>
Date: Wed, 4 Dec 2024 15:37:56 +0800
Subject: [PATCH] Document ingest data from PostgreSQL table (#108)

* add doc

* minor update
---
 changelog/product-lifecycle.mdx           |  1 +
 ingestion/overview.mdx                    |  6 +++
 integrations/sources/postgresql-table.mdx | 57 +++++++++++++++++++++++
 mint.json                                 |  1 +
 4 files changed, 65 insertions(+)
 create mode 100644 integrations/sources/postgresql-table.mdx

diff --git a/changelog/product-lifecycle.mdx b/changelog/product-lifecycle.mdx
index 7571ba91..892adb52 100644
--- a/changelog/product-lifecycle.mdx
+++ b/changelog/product-lifecycle.mdx
@@ -22,6 +22,7 @@ Below is a list of all features in the public preview phase:
 
 | Feature name | Start version |
 | :-- | :-- |
+| [Ingest data from Postgres table](/integrations/sources/postgresql-table) | 2.1 |
 | [Ingest data from webhook](/integrations/sources/webhook) | 2.1 |
 | [Shared source](/sql/commands/sql-create-source#shared-source) | 2.1 |
 | [ASOF join](/processing/sql/joins#asof-joins) | 2.1 |
diff --git a/ingestion/overview.mdx b/ingestion/overview.mdx
index ae3939c2..291457f7 100644
--- a/ingestion/overview.mdx
+++ b/ingestion/overview.mdx
@@ -74,6 +74,12 @@ The statement will create a streaming job that continuously ingests data from th
 4. **Stronger consistency guarantee**: When using a table with connectors, all downstream jobs will be guaranteed to have a consistent view of the data persisted in the table; while for source, different jobs may see inconsistent results due to different ingestion speed or data retention in the external system.
 5. **Greater flexibility**: Like regular tables, you can use DML statements like [INSERT](/sql/commands/sql-insert), [UPDATE](/sql/commands/sql-update) and [DELETE](/sql/commands/sql-delete) to insert or modify data in tables with connectors, and use [CREATE SINK INTO TABLE](/sql/commands/sql-create-sink-into) to merge other data streams into the table.
 
+### PostgreSQL table
+
+RisingWave supports using the table-valued function `postgres_query` to directly query PostgreSQL databases. This function connects to a specified PostgreSQL instance, executes the provided SQL query, and returns the results as a table in RisingWave.
+
+To use it, specify connection details (such as hostname, port, username, password, database name) and the desired SQL query. This makes it easier to integrate PostgreSQL data directly into RisingWave workflows without needing additional data transfer steps. For more information, see [Ingest data from Postgres tables](/integrations/sources/postgresql-table).
+
 ## DML on tables
 
 ### Insert data into tables
diff --git a/integrations/sources/postgresql-table.mdx b/integrations/sources/postgresql-table.mdx
new file mode 100644
index 00000000..07057cf0
--- /dev/null
+++ b/integrations/sources/postgresql-table.mdx
@@ -0,0 +1,57 @@
+---
+title: "Ingest data from PostgreSQL table"
+description: "Describes how to ingest data from a PostgreSQL table into RisingWave using a table-valued function."
+sidebarTitle: PostgreSQL table
+---
+
+RisingWave allows you to query PostgreSQL tables directly with the `postgres_query` table-valued function (TVF). It offers a simpler alternative to Change Data Capture (CDC) when working with PostgreSQL data in RisingWave.
+
+Unlike CDC, which continuously syncs data changes, this function lets you fetch data directly from PostgreSQL when needed. Therefore, this approach is ideal for static or infrequently updated data, as it's more resource-efficient than maintaining a constant CDC connection.
+
+
+**PUBLIC PREVIEW**
+
+This feature is in the public preview stage, meaning it's nearing the final product but is not yet fully stable. If you encounter any issues or have feedback, please contact us through our [Slack channel](https://www.risingwave.com/slack). Your input is valuable in helping us improve the feature. For more information, see our [Public preview feature list](/changelog/product-lifecycle#features-in-the-public-preview-stage).
+
+
+## Syntax
+
+The `postgres_query` function takes the following arguments:
+
+```sql
+postgres_query(
+  hostname varchar,       -- Database hostname
+  port varchar,           -- Database port
+  username varchar,       -- Authentication username
+  password varchar,       -- Authentication password
+  database_name varchar,  -- Target database name
+  query varchar           -- SQL query to execute
+)
+```
+
+## Example
+
+1. In your PostgreSQL database, create a table and populate it with sample data.
+
+```sql
+CREATE TABLE test (id bigint primary key, x int);
+INSERT INTO test SELECT id, id::int FROM generate_series(1, 100) AS t(id);
+```
+
+2. In RisingWave, use the `postgres_query` function to retrieve rows where `id > 90`.
+
+```sql
+SELECT *
+FROM postgres_query('localhost', '5432', 'postgres', 'postgres', 'mydb', 'SELECT * FROM test WHERE id > 90;');
+----RESULT
+91 91
+92 92
+93 93
+94 94
+95 95
+96 96
+97 97
+98 98
+99 99
+100 100
+```
diff --git a/mint.json b/mint.json
index 7612fe8b..33a700d9 100644
--- a/mint.json
+++ b/mint.json
@@ -668,6 +668,7 @@
   "pages": [
     "integrations/sources/postgresql-cdc",
+    "integrations/sources/postgresql-table",
     "integrations/sources/mysql-cdc",
     "integrations/sources/sql-server-cdc",
     "integrations/sources/mongodb-cdc",