Skip to content

Commit

Permalink
Add shared source (#69)
Browse files Browse the repository at this point in the history
* mark as public preview

* add doc

* revision

* Update sql/commands/sql-create-source.mdx

Co-authored-by: Eric Fu <[email protected]>
Signed-off-by: IrisWan <[email protected]>

* Update sql/commands/sql-create-source.mdx

Co-authored-by: Eric Fu <[email protected]>
Signed-off-by: IrisWan <[email protected]>

---------

Signed-off-by: IrisWan <[email protected]>
Co-authored-by: Eric Fu <[email protected]>
  • Loading branch information
WanYixian and fuyufjh authored Nov 22, 2024
1 parent 9648cd7 commit 51a878a
Show file tree
Hide file tree
Showing 5 changed files with 81 additions and 0 deletions.
1 change: 1 addition & 0 deletions changelog/product-lifecycle.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ Below is a list of all features in the public preview phase:

| Feature name | Start version |
| :-- | :-- |
| [Shared source](/sql/commands/sql-create-source/#shared-source) | 2.1 |
| [ASOF join](/docs/current/query-syntax-join-clause/#asof-joins) | 2.1 |
| [Partitioned Postgres CDC table](/docs/current/ingest-from-postgres-cdc/) | 2.1 |
| [Map type](/docs/current/data-type-map/) | 2.0 |
Expand Down
Binary file added images/non-shared-source.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/shared-source.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/table-with-connectors.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
80 changes: 80 additions & 0 deletions sql/commands/sql-create-source.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,86 @@ If Kafka is part of your technical stack, you can also use the Kafka connector i

For complete step-to-step guides about ingesting MySQL and PostgreSQL data using both approaches, see [Ingest data from MySQL](/docs/current/ingest-from-mysql-cdc/) and [Ingest data from PostgreSQL](/docs/current/ingest-from-postgres-cdc/).

## Shared source

Shared source improves resource utilization and data consistency when working with Kafka sources in RisingWave. This will only affect Kafka sources created after the version updated and will not affect any existing Kafka sources.

<Note>
**PUBLIC PREVIEW**

This feature is in the public preview stage, meaning it's nearing the final product but is not yet fully stable. If you encounter any issues or have feedback, please contact us through our [Slack channel](https://www.risingwave.com/slack). Your input is valuable in helping us improve the feature. For more information, see our [Public preview feature list](/product-lifecycle/#features-in-the-public-preview-stage).
</Note>

### Configure

Shared source is enabled by default. You can also set the session variable `streaming_use_shared_source` to control whether to enable it.

```sql
# change the config in the current session
SET streaming_use_shared_source=[true|false];

# change the default value of the session variable in the cluster
# (the current session is not affected)
ALTER SYSTEM SET streaming_use_shared_source=[true|false];
```

To completely disable it at the cluster level, go to [`risingwave.toml`](https://github.com/risingwavelabs/risingwave/blob/main/src/config/example.toml#L146) configuration file, and set the `stream_enable_shared_source` to `false`.

### Compared with non-shared source

With non-shared sources, when using the `CREATE SOURCE` statement:
- No streaming jobs would be instantiated. A source is just a set of metadata stored in the catalog.
- Only when a materialized view or sink references the source, a `SourceExecutor` will be created to start the process of data ingestion.

This leads to increased resource usage and potential inconsistencies:
- Each `SourceExecutor` consumed Kafka resources independently, adding pressure to both the Kafka broker and RisingWave.
- Independent `SourceExecutor` instances could result in different consumption progress, causing temporary inconsistencies when joining materialized views.

<Frame>
<img src="/images/non-shared-source.png"/>
</Frame>

With shared sources, when using the `CREATE SOURCE` statement:
- It will instantiate a single `SourceExecutor` immediately.
- All materialized views referencing the same source share the `SourceExecutor`.
- The downstream materialized views will only forwards data from the upstream sources, instead of consuming from Kafka independently.

This improves resource utilization and consistency.

<Frame>
<img src="/images/shared-source.png"/>
</Frame>

When creating a materialized view, RisingWave backfills historical data from Kafka. The process blocks the DDL statement until backfill completes.

- To configure this behavior, use the [SET BACKGROUND_DDL](/sql/commands/sql-set-background-ddl) command. This is similar to the backfilling procedure when creating a materialized view on tables and materialized views.

- To monitoring backfill progress, use the [SHOW JOBS](/sql/commands/sql-show-jobs) command or check `Kafka Consumer Lag Size` in the Grafana dashboard (under `Streaming`).


<Note>If you set up a retention policy or if the external system can only be accessed once (like message queues), and the data is no longer available, any newly created materialized views won’t be able to backfill the complete historical data. This can lead to inconsistencies with earlier materialized views.</Note>


### Compared with table

A `CREATE TABLE` statement can provide similar benefits to shared sources, except that it needs to persist all consumed data.

For table with connector, downstream materialized views backfill historical data from the table instead of external sources, which may be more efficient and cause less pressure to the external system. This also gives table stronger consistency guarantee, as historical data will be ensured to be present.

Tables offer other features that enhance their utility in data ingestion workflows. See [Table with connectors](/ingestion/overview#table-with-connectors).

<Frame>
<img src="/images/table-with-connectors.png"/>
</Frame>

<Note>
**LIMITATION**

Currently, shared source is only applicable to Kafka sources. Other sources are unaffected. We plan to gradually upgrade other sources to the be shared as well in the future.

Shared sources do not support `ALTER SOURCE`. Use non-shared sources if you require this functionality.
</Note>

## See also

<CardGroup>
Expand Down

0 comments on commit 51a878a

Please sign in to comment.