WanYixian committed Nov 21, 2024
1 parent a9aa5d6 commit 1fd596f
Showing 1 changed file with 22 additions and 15 deletions.
37 changes: 22 additions & 15 deletions sql/commands/sql-create-source.mdx
@@ -115,35 +115,42 @@ To completely disable it at the cluster level, go to [`risingwave.toml`](https:/

### Compared with non-shared source

Previously, when creating a source using the CREATE SOURCE statement:
With non-shared sources, when using the `CREATE SOURCE` statement:
- No streaming jobs are instantiated. The source is just a set of metadata stored in the catalog.
- A `SourceExecutor` is created to start data ingestion only when a materialized view references the source, and each such materialized view creates its own `SourceExecutor`.

1. No streaming jobs were created initially; the source existed only as metadata in the catalog.
2. When creating multiple materialized views on the same source, each materialized view spawned its own `SourceExecutor`, leading to:
- Increased resource usage: Each `SourceExecutor` consumed Kafka resources independently, adding pressure to both the Kafka broker and RisingWave.
- Potential inconsistencies: Independent `SourceExecutor` instances could result in different consumption progress, causing temporary inconsistencies when joining materialized views.
This leads to increased resource usage and potential inconsistencies:
- Each `SourceExecutor` consumes Kafka resources independently, adding pressure on both the Kafka broker and RisingWave.
- Independent `SourceExecutor` instances can be at different points in their consumption progress, causing temporary inconsistencies when joining materialized views.

<Frame>
<img src="/images/non-shared-source.png"/>
</Frame>

With shared sources, when creating a source using the CREATE SOURCE statement:
With shared sources, when using the `CREATE SOURCE` statement:
- It will instantiate a single `SourceExecutor` immediately.
- All materialized views referencing the same source share the `SourceExecutor`.
- The downstream materialized views only forward data from the upstream source, instead of consuming from Kafka independently.

1. It will create a single SourceExecutor immediately.
2. All materialized views referencing the same source share the SourceExecutor, improving resource utilization and ensuring consistent data consumption.
3. When creating a materialized view, RisingWave backfills historical data from Kafka. The process blocks the DDL statement until backfill completes. This behavior can be configured using [SET BACKGROUND_DDL](/sql/commands/sql-set-background-ddl).


If external systems enforce retention policies or are not re-consumable, new materialized views may not backfill complete historical data. This can lead to inconsistencies with earlier materialized views.

Use the [SHOW JOBS](/sql/commands/sql-show-jobs) command or monitor `Kafka Consumer Lag Size` in the Grafana dashboard (under `Streaming`) to track backfill progress.
This improves resource utilization and consistency.

<Frame>
<img src="/images/shared-source.png"/>
</Frame>
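
As a minimal sketch of the shared-source behavior described above, the statements below create one Kafka source and two materialized views on it. The source name, columns, topic, and broker address are hypothetical placeholders, not values from this page. With shared sources enabled, both views would read through the single `SourceExecutor` created by `CREATE SOURCE`.

```sql
-- Hypothetical source: the name, columns, topic, and broker address are placeholders.
CREATE SOURCE orders_src (
    order_id INT,
    amount DOUBLE PRECISION
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'localhost:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- Both views reference the same source, so they share its SourceExecutor
-- instead of each consuming from Kafka independently.
CREATE MATERIALIZED VIEW order_count AS
SELECT COUNT(*) AS cnt FROM orders_src;

CREATE MATERIALIZED VIEW order_total AS
SELECT SUM(amount) AS total FROM orders_src;
```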

When creating a materialized view, RisingWave backfills historical data from Kafka. The process blocks the DDL statement until backfill completes.

- To configure this behavior, use the [SET BACKGROUND_DDL](/sql/commands/sql-set-background-ddl) command. This is similar to the backfilling procedure when creating a materialized view on tables and materialized views.

- To monitor backfill progress, use the [SHOW JOBS](/sql/commands/sql-show-jobs) command or check `Kafka Consumer Lag Size` in the Grafana dashboard (under `Streaming`).


<Note>If the external system enforces a retention policy or its data can be consumed only once (as with some message queues), and the data is no longer available, newly created materialized views cannot backfill the complete historical data. This can lead to inconsistencies with earlier materialized views.</Note>
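
The backfill controls mentioned above can be sketched as follows. This assumes a source named `orders_src` with an `amount` column already exists; both names and the view are hypothetical and used only for illustration.

```sql
-- Let subsequent DDL statements return immediately and backfill in the background.
SET BACKGROUND_DDL = true;

-- Hypothetical view on an existing source; the statement no longer blocks
-- until the backfill from Kafka completes.
CREATE MATERIALIZED VIEW big_orders AS
SELECT * FROM orders_src WHERE amount > 100;

-- List in-progress streaming jobs to check how far the backfill has advanced.
SHOW JOBS;
```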


### Compared with table

A CREATE TABLE statement can provide similar benefits to shared sources but includes data persistence.
A `CREATE TABLE` statement can provide similar benefits to shared sources, except that it needs to persist all consumed data.

For a table with a connector, downstream materialized views backfill historical data from the table instead of from the external source, which may be more efficient and put less pressure on the external system. This also gives tables a stronger consistency guarantee, because the historical data is guaranteed to be present.
