recommit
Signed-off-by: TennyZhuang <[email protected]>
TennyZhuang committed Jul 31, 2023
1 parent 74d551a commit 4d45bf4
Showing 1 changed file with 85 additions and 0 deletions.
85 changes: 85 additions & 0 deletions rfcs/0068-error-record-table.md
@@ -0,0 +1,85 @@
---
feature: error_record_table
authors:
- "TennyZhuang"
start_date: "2023/07/31"
---

# Error Record Table

## Summary

Our current streaming engine does not help users discover, debug, and handle errors well. When users encounter a data record error, all they can find is a log entry such as ``ExprError: Parse error: expected `,` or `]` at line 1 column 10 (ProjectExecutor: fragment_id=19007)``.

Users can neither view the record that caused the error nor replay it.

We want to introduce the Error Record Table (ERT) to resolve the problem.

## Motivation

There are several benefits to maintaining the error records ourselves:

1. We can ensure that our storage engine can handle the volume of erroneous data, because it is of the same order of magnitude as the source.
2. Users can view the error records directly through psql.
3. Users can easily reproduce the error with similar SQL.

## Design

### Creating

The ERTs are automatically created as internal tables when an operator is created. In most cases, an operator will have n ERTs, where n corresponds to the number of inputs it has.

### Naming

ERTs follow the same naming scheme as other internal tables, with the suffix `error_{seq}`.

### Schema

An ERT should have the same fields as its input, plus several extra columns:

1. `id bigint`: The ID can be generated in a similar way to `row_id` (vnode + local monotonic ID).
2. `error_reason varchar`: A human-readable error message.
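
As a rough illustration of the resulting shape, the logical schema of an ERT could look like the sketch below. It assumes an operator whose single input has the columns `(v1 int, v2 int)` and borrows the table name from the data correction example later in this RFC. The column order is an assumption, and the `CREATE TABLE` statement is only a way to write the schema down: ERTs are created automatically, so users would never run it themselves.

```sql
-- Illustrative only: not a statement users would execute.
-- Assumes an input with columns (v1 int, v2 int); column order is an assumption.
CREATE TABLE __rw_internal_1023_source_1134_error_1 (
    v1           int,      -- same field as the input
    v2           int,      -- same field as the input
    id           bigint,   -- generated like row_id (vnode + local monotonic ID)
    error_reason varchar   -- human-readable error message
);
```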

### Modification

To keep things simple, we do not permit any DML operations over the ERT. Only the `TRUNCATE TABLE` operation is permitted.
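
A hypothetical session under this rule might look like the following sketch. The table name is borrowed from the data correction example later in this RFC, and how a rejected statement is reported is not specified here.

```sql
-- Hypothetical session; the error returned for rejected statements is unspecified.
INSERT INTO __rw_internal_1023_source_1134_error_1 (v1, v2) VALUES (1, 2);  -- rejected: DML
UPDATE __rw_internal_1023_source_1134_error_1 SET v2 = 1 WHERE v2 = 0;      -- rejected: DML
DELETE FROM __rw_internal_1023_source_1134_error_1 WHERE v2 = 0;            -- rejected: DML

TRUNCATE TABLE __rw_internal_1023_source_1134_error_1;                      -- permitted
```

Because rows cannot be updated in place, any repair has to happen on a copy of the data outside the ERT.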

### The relationship between ERT and the log system

We should keep the warning entry in our log and include the error record ID in it.

We can even include a SQL query for the error record in the log entry, if that is helpful to users.
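
For instance, the query embedded in a log entry could be as simple as the sketch below. The table name is borrowed from the data correction example later in this RFC, and the ID value `42` is a placeholder rather than a real record ID.

```sql
-- Sketch of a lookup query a log entry might carry; 42 is a placeholder ID.
SELECT *
FROM __rw_internal_1023_source_1134_error_1
WHERE id = 42;
```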

## Unresolved questions

Should we allow creating a sink over an ERT?

## Alternatives

One alternative solution is to output the complete error record directly to the log system. There are some concerns:

1. The data record may be too large to record, e.g. several tens of KB.
2. Errors may occur continuously, causing the log system to fill up quickly.

## Future possibilities

### Data correction

ERTs could potentially be used to correct data. For example, users could clean up the data in an ERT and then re-import it into the source.

```sql
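-- Inspect the failed records and the reason for each failure.
SELECT v1, v2, error_reason FROM __rw_internal_1023_source_1134_error_1;
-- 10000, 0, "division by zero"

-- Copy the failed records into a scratch table, because the ERT does not permit DML.
CREATE TEMP TABLE fixing_1234 (v1 int, v2 int);
INSERT INTO fixing_1234
SELECT v1, v2 FROM __rw_internal_1023_source_1134_error_1;

-- Repair the offending values.
UPDATE fixing_1234 SET v2 = 1 WHERE v2 = 0;

-- Re-import the corrected records, then clear the ERT.
INSERT INTO source_table
SELECT * FROM fixing_1234;
TRUNCATE TABLE __rw_internal_1023_source_1134_error_1;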
```

### Sink

For advanced users, we can still allow them to sink the error records to their own systems.
