Skip to content

Latest commit

 

History

History
171 lines (121 loc) · 6.6 KB

develop.md

File metadata and controls

171 lines (121 loc) · 6.6 KB

Develop

Testing the new feature you added

Suggest testing it with madsim, so you can reproduce the same results each time. See Running with Madsim.

Ignoring errors

Sometimes you may encounter unrelated errors, since SqlSmith frontend tests run per PR. If these errors relate to Expression errors, they may be hard to avoid. In those cases, you may choose to ignore them, by adding these errors to validation.rs.

Adding new query clauses

For instance adding ORDER BY, or Emit On Window Close.

There isn't a clear-cut way for these.

Typically, to add them you would do the following:

  1. Identify which part of the AST these are from.
  2. Look where this part of AST it is generated. The section on Query Generation should help. The generation process roughly mirrors the structure of the AST.
  3. Add it to that part of the AST.

If it affects generation of other parts of the query, you may wish to re-order the generation process. For instance, the GROUP BY clause affects which columns we can select, so we generate it before generating the SELECT items.

TODO: Here provide an example / tutorial of adding Emit On Window Close.

Adding new expressions

To check if your expression is included just generate ft.txt (see README.md), and make sure your newly added function is there.

TODO: Generate a not-ft.txt which includes functions not tested by sqlsmith.

The following sections are on adding various expression types.

Functions

Mostly for new functions it should have been generated by default, since we use frontend's type signature map to generate functions.

For functions of variable arity or functions with three or more arguments, they may not be included.

In that case, you may reference the implementation of concat for functions of variable arity. For functions with fixed arity but more than three args, they do not use vectorized execution, and hence they do not have custom implementations like Expr::Binop or Expr::Unop. Instead, they get translated to a Function. Their implementation can be referenced via make_simple_func.

Aggregates

New Aggregates should already be included by default.

Casts

New Casts should already be included by default, or blacklisted explicitly.

Test Runner overview

Query execution and generation happen in step. Here's an overview of it.

  1. Create Base tables, based on the supplied test_data argument to sqlsmith.
  2. Generate INSERTs and Populate base tables.
  3. Generate and Create base materialized views, they will be used to test mv-on-mv. These materialized views can either be on Base tables, or on a previously created materialize view from this step.
  4. Generate UPDATE / DELETE statements and Update base tables with them. If no PK we will just do DELETE for some rows and INSERT back statements.
  5. Generate and run batch queries e.g. SELECT * FROM t, WITH w AS ... SELECT * FROM w.
  6. Generate and run stream queries. These are immediately removed after they are successfully created.
  7. Drop base materialized views.
  8. Drop base tables.

Query Generation

Queries are generated at random, with weighted constraints.

Additionally, query complexity is determined by can_recurse. This controls how deeply queries nest. We can have better ways to configure this in the future.

Query generation happens recursively, in top-down fashion, roughly following the AST. For both batch and stream queries, we go through roughly the same process. We use is_mview to selectively disable some sql features which are not well supported by stream engine.

Here's a broad example of what happens when generating a complex query:

  1. Generate a with clause.
  2. Generate the set expression (SELECT ...)
  3. Generate ORDER BY.

For the set expression we just generate a select statement, which then:

  1. Generates the FROM expression (includes joins).
  2. The WHERE expression.
  3. Other parts of the SELECT query.
  4. The list of SELECT items.

For the list of select items, it first chooses a type and eventually calls gen_expr with that type.

This generates either:

  1. A scalar value.
  2. A column.
  3. Functions.
  4. Aggregates.
  5. Casts.
  6. Other kinds of expressions e.g. CASE ... WHEN.

We mentioned that we call gen_expr with a specific type. That should be the return type of calling functions and aggregates. It should also be the cast target type.

Hence, we have type signature maps which we use to fetch candidate functions / aggregates / casts with that specific type. We just pick one from the candidate set, and generate some arguments for it. Then we have the function / aggregate call, or the cast expression generated successfully.

You may start from query.rs and trace the code for further details.

Subquery generation

This section is WIP.

Table and Column name generation

For setup phase, only CREATE MATERIALIZED VIEW requires table name generation. We prefix them with m{id}, where id is unique for each materialized view.

For both setup and query phase, table factors also need generated names. To ensure there are no duplicated name conflicts, we always generate a new unique id for each table name.

For columns, when referencing them, we always use qualified names to do so. Additionally for select, we always generate aliases for each selected item. These can be locally unique, when they are referenced, we always use qualified form, so we won't have ambiguity.

Workflow

Sqlsmith query snapshots

We generate a query snapshot weekly. This query snapshot just contains several sets of queries, generated by SqlSmith, and used to test Risingwave E2e.

Example of snapshot:

├── 1
│   └── queries.sql
├── 10
│   └── queries.sql
├── 100
│   └── queries.sql
├── 11
│   └── queries.sql
├── 12
│   └── queries.sql
└── failed
    ├── 27
    │    ├── fail.log
    │    ├── generate-27.log
    │    └── queries.sql

When the query snapshot is generated, it should only contain passing queries. Failing queries should be extracted out into the failed dir.

Hence, the snapshot should pass for every run, otherwise it indicates a regression OR that some invalid queries were previously accepted by Risingwave.

This is what runs per PR, and daily in main-cron.

Random Fuzzing

Random fuzzing happens per PR for frontend sqlsmith tests. We don't run e2e fuzzing tests per PR because it found too many lower-priority bugs and started slowing down development. However, if there are any changes to sqlsmith source files, e2e fuzzing tests will still run.