We suggest testing it with madsim, so you can reproduce the same results each time. See Running with Madsim.
Sometimes you may encounter unrelated errors, since SqlSmith frontend tests run per PR. If these are expression errors, they may be hard to avoid. In those cases, you may choose to ignore them by adding the errors to `validation.rs`.
For adding new SQL features, for instance `ORDER BY` or `EMIT ON WINDOW CLOSE`, there isn't a clear-cut procedure. Typically, to add them you would do the following:
- Identify which part of the AST the feature belongs to.
- Find where that part of the AST is generated. The section on Query Generation should help; the generation process roughly mirrors the structure of the AST.
- Add the feature's generation logic to that part of the AST.
If the feature affects the generation of other parts of the query, you may wish to re-order the generation process. For instance, the `GROUP BY` clause affects which columns we can select, so we generate it before generating the `SELECT` items.
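As a small illustration (the table and column names here are hypothetical), once a `GROUP BY` clause has been generated, the `SELECT` items can only reference the grouped columns directly or wrap other columns in aggregates, which is why the clause is generated first:

```sql
-- Hypothetical generated query: GROUP BY t_0.v1 was generated first, so the
-- select list may only reference t_0.v1 directly; other columns such as
-- t_0.v2 must appear inside an aggregate.
SELECT t_0.v1 AS col_0, count(t_0.v2) AS col_1
FROM t AS t_0
GROUP BY t_0.v1;
```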
TODO: Provide an example / tutorial of adding `EMIT ON WINDOW CLOSE`.
To check whether your expression is included, generate `ft.txt` (see `README.md`) and make sure your newly added function appears there.
TODO: Generate a `not-ft.txt` which lists functions not tested by sqlsmith.
The following sections are on adding various expression types. New functions should mostly be generated by default, since we use the frontend's type signature map to generate function calls. However, functions of variable arity or functions with three or more arguments may not be included. For functions of variable arity, you may reference the implementation of `concat`.
Functions with fixed arity but three or more arguments do not use vectorized execution, and hence do not have custom implementations like `Expr::Binop` or `Expr::Unop`. Instead, they get translated to a `Function`. Their implementation can be referenced via `make_simple_func`.
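For illustration (the specific expressions below are hypothetical, assuming `v1`, `v2` are numeric and `v3` is `varchar`), the generated SQL contrasts operator-style expressions with plain function calls; the variadic `concat` and the three-argument `replace` both fall into the latter category:

```sql
-- Unary and binary operators correspond to dedicated expression variants:
SELECT -t_0.v1, t_0.v1 + t_0.v2 FROM t AS t_0;

-- Variable-arity and three-or-more-argument functions are plain function calls:
SELECT concat(t_0.v3, 'a', 'b'), replace(t_0.v3, 'a', 'b') FROM t AS t_0;
```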
New Aggregates should already be included by default.
New Casts should already be included by default, or blacklisted explicitly.
Query generation and execution happen in lockstep. Here's an overview of the process (a sketch of the statements issued in one run follows the list):
- Create base tables, based on the `test_data` argument supplied to sqlsmith.
- Generate `INSERT` statements and populate the base tables.
- Generate and create base materialized views; these will be used to test mv-on-mv. Each materialized view can be built either on base tables or on a materialized view created previously in this step.
- Generate `UPDATE` / `DELETE` statements and update the base tables with them. If a table has no primary key, we instead `DELETE` some rows and `INSERT` them back.
- Generate and run batch queries, e.g. `SELECT * FROM t`, `WITH w AS ... SELECT * FROM w`.
- Generate and run stream queries. These are dropped immediately after they are successfully created.
- Drop the base materialized views.
- Drop the base tables.
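As a rough sketch of what one run issues against the database (the table, column, and view names here are hypothetical, and stream queries are assumed to be materialized and then dropped, matching the step above):

```sql
-- 1. Create base tables from test_data and populate them.
CREATE TABLE t (v1 INT, v2 VARCHAR);
INSERT INTO t VALUES (1, 'a'), (2, 'b');

-- 2. Create base materialized views, possibly stacked (mv-on-mv).
CREATE MATERIALIZED VIEW m1 AS SELECT v1, v2 FROM t;
CREATE MATERIALIZED VIEW m2 AS SELECT v1 FROM m1 WHERE v1 > 1;

-- 3. Update base tables. With no primary key, delete some rows and insert them back.
DELETE FROM t WHERE v1 = 1;
INSERT INTO t VALUES (1, 'a');

-- 4. Run batch queries, then stream queries (dropped right after creation).
SELECT * FROM t;
CREATE MATERIALIZED VIEW stream_q AS SELECT count(*) AS cnt FROM m2;
DROP MATERIALIZED VIEW stream_q;

-- 5. Tear down.
DROP MATERIALIZED VIEW m2;
DROP MATERIALIZED VIEW m1;
DROP TABLE t;
```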
Queries are generated at random, with weighted constraints. Query complexity is determined by `can_recurse`, which controls how deeply queries nest. We may add better ways to configure this in the future.
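To give an idea of the effect (these generated queries are hypothetical), a query where recursion stops early stays flat, while one where nesting is allowed can place a subquery inside the `FROM` clause:

```sql
-- Recursion cut off early: a flat query over a single table factor.
SELECT t_0.v1 AS col_0 FROM t AS t_0;

-- Deeper nesting allowed: a subquery appears as a table factor.
SELECT sq_1.col_0 AS col_0
FROM (SELECT t_0.v1 AS col_0 FROM t AS t_0 WHERE t_0.v1 > 0) AS sq_1;
```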
Query generation happens recursively, in top-down fashion, roughly following the AST. Batch and stream queries go through roughly the same process; we use `is_mview` to selectively disable SQL features which are not well supported by the stream engine.

Here's a broad example of what happens when generating a complex query:
- Generate a `WITH` clause.
- Generate the set expression (`SELECT ...`).
- Generate `ORDER BY`.
For the set expression we just generate a select statement (see the sketch after this list), which in turn:
- Generates the `FROM` expression (including joins).
- Generates the `WHERE` expression.
- Generates other parts of the `SELECT` query.
- Generates the list of `SELECT` items.
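Putting these steps together, a generated query might have the following shape (all names here are hypothetical), with each clause produced by the corresponding step above:

```sql
WITH with_0 AS (SELECT t_0.v1 AS col_0 FROM t AS t_0)  -- WITH clause
SELECT t_1.v1 AS col_0, with_0.col_0 AS col_1          -- SELECT items (generated last)
FROM t AS t_1 JOIN with_0 ON t_1.v1 = with_0.col_0     -- FROM expression with a join (generated first)
WHERE t_1.v1 > 0                                       -- WHERE expression
ORDER BY t_1.v1;                                       -- ORDER BY
```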
For the list of select items, it first chooses a type and eventually calls `gen_expr` with that type. This generates either:
- A scalar value.
- A column.
- Functions.
- Aggregates.
- Casts.
- Other kinds of expressions, e.g. `CASE ... WHEN`.
We mentioned that we call `gen_expr` with a specific type. That type should be the return type of function and aggregate calls, and also the target type of casts. Hence, we have type signature maps which we use to fetch candidate functions / aggregates / casts with that specific type. We pick one from the candidate set and generate some arguments for it, and the function / aggregate call, or the cast expression, is generated.
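For example (the concrete functions and columns below are hypothetical), asking `gen_expr` for a `varchar` expression might pick a candidate whose signature returns `varchar` and then recursively generate its arguments, while asking for a `bigint` might instead pick a cast with `bigint` as the target type:

```sql
-- gen_expr(varchar) might pick upper(varchar) -> varchar, then recursively
-- generate its varchar argument, e.g. as concat(t_0.v2, 'abc'):
SELECT upper(concat(t_0.v2, 'abc')) AS col_0 FROM t AS t_0;

-- gen_expr(bigint) could instead pick a cast whose target type is bigint:
SELECT CAST(t_0.v1 AS BIGINT) AS col_0 FROM t AS t_0;
```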
You may start from `query.rs` and trace the code for further details.
This section is WIP.
For the setup phase, only `CREATE MATERIALIZED VIEW` requires table name generation. We prefix these names with `m{id}`, where `id` is unique for each materialized view.

For both the setup and query phases, table factors also need generated names. To ensure there are no name conflicts, we always generate a new unique `id` for each table name.

When referencing columns, we always use qualified names. Additionally, for `SELECT` we always generate an alias for each selected item. These aliases only need to be locally unique; since references always use the qualified form, there is no ambiguity.
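An illustration of these conventions (the specific names are hypothetical, but follow the `m{id}` prefix, fresh table-factor aliases, qualified column references, and per-item aliases described above):

```sql
-- m{id} name for the materialized view; each table factor gets a fresh unique
-- alias; columns are always referenced in qualified form; every select item
-- gets an alias.
CREATE MATERIALIZED VIEW m3 AS
SELECT t_0.v1 AS col_0, t_1.v1 AS col_1
FROM t AS t_0
JOIN m1 AS t_1 ON t_0.v1 = t_1.v1;
```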
We generate a query snapshot weekly. The snapshot contains several sets of queries generated by SqlSmith, which are used for RisingWave e2e tests.
Example of a snapshot:

```
├── 1
│   └── queries.sql
├── 10
│   └── queries.sql
├── 100
│   └── queries.sql
├── 11
│   └── queries.sql
├── 12
│   └── queries.sql
└── failed
    ├── 27
    │   ├── fail.log
    │   ├── generate-27.log
    │   └── queries.sql
```
When the query snapshot is generated, it should only contain passing queries; failing queries are extracted into the `failed` directory. Hence, the snapshot should pass on every run; otherwise it indicates either a regression, or that some invalid queries were previously accepted by RisingWave.
This is what runs per PR and daily in main-cron. Random fuzzing happens per PR only for the frontend sqlsmith tests. We don't run e2e fuzzing tests per PR because they found too many lower-priority bugs and started slowing down development. However, if there are any changes to sqlsmith source files, the e2e fuzzing tests will still run.