Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(expr): support any type in function signatures #12262

Merged
merged 27 commits into from
Oct 10, 2023
Merged

Conversation

wangrunji0408
Copy link
Contributor

@wangrunji0408 wangrunji0408 commented Sep 13, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Previously we use DataTypeName for types in function signatures. However:

  1. It doesn't support any type. For functions containing any type, such as array_agg(any) -> anyarray, currently we enumerate all types and generate a function for each specific type. (this is done by expanding the wildcard * in the macro)
  2. It doesn't support particular array type, such as varchar[]. Currently there is only a DataTypeName::List for array type, which is actually equivalent to anyarray. It doesn't have sufficient information, so we have to manually implement type inference for such functions.

To solve these problems, this PR introduces a new type SigDataType to replace DataTypeName in signatures:

pub enum SigDataType {
    Exact(DataType),    // `int`, `varchar[]`...
    Any,                // `any` type
    AnyArray,           // `anyarray` = old List
    AnyStruct,          // = old Struct
}

In #[function] macro:

  1. the previous type list is renamed to anyarray.
  2. a new type any is added. Its corresponding Rust type is ScalarImpl or ScalarRefImpl<'_>.

For example:

#[aggregate("array_agg(any) -> anyarray")]
fn array_agg(state: Option<ListValue>, value: Option<ScalarRefImpl<'_>>) -> ListValue {...}
  1. when the return type is either any or anyarray, the macro can automatically generate a type inference function based on the signature. Here are some examples:

    • mode(any) -> any: return_type = args[0]
    • array_agg(any) -> anyarray: return_type = List(args[0])
    • array_access(anyarray, int32) -> any: return_type = args[0].as_list()
    • array_distinct(anyarray) -> anyarray: return_type = args[0]

    The rules are following postgres documents:

    Each position (either argument or return value) declared as anyelement is allowed to have any specific actual data type, but in any given call they must all be the same actual type. Each position declared as anyarray can have any array data type, but similarly they must all be the same type.
    Furthermore, if there are positions declared anyarray and others declared anyelement, the actual array type in the anyarray positions must be an array whose elements are the same type appearing in the anyelement positions.
    https://www.postgresql.org/docs/16/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@kwannoel
Copy link
Contributor

Can there be more details added to the PR description, on the background of this refactor?

I thought previously we supported any via *, e.g.

#[aggregate("array_agg(*) -> list")]

Or is this different?

@wangrunji0408
Copy link
Contributor Author

wangrunji0408 commented Sep 13, 2023

Can there be more details added to the PR description, on the background of this refactor?

I thought previously we supported any via *, e.g.

#[aggregate("array_agg(*) -> list")]

Or is this different?

I'll complete the description tomorrow.

Simply put, we don't support the any type in function signatures. (see Polymorphic Types) Currently we enumerate all types and generate a function for each specific type. (this is done by expanding the wildcard * in the macro)

After this PR, we will have one array_agg function instead of multiple:

- array_agg(int16) -> list
- array_agg(int32) -> list
- array_agg(int64) -> list
 ...
- array_agg(list) -> list
- array_agg(struct) -> list
+ array_agg(any) -> list

And we will get the input as a DatumRef in the function body instead of a specific scalar type:

- fn array_agg<'a>(state: Option<ListValue>, value: Option<impl ScalarRef<'a>>) -> ListValue {
+ fn array_agg<'a>(state: Option<ListValue>, value: Option<ScalarRefImpl<'a>>) -> ListValue {

Base automatically changed from wrj/varargs to main September 15, 2023 09:21
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
@stdrc stdrc self-requested a review September 19, 2023 06:55
Signed-off-by: Runji Wang <[email protected]>
@wangrunji0408 wangrunji0408 marked this pull request as ready for review September 19, 2023 08:11
@xxchan
Copy link
Member

xxchan commented Sep 20, 2023

After this PR, we will have one array_agg function instead of multiple:

May I ask what's the benefit? 👀

@wangrunji0408
Copy link
Contributor Author

After this PR, we will have one array_agg function instead of multiple:

May I ask what's the benefit? 👀

Avoiding generating too much code and slowing down compilation? if it counts. 🤡

src/frontend/src/expr/type_inference/func.rs Show resolved Hide resolved
src/frontend/src/expr/agg_call.rs Outdated Show resolved Hide resolved
src/expr/src/sig/mod.rs Outdated Show resolved Hide resolved
src/expr/macro/src/types.rs Outdated Show resolved Hide resolved
@@ -35,39 +35,33 @@ fn string_to_array_inner<'a>(
}

// Use cases shown in `e2e_test/batch/functions/string_to_array.slt.part`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this, seems docslt is a good choice😁

src/expr/src/agg/general.rs Outdated Show resolved Hide resolved
src/expr/src/sig/mod.rs Outdated Show resolved Hide resolved
src/expr/src/vector_op/array.rs Outdated Show resolved Hide resolved
Copy link
Member

@stdrc stdrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM!

src/expr/core/src/sig/mod.rs Outdated Show resolved Hide resolved
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
@wangrunji0408 wangrunji0408 added this pull request to the merge queue Oct 10, 2023
@codecov
Copy link

codecov bot commented Oct 10, 2023

Codecov Report

Merging #12262 (bf0381a) into main (615f873) will decrease coverage by 0.16%.
Report is 11 commits behind head on main.
The diff coverage is 68.03%.

@@            Coverage Diff             @@
##             main   #12262      +/-   ##
==========================================
- Coverage   69.41%   69.25%   -0.16%     
==========================================
  Files        1473     1471       -2     
  Lines      242568   242507      -61     
==========================================
- Hits       168376   167959     -417     
- Misses      74192    74548     +356     
Flag Coverage Δ
rust 69.25% <68.03%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/common/src/array/list_array.rs 84.14% <100.00%> (+0.06%) ⬆️
src/expr/impl/src/aggregate/array_agg.rs 100.00% <100.00%> (ø)
src/expr/impl/src/aggregate/jsonb_agg.rs 0.00% <ø> (-34.49%) ⬇️
src/expr/impl/src/aggregate/mode.rs 2.04% <100.00%> (ø)
src/expr/impl/src/aggregate/percentile_disc.rs 1.69% <100.00%> (ø)
src/expr/impl/src/aggregate/string_agg.rs 100.00% <ø> (ø)
src/expr/impl/src/scalar/array_access.rs 100.00% <100.00%> (ø)
src/expr/impl/src/scalar/array_concat.rs 94.00% <100.00%> (-0.65%) ⬇️
src/expr/impl/src/table_function/pg_expandarray.rs 21.42% <100.00%> (-7.15%) ⬇️
src/expr/impl/src/table_function/unnest.rs 42.85% <100.00%> (-14.29%) ⬇️
... and 36 more

... and 47 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Merged via the queue into main with commit c952f32 Oct 10, 2023
5 of 7 checks passed
@wangrunji0408 wangrunji0408 deleted the wrj/sig branch October 10, 2023 17:03
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we deprecate * after supporting any? 🤔

bytea Bytea BytesArray Box<[u8]> &[u8] _
jsonb Jsonb JsonbArray JsonbVal JsonbRef<'_> _
anyarray List ListArray ListValue ListRef<'_> _
struct Struct StructArray StructValue StructRef<'_> _
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why struct is not anystruct?

Copy link
Contributor Author

@wangrunji0408 wangrunji0408 Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of struct doesn't matter. It is only used internally. In public API, users need to write a full struct schema, like struct<a: int, ...>.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do write struct in some places 🤪

#[aggregate("min(struct) -> auto", state = "ref")]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my bad memory. 🤪

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way, anystruct seems to be a reasonable name.

@wangrunji0408
Copy link
Contributor Author

Can we deprecate * after supporting any? 🤔

I don't think so. The relationship between * and any is similar to impl T and dyn T in Rust. The performance of using * would be much better than any.

@xxchan
Copy link
Member

xxchan commented Jul 22, 2024

Can we deprecate * after supporting any? 🤔

I don't think so. The relationship between * and any is similar to impl T and dyn T in Rust. The performance of using * would be much better than any.

🤡 Hmmm, if so, shouldn't we prefer * over any instead? It sounds that their expressiveness is the same, but the performance of any is always worse.

@wangrunji0408
Copy link
Contributor Author

Can we deprecate * after supporting any? 🤔

I don't think so. The relationship between * and any is similar to impl T and dyn T in Rust. The performance of using * would be much better than any.

🤡 Hmmm, if so, shouldn't we prefer * over any instead? It sounds that their expressiveness is the same, but the performance of any is always worse.

Not always, and in some cases, using any is more convenient. For example, in array functions, the element type of ListValue is Scalar(Ref)Impl. It is unnecessary to convert it from/into various T. 🤡

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants