Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(query): Virtual column support alias name #17365

Merged
merged 10 commits into from
Jan 27, 2025

Conversation

b41sh
Copy link
Member

@b41sh b41sh commented Jan 24, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

virtual columns allow set alias names, user can use alias name to query variant inner column

for example:

root@0.0.0.0:48000/default> create table t1 (a int null, v json null);
0 row written in 0.084 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

root@0.0.0.0:48000/default> insert into t1 values(1, parse_json('{"a":[1,2,3],"b":{"c":10}}'));
┌─────────────────────────┐
│ number of rows inserted │
│          UInt64         │
├─────────────────────────┤
│                       1 │
└─────────────────────────┘
1 row written in 0.076 sec. Processed 1 row, 81 B (13.16 rows/s, 1.04 KiB/s)

root@0.0.0.0:48000/default> create virtual column (v['a'][0] as v1, v['b'] as v2) for t1;
0 row written in 0.036 sec. Processed 0 row, 0 B (0 row/s, 0 B/s)

root@0.0.0.0:48000/default> refresh virtual column for t1;
0 row read in 0.032 sec. Processed 1 row, 76 B (31.25 rows/s, 2.32 KiB/s)

root@0.0.0.0:48000/default> select a, v1, v2 from t1;
┌─────────────────────────────────────────────────────────┐
│        a        │         v1        │         v2        │
│ Nullable(Int32) │ Nullable(Variant) │ Nullable(Variant) │
├─────────────────┼───────────────────┼───────────────────┤
│               11                 │ {"c":10}          │
└─────────────────────────────────────────────────────────┘
1 row read in 0.042 sec. Processed 1 row, 64 B (23.81 rows/s, 1.49 KiB/s)

root@0.0.0.0:48000/default> explain select a, v1, v2 from t1;
-[ EXPLAIN ]-----------------------------------
TableScan
├── table: default.default.t1
├── output columns: [a (#0), v['a'][0] (#2), v['b'] (#3)]
├── read rows: 1
├── read size: < 1 KiB
├── partitions total: 1
├── partitions scanned: 1
├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 1 to 1>]
├── push downs: [filters: [], limit: NONE]
├── virtual columns: [v['a'][0], v['b']]
└── estimated rows: 1.00

11 rows explain in 0.033 sec. Processed 0 rows, 0 B (0 row/s, 0 B/s)
  • fixes: #[Link the issue here]

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Jan 24, 2025
@b41sh b41sh requested a review from sundy-li January 26, 2025 02:32
@b41sh b41sh marked this pull request as ready for review January 26, 2025 02:32
@b41sh b41sh requested a review from drmingdrmer as a code owner January 26, 2025 02:32
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 43 of 43 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @b41sh and @sundy-li)


src/meta/protos/proto/virtual_column.proto line 50 at r1 (raw file):

  // if true, virtual columns are auto generated,
  // otherwise virtual columns are created by user.
  optional bool auto_generated = 7;

This is explicitly set to be an Option. What does it mean if it is None? I see that None is replaced with the default value false thus optional seems to be unnecessary?


src/meta/app/src/schema/virtual_column.rs line 52 at r1 (raw file):

}

#[derive(serde::Serialize, serde::Deserialize, Clone, Debug, Eq, PartialEq)]

Why do VirtualColumnMeta and VirtualField need serde?


src/meta/app/src/schema/virtual_column.rs line 59 at r1 (raw file):

    pub created_on: DateTime<Utc>,
    pub updated_on: Option<DateTime<Utc>>,
    pub auto_generated: bool,

This field needs to be explained about its purpose and usage.


src/meta/proto-conv/src/virtual_column_from_to_protobuf_impl.rs line 62 at r1 (raw file):

            };
            let alias_name = if let Some(alias_name) = p.alias_names.get(i) {
                if !alias_name.is_empty() {

I do believe any string including empty string "" could be a valid name. 🤔

@drmingdrmer drmingdrmer force-pushed the feat-virtual-column-alias branch from 45e1d20 to 4705cc8 Compare January 27, 2025 01:03
@b41sh b41sh force-pushed the feat-virtual-column-alias branch from 4705cc8 to 80924ae Compare January 27, 2025 08:10
@b41sh b41sh requested a review from drmingdrmer January 27, 2025 08:10
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaner and simpler! Good job!

Reviewed 7 of 7 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @b41sh)


src/meta/protos/proto/virtual_column.proto line 47 at r2 (raw file):

  // virtual column alias names,
  // key is the index of `virtual_columns` field.
  map<uint64, string> alias_names = 6;

Nice shot! Using a map significantly simplified the data conversion logic.


src/meta/app/src/schema/virtual_column.rs line 75 at r2 (raw file):

    pub updated_on: Option<DateTime<Utc>>,
    // Whether the virtual columns are auto-generated,
    // true for auto-generated, false for user-defined.

I think it's meant to be:

Suggestion:

    // Whether the virtual column alias names are auto-generated,
    // true for auto-generated, false for user-defined.

@b41sh
Copy link
Member Author

b41sh commented Jan 27, 2025

Cleaner and simpler! Good job!

Reviewed 7 of 7 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @b41sh)

src/meta/protos/proto/virtual_column.proto line 47 at r2 (raw file):

  // virtual column alias names,
  // key is the index of `virtual_columns` field.
  map<uint64, string> alias_names = 6;

Nice shot! Using a map significantly simplified the data conversion logic.

src/meta/app/src/schema/virtual_column.rs line 75 at r2 (raw file):

    pub updated_on: Option<DateTime<Utc>>,
    // Whether the virtual columns are auto-generated,
    // true for auto-generated, false for user-defined.

I think it's meant to be:

Suggestion:

    // Whether the virtual column alias names are auto-generated,
    // true for auto-generated, false for user-defined.

Thanks for the review. @drmingdrmer auto-generated means whether the virtual columns are auto-generated or not, not the alias names. this field is not used yet, it's a feature I'm planning to do, just this PR by the way to change the meta first.

@drmingdrmer
Copy link
Member

Thanks for the review. @drmingdrmer auto-generated means whether the virtual columns are auto-generated or not, not the alias names. this field is not used yet, it's a feature I'm planning to do, just this PR by the way to change the meta first.

Then everything is good to me :)

@b41sh b41sh enabled auto-merge January 27, 2025 08:31
@b41sh b41sh added this pull request to the merge queue Jan 27, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 27, 2025
@BohuTANG BohuTANG merged commit c5e7355 into databendlabs:main Jan 27, 2025
71 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants