Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(expr): Implement evaluation for CONCAT_WS expression #2470

Merged
merged 4 commits into from
May 16, 2022

Conversation

kwannoel
Copy link
Contributor

@kwannoel kwannoel commented May 12, 2022

What's changed and what's your intention?

  • Summarize your change (mandatory)
    As per title. Changes for frontend will be implemented separately.

  • How does this PR work? Need a brief introduction for the changed logic (optional)

  • Describe clearly one logical change and avoid lazy messages (optional)

  • Describe any limitations of the current code (optional)

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

Part of #2405. Migrated over from #2427 for CI to run correctly. cc @skyzh .

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 800 files.

Valid Invalid Ignored Fixed
798 1 1 0
Click to see the invalid file list
  • src/expr/src/expr/expr_concat_ws.rs

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 800 files.

Valid Invalid Ignored Fixed
798 1 1 0
Click to see the invalid file list
  • src/expr/src/expr/expr_concat_ws.rs

src/expr/src/expr/expr_concat_ws.rs Show resolved Hide resolved
@kwannoel kwannoel self-assigned this May 12, 2022
@CLAassistant
Copy link

CLAassistant commented May 12, 2022

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@kwannoel kwannoel mentioned this pull request May 13, 2022
3 tasks
@kwannoel kwannoel changed the title feat: Implement CONCAT_WS feat: Implement evaluation for CONCAT_WS May 13, 2022
@kwannoel kwannoel changed the title feat: Implement evaluation for CONCAT_WS feat(expr): Implement evaluation for CONCAT_WS May 13, 2022
@kwannoel kwannoel force-pushed the noel/concat_ws branch 2 times, most recently from 2ab523d to bc43bda Compare May 13, 2022 08:28
@kwannoel
Copy link
Contributor Author

kwannoel commented May 13, 2022

Only wrote unit tests. Are there any integration tests I should be writing?
For e2e_tests I shall write them when I'm also done with frontend changes (separate PR).

@codecov
Copy link

codecov bot commented May 13, 2022

Codecov Report

Merging #2470 (45a6822) into main (ec86641) will increase coverage by 0.03%.
The diff coverage is 94.33%.

@@            Coverage Diff             @@
##             main    #2470      +/-   ##
==========================================
+ Coverage   71.39%   71.43%   +0.03%     
==========================================
  Files         688      689       +1     
  Lines       88505    88611     +106     
==========================================
+ Hits        63191    63298     +107     
+ Misses      25314    25313       -1     
Flag Coverage Δ
rust 71.43% <94.33%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/expr/src/expr/mod.rs 47.61% <0.00%> (-1.17%) ⬇️
src/expr/src/expr/expr_concat_ws.rs 95.23% <95.23%> (ø)
src/meta/src/hummock/hummock_manager.rs 87.70% <0.00%> (+0.10%) ⬆️
src/common/src/types/ordered_float.rs 24.90% <0.00%> (+0.19%) ⬆️
src/meta/src/barrier/mod.rs 68.21% <0.00%> (+0.33%) ⬆️
src/meta/src/hummock/compaction/mod.rs 84.11% <0.00%> (+0.46%) ⬆️
src/common/src/array/data_chunk.rs 95.21% <0.00%> (+0.49%) ⬆️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@kwannoel kwannoel marked this pull request as ready for review May 13, 2022 08:50
@kwannoel kwannoel changed the title feat(expr): Implement evaluation for CONCAT_WS feat(expr): Implement evaluation for CONCAT_WS expression May 13, 2022
@kwannoel kwannoel requested review from xiangjinwu and lmatz May 13, 2022 08:50
Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job!

proto/expr.proto Show resolved Hide resolved
Comment on lines 51 to 52
let mut builder = Utf8ArrayBuilder::new(input.cardinality())?;
let row_len = string_columns[0].len();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is row_len the same as input.cardinality()?

Comment on lines 56 to 60
if sep.is_none() {
builder.append(None)?;
continue;
}
let sep = sep.unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let sep = match sep {
  Some(sep) => sep,
  None => {
    builder.append(None)?;
    continue;
  }
}

let mut found_first = false;

for string_column in &string_columns {
let string_column = string_column.as_utf8();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call as_utf8 once for each column rather than in the loop.

Copy link
Contributor Author

@kwannoel kwannoel May 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some help regarding this. I tried to call as_utf8 above when I initialized string_columns: https://github.com/singularity-data/risingwave/pull/2470/files#diff-5c777fd52ffc145fbcdb963a452809c403db833992216a8361b724b555743217R45-R49.

This is the updated code. It does not compile because r.as_utf8() returns data owned by the closure:

        let string_columns = self
            .string_exprs
            .iter()
            .map(|c| {
                let r: ArrayRef /* */ = c.eval(input)?;
                Ok(r.as_utf8())
            })
            .collect::<Result<Vec<_>>>()?;

I have also tried into_utf8(), but that only works if we have ownership of ArrayImpl, which we don't in this case.

Any other approach you can suggest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let string_columns_ref = string_columns.iter().map(|c| {
            c.as_utf8()
        }).collect::<Vec<_>>();

right after string_columns is constructed.

Then use for string_column in &string_columns_ref instead in the loop.

Copy link
Contributor Author

@kwannoel kwannoel May 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the updated code, as per your suggestion:

        let string_columns = self
            .string_exprs
            .iter()
            .map(|c| c.eval(input))
            .collect::<Result<Vec<Arc<ArrayImpl>>>>()?;
        let string_columns_ref = string_columns
            .iter()
            .map(|c| c.as_utf8() )
            .collect::<Vec<&Utf8Array>>();

To check my understanding: this works because string_columns lifetime lasts until the end of the eval function scope. Is this correct?

}
let sep = sep.unwrap();

let mut res = String::from("");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is PartialBytesWriter in utf8_array.rs that supporting writing directly to the Utf8ArrayBuilder. You can take a look at the doc comments on these types:
BytesWriter, PartialBytesWriter, BytesGuard

Comment on lines 151 to 164
let array = Utf8Array::from_slice(&[Some(","), None, Some(","), Some(",")])
.map(|x| Arc::new(x.into()))
.unwrap();
let col1 = Column::new(array);
let array = Utf8Array::from_slice(&[Some("a"), Some("a"), None, None])
.map(|x| Arc::new(x.into()))
.unwrap();
let col2 = Column::new(array);
let array = Utf8Array::from_slice(&[Some("b"), Some("b"), Some("a"), None])
.map(|x| Arc::new(x.into()))
.unwrap();
let col3 = Column::new(array);

let data_chunk = DataChunk::builder().columns(vec![col1, col2, col3]).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try out DataChunk::from_pretty

let array = Utf8Array::from_slice(&[Some(","), None, Some(","), Some(",")])
.map(|x| Arc::new(x.into()))
.unwrap();
let col1 = Column::new(array);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may use column! macro

for string_column in &string_columns {
let string_column = string_column.as_utf8();
if let Some(string) = string_column.value_at(row_idx) {
if !found_first {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also move this found_first = true and res.push_str(string); out of the for string_column loop to eliminate the if branch.

@kwannoel kwannoel requested review from xiangjinwu and lmatz May 15, 2022 17:29
Copy link
Contributor

@lmatz lmatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kwannoel kwannoel merged commit a33cd51 into main May 16, 2022
@kwannoel kwannoel deleted the noel/concat_ws branch May 16, 2022 08:27
@kwannoel kwannoel restored the noel/concat_ws branch May 17, 2022 07:00
@kwannoel kwannoel deleted the noel/concat_ws branch May 17, 2022 07:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants