-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(connector): support zero-copy access
during parsing
#17153
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @BugenZhao and the rest of your teammates on Graphite |
@@ -423,7 +429,7 @@ impl JsonParseOptions { | |||
.map_err(|_| create_error())? | |||
.into(), | |||
// ---- Varchar ----- | |||
(DataType::Varchar , ValueType::String) => value.as_str().unwrap().into(), | |||
(DataType::Varchar, ValueType::String) => return Ok(DatumCow::Ref(Some(value.as_str().unwrap().into()))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core change: reduce allocation for a ScalarImpl::Utf8(Box<str>)
.
access
during parsingaccess
during parsing
d19d8f9
to
0520dbe
Compare
@@ -533,7 +539,7 @@ impl JsonParseOptions { | |||
} | |||
&BorrowedValue::Static(simd_json::StaticNode::Null) | |||
}); | |||
self.parse(field_value, field_type) | |||
self.parse(field_value, field_type).map(|d| d.to_owned_datum()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the nexmark benchmark takes a struct as input, while it's not possible to achieve zero-copy when parsing a Struct
field (yet), the performance gain in this PR may not be reflected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need further investigation. 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this idea makes sense, and varchar is a commonly used path, although not in benchmark. 👍
Also thanks for cleaning up useless lifetimes.
|
||
use super::{Datum, DatumRef, ToDatumRef, ToOwnedDatum}; | ||
|
||
/// 🐮 A borrowed [`DatumRef`] or an owned [`Datum`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain in doc why not std::borrow::Cow<Datum>
here?
.take(1024) | ||
.take(16384) | ||
.map(|(i, e)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To minimize the impact of other operations.
path: &[&str], | ||
type_expected: &DataType, | ||
) -> AccessResult<DatumCow<'a>> { | ||
self.access(path, type_expected).map(Into::into) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think mutual-recursive default implementation is awkward. But it's OK to me considering the TODO, and this trait will hardly have new implementors. 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be soon cleaned up in #17165.
pub(crate) fn json_object_get_case_insensitive<'a, 'b>( | ||
v: &'b simd_json::BorrowedValue<'a>, | ||
key: &'b str, | ||
) -> Option<&'b BorrowedValue<'a>> { | ||
pub(crate) fn json_object_get_case_insensitive<'b>( | ||
v: &'b simd_json::BorrowedValue<'b>, | ||
key: &str, | ||
) -> Option<&'b BorrowedValue<'b>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
impl OpAction for OpActionUpdate { | ||
type Output = (Datum, Datum); | ||
type Output<'a> = (DatumCow<'a>, DatumCow<'a>); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpActionUpdate and do_update seems to be dead code
@@ -533,7 +539,7 @@ impl JsonParseOptions { | |||
} | |||
&BorrowedValue::Static(simd_json::StaticNode::Null) | |||
}); | |||
self.parse(field_value, field_type) | |||
self.parse(field_value, field_type).map(|d| d.to_owned_datum()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's possible?
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
d6b28be
to
a05c149
Compare
Merge activity
|
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Add a new method named
access_cow
toAccess
trait, allowing implementors to return a borrowedDatumRef
, to achieve zero-copy parsing.Currently only
Varchar
field inJSON
returns a borrowedScalarRef
. This improves the performance by ~11% in micro-benchmarknexmark_integration
. Will adopt it more in following PRs.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.