fix: detect cycle ref in proto #10499

Merged
merged 6 commits into from
Jun 26, 2023

Conversation

Contributor

@tabVersion tabVersion commented Jun 23, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

resolve #10475

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)

Documentation

  • My PR contains user-facing changes.

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

@github-actions github-actions bot added the type/fix Bug fix label Jun 23, 2023
Signed-off-by: tabVersion <[email protected]>
@tabVersion tabVersion marked this pull request as ready for review June 23, 2023 09:32
codecov bot commented Jun 23, 2023

Codecov Report

Merging #10499 (d439de1) into main (e21226a) will increase coverage by 0.00%.
The diff coverage is 97.50%.

@@           Coverage Diff           @@
##             main   #10499   +/-   ##
=======================================
  Coverage   70.33%   70.34%           
=======================================
  Files        1274     1274           
  Lines      218988   219023   +35     
=======================================
+ Hits       154034   154066   +32     
- Misses      64954    64957    +3     
Flag Coverage Δ
rust 70.34% <97.50%> (+<0.01%) ⬆️


Impacted Files Coverage Δ
src/connector/src/parser/protobuf/parser.rs 59.33% <97.50%> (+4.08%) ⬆️

... and 2 files with indirect coverage changes


Member

@BugenZhao BugenZhao left a comment

Rest LGTM

src/connector/src/parser/protobuf/parser.rs Outdated
Comment on lines +16 to +22
message Parent {
  string parent_name = 1;
  int32 parent_id = 2;
  repeated ComplexRecursiveMessage siblings = 3;
}

Parent parent = 4;
Member

Is this a practical example? It seems we're storing the nodes multiple times, since protobuf embeds messages by value rather than by reference. 😂

Contributor Author

No, the example was generated by ChatGPT. Real-world ones are welcome.

Comment on lines +2 to +6
(binary descriptor content; readable fragments: recursive.proto, ComplexRecursiveMessage, node_name, node_id)
Contributor

Is this the compiled result? Might be better to git ignore it?

Contributor Author

The proto dependency requires the compiled version.

Contributor

I mean, can we compile it when testing? It seems we don't have any other test that depends on hard-coded compiler output.

Contributor

Never mind, not a big deal. We can always handle it later.

Contributor Author

@tabVersion tabVersion Jun 26, 2023

I mean, can we compile it when testing? It seems we don't have any other test that depends on hard-coded compiler output.

There are only a few generated protobuf binaries, so I don't think it is worth introducing extra scripts and a toolchain.

Comment on lines +228 to +241
fn detect_loop_and_push(trace: &mut Vec<String>, fd: &FieldDescriptor) -> Result<()> {
    let identifier = format!("{}({})", fd.name(), fd.full_name());
    if trace.iter().any(|s| s == identifier.as_str()) {
        return Err(RwError::from(ProtocolError(format!(
            "circular reference detected: {}, conflict with {}, kind {:?}",
            trace.iter().join("->"),
            identifier,
            fd.kind(),
        ))));
    }
    trace.push(identifier);
    Ok(())
}

Contributor

@neverchanje neverchanje Jun 26, 2023

It's fine to use an O(N^2) algorithm in this case, but I worry that users could import a message with 1k fields, which might cause memory usage to spike.

Given that all the previous types before the current DFS point should be unique, maybe you can use a set instead of a vector:

fn detect_loop_and_push(visited: &mut HashSet<String>, fd: &FieldDescriptor) -> Result<()> {
    if visited.contains(fd.name()) {
        return Err(...)
    }
    visited.insert(fd.name().to_owned());
    Ok(())
}

fn protobuf_type_mapping(
    field_descriptor: &FieldDescriptor,
    parse_trace: &mut HashSet<String>,
) -> Result<DataType> {
    detect_loop_and_push(parse_trace, field_descriptor)?;

    ...

    // Remove the entry again when leaving this field.
    parse_trace.remove(field_descriptor.name());
}
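
For illustration, the set-based idea can be sketched in a self-contained form. This is only a sketch under assumptions: `ToySchema` and `visit` are hypothetical names, and the schema is a plain map from message name to `(field name, field type)` pairs rather than real `FieldDescriptor` values. The sketch keys the set on a qualified `message.field` string, because bare field names can repeat across nested messages and would otherwise be reported as cycles.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical toy schema: message name -> (field name, field type) pairs.
/// A field type that is not a key in the map is treated as a scalar.
type ToySchema = HashMap<String, Vec<(String, String)>>;

/// Depth-first walk that keeps the fields on the current path in a set.
fn visit(
    schema: &ToySchema,
    message: &str,
    on_path: &mut HashSet<String>,
) -> Result<(), String> {
    let Some(fields) = schema.get(message) else {
        return Ok(()); // scalar type, nothing to descend into
    };
    for (name, ty) in fields {
        let id = format!("{message}.{name}");
        // `insert` returns false when the value was already present,
        // i.e. this field is already on the current DFS path.
        if !on_path.insert(id.clone()) {
            return Err(format!("circular reference detected at {id}"));
        }
        visit(schema, ty, on_path)?;
        // Leaving this field: take it off the current path again.
        on_path.remove(&id);
    }
    Ok(())
}
```

The membership check is O(1) on average, but the set does not preserve visiting order, so the error can name only the conflicting field and not the path that led to it.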

Contributor Author

I use a Vec to record the parser's visiting path, which helps identify the level at which the recursive reference occurs.
Push and pop always happen at the end of the Vec, so there should be little allocation overhead.
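
As a rough, self-contained illustration of this trace-based approach: the sketch below assumes a hypothetical toy schema (`Schema`, a map from message name to `(field name, field type)` pairs) and a hypothetical driver `check_message`, and it takes a plain `String` identifier instead of the real `FieldDescriptor`. It is not the code in parser.rs.

```rust
use std::collections::HashMap;

/// Hypothetical toy schema: message name -> (field name, field type) pairs.
/// A field type that is not a key in the map is treated as a scalar.
type Schema = HashMap<&'static str, Vec<(&'static str, &'static str)>>;

/// Push `identifier` onto the trace, or fail if it is already on the path.
fn detect_loop_and_push(trace: &mut Vec<String>, identifier: String) -> Result<(), String> {
    if trace.iter().any(|s| *s == identifier) {
        return Err(format!(
            "circular reference detected: {}, conflict with {}",
            trace.join("->"),
            identifier
        ));
    }
    trace.push(identifier);
    Ok(())
}

/// Depth-first walk over the fields of `message`, recording the path taken.
fn check_message(schema: &Schema, message: &str, trace: &mut Vec<String>) -> Result<(), String> {
    let Some(fields) = schema.get(message) else {
        return Ok(()); // scalar type, nothing to descend into
    };
    for (field_name, field_type) in fields {
        detect_loop_and_push(trace, format!("{field_name}({message}.{field_name})"))?;
        check_message(schema, field_type, trace)?;
        // Pop only from the end, so the Vec always mirrors the current path.
        trace.pop();
    }
    Ok(())
}
```

Because the trace keeps insertion order, the error message can print the whole path down to the conflicting field. The linear scan costs time proportional to the current nesting depth, not to the total number of fields in the schema.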

Contributor

@neverchanje neverchanje left a comment

rest lgtm

@tabVersion tabVersion added this pull request to the merge queue Jun 26, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 26, 2023
@tabVersion tabVersion added this pull request to the merge queue Jun 26, 2023
Merged via the queue into main with commit 21da7f9 Jun 26, 2023
@tabVersion tabVersion deleted the tab/pb-resolve-path branch June 26, 2023 12:24
Labels
type/fix Bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ban recursive proto messages from protobuf source
5 participants