Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix the panic during parallel unit mapping conversion. #17318

Conversation

shanicky
Copy link
Contributor

@shanicky shanicky commented Jun 18, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Title:

Refactor consistent hashing and improve error handling in metadata controllers

Summary:

This pull request introduces a series of updates to Rust files associated with consistent hashing and metadata controllers in our distributed system. Below is an overview of the main changes incorporated into the system:

Changes to consistent_hash/masking.rs:

  • Introduced a new HashSet import from std::collections
  • Added a new Sub import from std::ops
  • Implemented a new error type ParallelUnitError to handle un-mapped parallel units in the system
  • Updated ParallelUnitMapping::to_worker_slot to return a Result type, now checking complete coverage of parallel unit IDs by worker slots and returning an error if uncovered
  • Added method as_delete_worker_slap_mapping to ParallelUnitMapping for creating a WorkerSlotMapping indicating delete operations via max u64 values

Modifications to catalog.rs:

  • Removed an unused ParallelUnitMapping import
  • Altered methods (drop_database, deploy_stream_job, stop_stream_job) to set the worker slot mapping to None in PbFragmentWorkerSlotMapping instead of computing a mapping indicating a shift in mapping handling
  • Extricated the obsolete get_parallel_unit_to_worker_map function from several methods

Updates to fragment.rs and streaming_job.rs:

  • Adapted both files to account for ParallelUnitMapping::to_worker_slot now returning a Result, enabling proper error propagation
  • Created a helper function convert_fragment_mappings in fragment',' which converts fragment mappings to protobuf representations while handling errors from to_worker_slot`

The overarching goal of these changes is to streamline and enhance error handling for converting parallel unit mappings to worker slot mappings, which is a pivotal aspect of our metadata controllers. Affected code segments now anticipate potential conversion errors and are equipped for their management, markedly improving system robustness in handling parallel unit mapping scenarios.

Note: Reviewers should carefully assess the potential impacts of setting the worker slot mappings to None and confirm that the propagation of conversion errors aligns with the intended behavior of the distributed system architecture.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

@shanicky shanicky marked this pull request as ready for review June 19, 2024 07:18
@graphite-app graphite-app bot requested a review from a team June 19, 2024 07:18
Shanicky Chen added 2 commits June 19, 2024 15:18
@shanicky shanicky requested a review from yezizp2012 June 20, 2024 07:05
@shanicky shanicky added need-cherry-pick-release-1.9 need-cherry-pick-release-1.10 Open a cherry-pick PR to branch release-1.10 after the current PR is merged labels Jun 20, 2024
@shanicky shanicky enabled auto-merge June 20, 2024 07:20
…consistent_hashmappingrs26453-no-entry-found-for-key
@yezizp2012 yezizp2012 requested a review from BugenZhao June 20, 2024 07:32
@shanicky shanicky added this pull request to the merge queue Jun 20, 2024
Merged via the queue into main with commit 7df77a4 Jun 20, 2024
31 checks passed
@shanicky shanicky deleted the 17288-meta-node-panicked-at-srccommonsrchashconsistent_hashmappingrs26453-no-entry-found-for-key branch June 20, 2024 08:19
github-actions bot pushed a commit that referenced this pull request Jun 20, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/run-e2e-single-node-tests need-cherry-pick-release-1.10 Open a cherry-pick PR to branch release-1.10 after the current PR is merged type/fix Bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

meta node panicked at src/common/src/hash/consistent_hash/mapping.rs:264:53: no entry found for key
2 participants