
refactor(meta): clarify the completeness of internal table catalogs #18944

Merged: 5 commits, Oct 21, 2024

Conversation

BugenZhao
Member

@BugenZhao BugenZhao commented Oct 16, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

When creating a streaming job, the internal tables directly scraped from the frontend's StreamFragmentGraph are incomplete, because several fields like fragment_id or vnode_count are not filled yet and will be determined later.

Previously, we notified the frontend nodes with these catalogs but did not update them afterward. This causes problems now that variable vnode count support has been introduced, because the vnode count of a table is a significant property when scheduling a batch scan over it.

This PR changes the flow to notify the frontend of the internal table catalogs only after a TableFragments is built, at which point all information is final and complete.

This PR revises the documentation, comments, and naming of related snippets to clarify the completeness of internal table catalogs during various phases of creating a streaming job.
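To illustrate the idea, here is a minimal sketch (all types and field names are illustrative, not RisingWave's actual definitions) of why a catalog scraped from the frontend's StreamFragmentGraph is incomplete until scheduling fills in the remaining fields:

```rust
// Hypothetical model of an internal table catalog. `fragment_id` and
// `vnode_count` are unknown when the catalog is first scraped from the
// frontend graph; they are assigned by the meta node during scheduling.
#[derive(Debug, Clone)]
struct TableCatalog {
    id: u32,
    fragment_id: Option<u32>,
    vnode_count: Option<usize>,
}

impl TableCatalog {
    fn is_complete(&self) -> bool {
        self.fragment_id.is_some() && self.vnode_count.is_some()
    }
}

fn main() {
    // As scraped from the frontend's StreamFragmentGraph: incomplete.
    let mut table = TableCatalog { id: 1, fragment_id: None, vnode_count: None };
    assert!(!table.is_complete());

    // After TableFragments is built, the fields are filled; only now
    // should the catalog be broadcast to frontend nodes.
    table.fragment_id = Some(42);
    table.vnode_count = Some(256);
    assert!(table.is_complete());
}
```

The point of the refactor is that notifications should only ever carry catalogs for which `is_complete()` would hold.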

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: Bugen Zhao <[email protected]>

only notify internal tables

Signed-off-by: Bugen Zhao <[email protected]>

add more comments

Signed-off-by: Bugen Zhao <[email protected]>
@BugenZhao BugenZhao force-pushed the bz/create-mv-only-notify-complete-catalogs branch from 0c7b344 to b365566 Compare October 18, 2024 02:55
@BugenZhao BugenZhao changed the title refactor(meta): only notify complete catalogs when creating mv fix(meta): notify complete internal table catalogs when creating mv Oct 18, 2024
@github-actions github-actions bot added the type/fix Bug fix label Oct 18, 2024
Signed-off-by: Bugen Zhao <[email protected]>
```rust
// so we need to notify the frontend to delete them here.
// The frontend will ignore the request if the object does not exist,
// so it's safe to always notify.
self.notify_frontend(Operation::Delete, build_relation_group_for_delete(objs))
```
@BugenZhao (Member, Author) commented Oct 21, 2024
Since the notification is now sent only after the job has been successfully scheduled and built, a failure here does not necessarily mean that the notification was ever sent. That's why we changed all DROP handlers in the frontend to behave like DROP .. IF EXISTS.

An alternative could be introducing a new operation like Cleanup for this purpose, ensuring that Delete remains strictly applied. But that seems too ad hoc to me, as there's not much other use for it.
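The "always safe to notify" property boils down to making the frontend's delete handler idempotent. A minimal sketch (names are illustrative, not RisingWave's actual API) of such a handler:

```rust
use std::collections::HashMap;

// Hypothetical frontend-side catalog: maps table id to table name.
struct FrontendCatalog {
    tables: HashMap<u32, String>,
}

impl FrontendCatalog {
    // Idempotent delete, mirroring "DROP .. IF EXISTS": removing a missing
    // object is a harmless no-op, so the meta node can always send a Delete
    // notification even when it is unsure whether the preceding Add was
    // ever delivered.
    fn drop_if_exists(&mut self, id: u32) -> bool {
        self.tables.remove(&id).is_some()
    }
}

fn main() {
    let mut catalog = FrontendCatalog {
        tables: HashMap::from([(1, "internal_table_1".to_string())]),
    };
    assert!(catalog.drop_if_exists(1)); // object existed: removed
    assert!(!catalog.drop_if_exists(1)); // already gone: no-op, no error
}
```

With this semantics, no Cleanup operation is needed: Delete itself tolerates the uncertainty about whether the Add was delivered.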

Contributor

Can there be a race condition, whereby we have a Delete of some catalog followed by an Add of that catalog?

@BugenZhao (Member, Author)

I think not, because try_abort_creating_streaming_job is called only after create_streaming_job_inner has returned with an error, by which point the Add is either already pushed into the notification queue or was never issued at all. After entering this try_abort function, no further Add operations will occur.

Member

🤔 We ensured that CREATE and DROP will be properly paired in #18476. Even if the CREATE notification fails to send due to either an FE or meta node reboot or a broken connection between them, the creation of the mview and its internal table catalogs will still be synchronized through an initial snapshot.

@BugenZhao
Member Author

Oops, I realized that we'll eventually notify the frontend nodes to complete the catalogs for both the job and the internal tables in finish_streaming_job.

```rust
// notify frontend: job, internal tables.
let internal_table_objs = Table::find()
    .find_also_related(Object)
    .filter(table::Column::BelongsToJobId.eq(job_id))
    .all(&txn)
    .await?;
let mut relations = internal_table_objs
    .iter()
    .map(|(table, obj)| PbRelation {
        relation_info: Some(PbRelationInfo::Table(
            ObjectModel(table.clone(), obj.clone().unwrap()).into(),
        )),
    })
    .collect_vec();
let mut notification_op = NotificationOperation::Add;
match job_type {
    ObjectType::Table => {
        let (table, obj) = Table::find_by_id(job_id)
            .find_also_related(Object)
            .one(&txn)
            .await?
            .ok_or_else(|| MetaError::catalog_id_not_found("table", job_id))?;
        if table.table_type == TableType::MaterializedView {
            notification_op = NotificationOperation::Update;
        }
```

So this PR is more of a refactor than a fix. Actually, the main motivation of this PR is to enable #18976, where we don't expect the frontend nodes to receive a VnodeCount::Placeholder, to avoid potential confusion.

However, I think it's also okay to bypass the check and allow VnodeCount::Placeholder if the table is in StreamJobStatus::Creating, because informing the frontends about these tables is primarily for reference in DROP statements, where querying them is not permitted. As a result, the refactoring in this PR may not be strictly necessary anymore.
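The relaxed check floated above could look roughly like this sketch (all names are illustrative, not RisingWave's actual types): a placeholder vnode count is tolerated only while the job is still being created, since such tables may be dropped but never queried.

```rust
#[derive(Debug, PartialEq)]
enum StreamJobStatus {
    Creating,
    Created,
}

#[derive(Debug)]
enum VnodeCount {
    Placeholder,
    Set(usize),
}

// Hypothetical relaxed check: return the vnode count if known, tolerate a
// placeholder while the job is still creating (its tables are only referenced
// by DROP, never scanned), and reject a placeholder once the job is created.
fn check_vnode_count(
    count: &VnodeCount,
    status: &StreamJobStatus,
) -> Result<Option<usize>, String> {
    match (count, status) {
        (VnodeCount::Set(n), _) => Ok(Some(*n)),
        (VnodeCount::Placeholder, StreamJobStatus::Creating) => Ok(None),
        (VnodeCount::Placeholder, StreamJobStatus::Created) => {
            Err("vnode count must be set once the job is created".to_string())
        }
    }
}

fn main() {
    assert_eq!(
        check_vnode_count(&VnodeCount::Set(256), &StreamJobStatus::Created),
        Ok(Some(256))
    );
    assert_eq!(
        check_vnode_count(&VnodeCount::Placeholder, &StreamJobStatus::Creating),
        Ok(None)
    );
    assert!(check_vnode_count(&VnodeCount::Placeholder, &StreamJobStatus::Created).is_err());
}
```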

Signed-off-by: Bugen Zhao <[email protected]>
@BugenZhao BugenZhao force-pushed the bz/create-mv-only-notify-complete-catalogs branch from a543ff2 to 5adb7fe Compare October 21, 2024 06:56
@BugenZhao BugenZhao changed the title fix(meta): notify complete internal table catalogs when creating mv refactor(meta): clarify the completeness of internal table catalogs Oct 21, 2024
@BugenZhao
Member Author

As a result, refactoring in this PR may not be necessary anymore.

Have updated the PR so that it only

revises the documentation, comments, and naming of related snippets to clarify the completeness of internal table catalogs during various phases of creating a streaming job.

```diff
@@ -1643,6 +1645,7 @@ impl DdlController {
             table_parallelism,
             max_parallelism.get(),
         );
+        let internal_tables = table_fragments.internal_tables();
```
@BugenZhao (Member, Author)

The internal_tables field in CreateStreamingJobContext will now be complete.

@BugenZhao BugenZhao added this pull request to the merge queue Oct 21, 2024
Merged via the queue into main with commit 9bbf418 Oct 21, 2024
36 of 37 checks passed
@BugenZhao BugenZhao deleted the bz/create-mv-only-notify-complete-catalogs branch October 21, 2024 08:29