bug: the actor information has not been broadcasted to the affected CNs #14904

Closed
yezizp2012 opened this issue Jan 31, 2024 · 1 comment
yezizp2012 commented Jan 31, 2024

Describe the bug

ERROR:  Failed to run the query

Caused by these errors (recent errors listed first):
1: gRPC request to meta service failed: Internal error
2: gRPC request to stream service failed: Internal error
3: actor 105 not found in info table

Reported by one of our customers and discovered by @shanicky, this is a corner case in creating streaming jobs. It only appears when all three of the following conditions hold:

  1. The streaming job is a create sink into table job.
  2. The sink performs temporary joins on multiple tables.
  3. The set of CNs that the joined tables are distributed on is a proper subset of the set of CNs that the target table is distributed on. In other words, some actors of the target table are placed on CNs that host none of the joined tables' actors.
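
Condition 3 can be stated as a set check. The following is a minimal sketch, not RisingWave code: `WorkerId`, `uncovered_workers`, and `matches_condition_three` are hypothetical names introduced only for illustration, and the worker sets are made-up inputs.

```rust
use std::collections::BTreeSet;

/// Hypothetical compute node (CN) identifier; not the RisingWave type.
type WorkerId = u32;

/// CNs that host target-table actors but none of the joined tables' actors.
fn uncovered_workers(
    join_workers: &BTreeSet<WorkerId>,
    target_workers: &BTreeSet<WorkerId>,
) -> BTreeSet<WorkerId> {
    target_workers.difference(join_workers).copied().collect()
}

/// Condition 3: the joined tables' CNs form a proper subset of the target table's CNs.
fn matches_condition_three(
    join_workers: &BTreeSet<WorkerId>,
    target_workers: &BTreeSet<WorkerId>,
) -> bool {
    join_workers.is_subset(target_workers)
        && !uncovered_workers(join_workers, target_workers).is_empty()
}

fn main() {
    // Made-up placement: joined tables on CNs 1 and 2, target table on CNs 1, 2 and 3.
    let join_workers = BTreeSet::from([1, 2]);
    let target_workers = BTreeSet::from([1, 2, 3]);

    println!(
        "condition 3 holds: {}",
        matches_condition_three(&join_workers, &target_workers)
    );
    println!(
        "CNs with target actors only: {:?}",
        uncovered_workers(&join_workers, &target_workers)
    );
}
```

With this placement, CN 3 is the node on which target table actors run without any of the joined tables' actors.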

Currently, during the creation of a streaming job, we broadcast the info of the new actors and all of their upstream actors only to the compute nodes that host at least one of the actors being built.

let worker_node = building_locations.worker_locations.get(worker_id).unwrap();
let actor_infos_to_broadcast = &actor_infos_to_broadcast;
async move {
    let client = self.env.stream_client_pool().get(worker_node).await?;
    client
        .broadcast_actor_info_table(BroadcastActorInfoTableRequest {
            info: actor_infos_to_broadcast.clone(),
        })
        .await?;

This works for all streaming jobs except sink into table, because we broadcast the actor infos of the sink job and of the new target table job separately. Under the conditions listed above, some actors of the target table, on CNs that host no sink actors, do not know the info of the upstream sink actors when they are built.

self.build_actors(&table_fragments, &building_locations, &existing_locations)
    .await?;
if let Some((_, context, table_fragments)) = replace_table_job_info {
    let MetadataManager::V1(mgr) = &self.metadata_manager else {
        unimplemented!("support create sink into table in v2");
    };
    self.build_actors(
        &table_fragments,
        &context.building_locations,
        &context.existing_locations,
    )
    .await?;
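
The failure mode described above can be reproduced in a toy model. This is only a sketch under stated assumptions, not the meta service implementation: `Cluster`, `broadcast`, and `build_actor` are hypothetical, the actor and worker IDs other than 105 are made up, and the model assumes that the second broadcast carries only the target table job's own actors.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical identifiers; not the RisingWave types.
type WorkerId = u32;
type ActorId = u32;

/// Toy stand-in for the per-CN actor info tables.
#[derive(Default)]
struct Cluster {
    info_tables: BTreeMap<WorkerId, BTreeSet<ActorId>>,
}

impl Cluster {
    /// Send `infos` only to the CNs that build actors for the current job.
    fn broadcast(&mut self, building_workers: &[WorkerId], infos: &[ActorId]) {
        for worker in building_workers {
            self.info_tables
                .entry(*worker)
                .or_default()
                .extend(infos.iter().copied());
        }
    }

    /// Building an actor fails if an upstream actor is unknown on that CN.
    fn build_actor(&self, worker: WorkerId, upstream: &[ActorId]) -> Result<(), String> {
        let known = self.info_tables.get(&worker);
        for actor in upstream {
            if !known.map_or(false, |table| table.contains(actor)) {
                return Err(format!("actor {actor} not found in info table"));
            }
        }
        Ok(())
    }
}

fn main() {
    let sink_actors = [105];
    let table_actors = [201, 202, 203];
    let mut cluster = Cluster::default();

    // Job 1: the sink is built on CNs 1 and 2 only.
    cluster.broadcast(&[1, 2], &sink_actors);
    // Job 2: the target table is rebuilt on CNs 1, 2 and 3.
    // Assumption of this model: the broadcast carries only this job's actors.
    cluster.broadcast(&[1, 2, 3], &table_actors);

    // Target table actors on every CN depend on the upstream sink actor.
    for worker in [1, 2, 3] {
        println!("CN {worker}: {:?}", cluster.build_actor(worker, &sink_actors));
    }
}
```

In this model, CN 3 never receives the sink actor info, so the build there fails with the same message as in the report. Also sending the sink actor infos to CN 3 makes the toy build succeed; whether #18270 takes that approach is not stated in this issue.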

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

No response

Additional context

No response

@yezizp2012 yezizp2012 added the type/bug Something isn't working label Jan 31, 2024
@github-actions github-actions bot added this to the release-1.7 milestone Jan 31, 2024
@yezizp2012 yezizp2012 changed the title bug: The actor information has not been broadcasted to the affected CN bug: the actor information has not been broadcasted to the affected CNs Jan 31, 2024
@yezizp2012 yezizp2012 modified the milestones: release-1.7, release-1.8 Mar 6, 2024
@yezizp2012 yezizp2012 modified the milestones: release-1.8, release-1.9 Apr 8, 2024
@shanicky shanicky modified the milestones: release-1.11, release-1.10 Jul 10, 2024
@yezizp2012
Member Author

Fixed by #18270.
