Improve the visibility of barrier collecting #16141

Open
4 tasks
fuyufjh opened this issue Apr 4, 2024 · 0 comments
fuyufjh commented Apr 4, 2024

Yesterday I ran into a hang related to LocalBarrierManager and LocalBarrierWorker, but I couldn't get any information about what was going on inside. It's a complete black box.

[RPC risingwave-compute-0.risingwave-compute:5688 - 2064]
/stream_service.StreamService/BarrierComplete [3843.568s]
  collect_barrier (epoch 6220276010844160) [!!! 3843.568s]

-- What happened inside? No actors were blocked at `6220276010844160`...

Since barrier collecting is critical for streaming, I think we need a way to expose the state inside LocalBarrierWorker, for example by exporting the current collecting progress.

  • Add an InspectState event in LocalBarrierWorker to export the current barrier states
  • feat: support inspecting local barrier worker state #16562
  • (need-investigation) Export internal state in Meta node's BarrierManager
  • Output the results via an API endpoint & risectl command
  • (need-investigation) Add await tree dump for LocalBarrierWorker and Uploader
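
To make the first task concrete, here is a rough sketch of what an inspect-style event could look like in a worker event loop. This is not the actual LocalBarrierWorker code: `Event`, `Worker`, `spawn_worker` and `inspect` are hypothetical names, and it uses only `std::sync::mpsc` and a plain thread rather than the real async runtime. The idea is that the inspector sends an event carrying a reply channel, and the worker answers with a snapshot of which actors each in-flight epoch is still waiting for.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical actor identifier, for illustration only.
type ActorId = u32;

/// Hypothetical events handled by the worker loop.
enum Event {
    /// A barrier for `epoch` was issued and must be collected from `actors`.
    Inject { epoch: u64, actors: Vec<ActorId> },
    /// `actor` has finished processing the barrier for `epoch`.
    Collect { epoch: u64, actor: ActorId },
    /// Ask the worker to report its current state on `reply`.
    InspectState { reply: Sender<String> },
}

#[derive(Default)]
struct Worker {
    /// epoch -> actors that have not yet reported the barrier.
    pending: BTreeMap<u64, BTreeSet<ActorId>>,
}

impl Worker {
    fn handle(&mut self, event: Event) {
        match event {
            Event::Inject { epoch, actors } => {
                self.pending.insert(epoch, actors.into_iter().collect());
            }
            Event::Collect { epoch, actor } => {
                // Remove the actor; drop the epoch once everyone has reported.
                let done = match self.pending.get_mut(&epoch) {
                    Some(remaining) => {
                        remaining.remove(&actor);
                        remaining.is_empty()
                    }
                    None => false,
                };
                if done {
                    self.pending.remove(&epoch);
                }
            }
            Event::InspectState { reply } => {
                // Ignore send errors: the inspector may have gone away.
                let _ = reply.send(self.render());
            }
        }
    }

    /// Render the collecting progress as human-readable text.
    fn render(&self) -> String {
        if self.pending.is_empty() {
            return "no barriers in flight".to_string();
        }
        self.pending
            .iter()
            .map(|(epoch, actors)| format!("epoch {epoch}: waiting for actors {actors:?}"))
            .collect::<Vec<_>>()
            .join("\n")
    }
}

/// Spawn the worker on its own thread and return the event sender.
fn spawn_worker() -> Sender<Event> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        let mut worker = Worker::default();
        // The loop ends when every sender has been dropped.
        for event in rx {
            worker.handle(event);
        }
    });
    tx
}

/// Send an `InspectState` event and block until the snapshot arrives.
fn inspect(tx: &Sender<Event>) -> String {
    let (reply, rx) = channel();
    tx.send(Event::InspectState { reply }).expect("worker stopped");
    rx.recv().expect("worker dropped the reply channel")
}
```

With this sketch, injecting a barrier for epoch 1 on actors 1, 2 and 3 and then collecting from actor 1 makes `inspect` return `epoch 1: waiting for actors {2, 3}`. Because the snapshot is produced inside the event loop, it is consistent with the events handled so far. The trade-off is that it cannot answer if the loop itself is stuck, which is presumably where the await tree dump would help.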