Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: remove "pk_indices" in stream plan and executor #4734

Open
st1page opened this issue Aug 18, 2022 · 7 comments
Open

RFC: remove "pk_indices" in stream plan and executor #4734

st1page opened this issue Aug 18, 2022 · 7 comments
Assignees

Comments

@st1page
Copy link
Contributor

st1page commented Aug 18, 2022

different from the schema, not every stream executor needs the pk_indices or stream_key. so propose to remove it, instead add fields like input_stream_key or l_stream_key r_stream_key in the corresponding stream plan node body.

@yuhao-su
Copy link
Contributor

yuhao-su commented Aug 18, 2022

not every stream executor needs the pk_indices or stream_key

Could you elaborate more?

@st1page
Copy link
Contributor Author

st1page commented Aug 18, 2022

Could you elaborate more?

  • actually only executor with state need it such as HashJoin/HashAgg/TopN/Chain need the input_stream_key
  • Another potential benefit is when we introduce new stateTable interface, executor can get a memcomparable key of each row by state table, so it do need to be aware of the input stream key.

@st1page
Copy link
Contributor Author

st1page commented Aug 19, 2022

Another potential benefit is when we introduce new stateTable interface, executor can get a memcomparable key of each row by state table, so it do need to be aware of the input stream key.

we can try to reduce the pk_indices in each stream operator and then do this issue.

@BugenZhao
Copy link
Member

We may also remove the pk_indices field in ExecutorInfo but add some more useful fields like executor_id.

https://github.com/singularity-data/risingwave/blob/ddd797f8184a96447a9d65c6d6aefe7bd5622ab2/src/stream/src/executor/mod.rs#L127-L138

@fuyufjh fuyufjh added this to the release-0.1.13 milestone Sep 5, 2022
@st1page st1page removed this from the release-1.1 milestone Aug 8, 2023
@st1page st1page assigned stdrc and st1page and unassigned st1page Dec 4, 2023
@st1page
Copy link
Contributor Author

st1page commented Dec 4, 2023

@stdrc could you check if it can be closed when you're free? thanks.

@stdrc
Copy link
Member

stdrc commented Dec 4, 2023

@stdrc could you check if it can be closed when you're free? thanks.

It seems that pk_indices may be able to be removed from ExecutorInfo, but some executor still need pk_indices of its input nodes, so I guess we still need it in stream plan.

I'll check more details later.

@stdrc
Copy link
Member

stdrc commented Feb 26, 2024

After extracting ExecutorInfo out of XxxExecutor (#15167), the idea proposed in this PR is still valid. It's still true that most executors don't access their input nodes' pk_indices (and also don't access its inputs' & themselves' schema).

For pk_indices, we cannot easily remove it from plan node for backward compatibility reason. For schema, I found it very hard to decide whether we should derive schema again in backend or we should force using the schema derived by frontend. It depends a lot on the nature of executor, which differs from each other.

Maybe we should still keep it as it is, until we have an even better understanding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants