disable conflict check for those row_id generated by rowIdGenerator #14796

Open
st1page opened this issue Jan 25, 2024 · 5 comments

Comments

@st1page
Contributor

st1page commented Jan 25, 2024

We have found that the conflict check can be a bottleneck in #14635, even after optimizing it with a bloom filter and the operator cache.

For an append-only table without a primary key, the conflict check is unnecessary because every row_id is generated by the rowIdGenerator.

dev=> explain create table t(v int) append only;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 StreamMaterialize { columns: [v, _row_id(hidden)], stream_key: [_row_id], pk_columns: [_row_id], pk_conflict: NoCheck }
 └─StreamRowIdGen { row_id_index: 1 }
   └─StreamUnion { all: true }
     └─StreamExchange [no_shuffle] { dist: SomeShard }
       └─StreamDml { columns: [v, _row_id] }
         └─StreamSource
(6 rows)

When the table is not append-only, we still need the check for delete and update DML statements. However, most of the rows come from inserts or from the table's upstream connector such as Kafka. So if we can find a way to determine which rows have a row_id generated by the rowIdGenerator, we can skip the conflict check for them.

dev=> explain create table t(v int);
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 StreamMaterialize { columns: [v, _row_id(hidden)], stream_key: [_row_id], pk_columns: [_row_id], pk_conflict: Overwrite }
 └─StreamRowIdGen { row_id_index: 1 }
   └─StreamUnion { all: true }
     └─StreamExchange { dist: HashShard(_row_id) }
       └─StreamDml { columns: [v, _row_id] }
         └─StreamSource
(6 rows)
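
A minimal sketch of the idea, using hypothetical names rather than RisingWave's actual executor types: if each row carried a marker saying whether its _row_id was just filled in by StreamRowIdGen, the materialize step could do a blind write for those rows and keep the conflict check only for rows that reference an existing _row_id.

// Hypothetical sketch of the per-row decision in the materialize step.
// Row, Op, and the generated_row_id flag are illustrative, not RisingWave APIs.

#[derive(Clone, Copy)]
enum Op {
    Insert,
    Delete,
    UpdateDelete,
    UpdateInsert,
}

struct Row {
    op: Op,
    // Set by the row-id generator when it fills in _row_id for this row.
    generated_row_id: bool,
}

// A generator-produced _row_id is unique by construction, so only rows that
// reference an existing _row_id (delete/update DML) still need the lookup.
fn needs_conflict_check(row: &Row) -> bool {
    match row.op {
        Op::Delete | Op::UpdateDelete | Op::UpdateInsert => true,
        Op::Insert => !row.generated_row_id,
    }
}

fn main() {
    let from_source = Row { op: Op::Insert, generated_row_id: true };
    let from_delete_dml = Row { op: Op::Delete, generated_row_id: false };
    assert!(!needs_conflict_check(&from_source));
    assert!(needs_conflict_check(&from_delete_dml));
}

How such a marker would be plumbed from StreamRowIdGen down to StreamMaterialize is exactly the open design question of this issue.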
@github-actions github-actions bot added this to the release-1.7 milestone Jan 25, 2024
@chenzl25
Contributor

Is it possible that a _row_id is deleted via DML?

@st1page
Contributor Author

st1page commented Jan 26, 2024

Is it possible that a _row_id is deleted via DML?

Yes. So we need some way to let the MaterializeExecutor know whether a row needs to be checked (i.e., whether it comes from an update/delete DML statement).
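
One possible way to carry that information, again as a hypothetical sketch and not RisingWave's real chunk format: the DML path could attach a per-row "needs check" flag that is only set for rows produced by update/delete statements, and the MaterializeExecutor would consult it before touching the state table.

// Hypothetical per-row hint set on the DML path and read by materialize.
// DmlChunk and apply_to_state_table are illustrative; the real chunk is columnar.

struct DmlChunk {
    row_ids: Vec<i64>,
    // One flag per row: true only for rows coming from update/delete DML.
    needs_check: Vec<bool>,
}

fn apply_to_state_table(chunk: &DmlChunk) {
    for (row_id, &check) in chunk.row_ids.iter().zip(&chunk.needs_check) {
        if check {
            // Delete/update: look up the old row and resolve the conflict.
            println!("checked write for _row_id {row_id}");
        } else {
            // Generator-produced _row_id: blind write, no state-table lookup.
            println!("blind write for _row_id {row_id}");
        }
    }
}

fn main() {
    let chunk = DmlChunk {
        row_ids: vec![100, 101, 102],
        needs_check: vec![false, true, false],
    };
    apply_to_state_table(&chunk);
}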

@st1page st1page modified the milestones: release-1.7, release-1.8 Mar 6, 2024
@st1page st1page modified the milestones: release-1.8, release-1.9 Apr 8, 2024
@st1page st1page self-assigned this Apr 8, 2024
@lmatz
Contributor

lmatz commented Apr 18, 2024

This only applies to the case where there is no downstream MV or index, right?

With MV or index, we have to do the checks for inserts anyways.

@st1page
Contributor Author

st1page commented Apr 18, 2024

This only applies to the case where there is no downstream MV or index, right?

With MV or index, we have to do the checks for inserts anyways.

It can be optimized for all tables without a primary key, whether or not there are MVs downstream.

@st1page st1page modified the milestones: release-1.9, release-1.10 May 14, 2024
@st1page st1page removed their assignment May 14, 2024
@st1page st1page modified the milestones: release-1.10, release-1.9 May 14, 2024
@fuyufjh fuyufjh removed this from the release-1.9 milestone May 14, 2024

github-actions bot commented Aug 1, 2024

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄
