-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(kinesis): rename fields of KinesisOffset
and KinesisSplit
to make everything explicit
#18704
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
KinesisOffset
and KinesisSplit
tp make everything explicitKinesisOffset
and KinesisSplit
to make everything explicit
next_offset: KinesisOffset::None, | ||
end_offset: KinesisOffset::None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can see from
risingwave/src/connector/src/source/kinesis/source/reader.rs
Lines 255 to 259 in 71732b1
KinesisOffset::SequenceNumber(seq) => ( | |
Some(seq.clone()), | |
None, | |
ShardIteratorType::AfterSequenceNumber, | |
), |
if kinesis read from a position, in current impl, it should begin with the next one.
So next_offset
does not mean next offset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AfterSequenceNumber
is a way to denote the next one after the given sequence number. I don't see any problem here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try read this in English: The next_offset
is the offset AfterSequenceNumber("abc")
.
And compare it with previous one: The start_position
is the offset SequenceNumber("abc")
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you want to eliminate the concept misalignment in the sources' split def, it is good for the project.
The variable name ought to describe the value it stores, not involving how other components do with it. But here, it stores the latest successful read offset (and literally the start offset if begin with Timestamp or TrimHorizen). It is not the next offset.
Kinesis does have its own primitive to begin consumption with the offset after the given one, your proposal may not take this for granted.
Try read this in English
Besides this, I know English well and please remain calm and friendly in communications. Be professional, dude.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try read this in English
Besides this, I know English well and please remain calm and friendly in communications. Be professional, dude.
I'm saying that the new semantics is understandable in English, while the previous one is not.
The value stored by start_position: KinesisOffset
is not the sequence number, it's After(sequence number)
. When recovery happens, the start position is the next one after sequence number
, so what start_position = SequenceNumber(abc)
actually means is start_position = AfterSequenceNumber(abc)
. The renaming from start_position
to next_offset
doesn't change any semantics, it doesn't matter. The core change is start_position = SequenceNumber(abc)
to start_position = AfterSequenceNumber(abc)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your logic, if start_position
is last seen offset, then the Earliest
and Latest
variants should be BeforeEarliest
, BeforeLatest
. Is it clear enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your logic, if start_position is last seen offset, then the Earliest and Latest variants should be BeforeEarliest, BeforeLatest. Is it clear enough?
Using sarcasm and rhetorical questions in the workplace is not a friendly communication attitude.
Is it clear enough?
Yes, I fully understand your motivation and proposed changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your logic, if start_position is last seen offset, then the Earliest and Latest variants should be BeforeEarliest, BeforeLatest. Is it clear enough?
Using sarcasm and rhetorical questions in the workplace is not a friendly communication attitude.
Hah? I'm not using any sarcasm and rhetorical question. "Is it clear enough?" just means "这么解释足够清楚了吗?".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't be so sensitive. I'm just trying to explain. I admit that I was a bit impatient at the very beginning of this thread. But I didn't mean to be any aggressive after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I fully understand your motivation and proposed changes.
Ok, then I think we have reached a consensus.
Signed-off-by: Richard Chien <[email protected]>
Signed-off-by: Richard Chien <[email protected]>
72f8763
to
c3961b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Didn't read the discussion above) The change lgtm
Do I understand correctly:
- Previous
start_position
looks obstacle (the "start" word). It means how other code should use it, but seems wrong. - If we want to make it more consistent with "store" site, it should be renamed to
last_seen_offset
. (But seems not OK, since we have "timestamp"/"latest", etc) - If we want to make it more consistent with "read" site, call it
next_offset = After(offset)
looks cool.
Wait, don't So the real great invention to me is to rename the enum variant to |
Exactly. As I said #18704 (comment), renaming from |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
As part of #18697, this PR revisits the kinesis source connector and makes the offset semantics explicit in
KinesisOffset
andKinesisSplit
. Backward-compatibility is achieved by#[serde(alias = "...")]
.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.