Support data limit when reading a batch with TopicReaderSync #431
base: main
Conversation
… a sync topic reader
max_messages: typing.Union[int, None] = None,
max_bytes: typing.Union[int, None] = None,
) -> Union[PublicBatch, None]:
    all_amount = float("inf")
Why do you need all_amount as a float constant?
The rationale is that by default there is no limit on the data flow. I'm not sure the UInt64 max value would be sufficient, so I chose infinity, which happens to be a float.
max_bytes = all_amount

is_batch_set = batch is not None
is_msg_limit_set = max_messages < all_amount
Why do you need all_amount here instead of checking max_messages is not None?
Because max_messages is set to all_amount (up above) in case it hasn't been provided (i.e. it's None).
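To make the two points above concrete, here is a minimal standalone sketch of the defaulting logic under discussion. This is not the SDK's actual code; the function and its return values are hypothetical, and only the variable names follow the diff:

```python
import typing


def resolve_limits(
    max_messages: typing.Union[int, None] = None,
    max_bytes: typing.Union[int, None] = None,
) -> typing.Tuple[typing.Union[int, float], typing.Union[int, float], bool]:
    # "No limit" sentinel: any real message count or byte size compares below it.
    all_amount = float("inf")

    if max_messages is None:
        max_messages = all_amount
    if max_bytes is None:
        max_bytes = all_amount

    # After the defaulting above, "a limit was provided" is equivalent to
    # the value being strictly below the sentinel.
    is_msg_limit_set = max_messages < all_amount
    return max_messages, max_bytes, is_msg_limit_set
```

The review's alternative would be to keep the None values and test max_messages is not None directly, which avoids the sentinel at the cost of branching on None later.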
@@ -86,6 +86,57 @@ def async_wait_message(self) -> concurrent.futures.Future:
    return self._caller.unsafe_call_with_future(self._async_reader.wait_message())

def _make_batch_slice(
IMPORTANT
After applying the function, a caller will lose the messages that have been trimmed from the batch and will not see them again in the read session. The server does not allow skipping messages during commit. This can cause problems:
- If the caller commits messages with ack, the software will hang forever (the server will wait for the skipped messages before acking the commit).
- If the caller commits messages without ack, then after reconnecting, all messages after the last successful commit (the first batch with cut messages) will be re-read. A lot of extra work is required to re-read these messages, and real progress will be very slow.
- If the progress is saved on the user's side and messages are not committed through the SDK, the trimmed messages will be lost and cannot be recovered.
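To illustrate the concern (a hypothetical sketch, not the SDK's API): if the slice simply drops the tail of the batch, those messages never reach the caller. One possible direction, shown below with a stand-in batch type, would be to keep the remainder so it can be handed out on a later read instead of being lost:

```python
import dataclasses
import typing


@dataclasses.dataclass
class FakeBatch:
    # Stand-in for the SDK's batch type, used only for this illustration.
    messages: typing.List[bytes]


def split_batch(
    batch: FakeBatch, max_messages: int
) -> typing.Tuple[FakeBatch, typing.Optional[FakeBatch]]:
    # The head goes to the caller; the remainder could be pushed back into the
    # reader's buffer so no messages are skipped and commits stay contiguous.
    head = FakeBatch(messages=batch.messages[:max_messages])
    tail = batch.messages[max_messages:]
    remainder = FakeBatch(messages=tail) if tail else None
    return head, remainder
```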
Ok, I see, though I'm not quite sure I fully understand the path to a solution.
What was the expected approach to take?
Allow a client to control the amount of data it receives when reading a batch through TopicReaderSync.

Pull request type
Please check the type of change your PR introduces:
What is the current behavior?
TopicReaderSync.receive_batch ignores the max_messages and max_bytes parameters, which means a client has no control over the amount of received data.

Issue Number: 365
What is the new behavior?
TopicReaderSync.receive_batch now takes max_messages and max_bytes into account.
Other information
Decisions made:
- Do not modify _commit_get_partition_session, and rather copy _partition_session from the batch to a new (sliced) batch (see the sketch below).
- Return the whole batch unchanged when neither max_messages nor max_bytes were provided.
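As a rough sketch of the first decision (not the PR's actual helper; it only assumes the batch object exposes messages and a private _partition_session, as described above):

```python
import copy


def make_batch_slice_sketch(batch, item_count):
    # Build a new batch object rather than mutating the original or touching
    # _commit_get_partition_session on the reader.
    sliced = copy.copy(batch)  # shallow copy: attributes are shared, not duplicated
    sliced.messages = batch.messages[:item_count]
    # The shallow copy already shares _partition_session; made explicit here
    # because carrying it over to the sliced batch is the point of the decision.
    sliced._partition_session = batch._partition_session
    return sliced
```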