Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IndexerWorkerClient#fetchChannelData when response has data and error. #15084

Merged
merged 2 commits into from
Oct 9, 2023

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Oct 4, 2023

When a channel data response from a worker includes some data and then some I/O error, then when the call is retried, we will re-read the set of data that was read by the previous connection and add it to the local channel again. This causes the local channel to become corrupted. The patch fixes this case by skipping data that has already been read.

…rror.

When a channel data response from a worker includes some data and then
some I/O error, then when the call is retried, we will re-read the set
of data that was read by the previous connection and add it to the
local channel again. This causes the local channel to become corrupted.
The patch fixes this case by skipping data that has already been read.
@gianm gianm added Bug Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Oct 4, 2023
@github-actions github-actions bot removed the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Oct 4, 2023
Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.

if (backpressureFuture != null) {
clientResponseObj.setBackpressureFuture(backpressureFuture);
}
clientResponseObj.addBytesRead(chunkSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would expect to add bytes read at the end before returning the clientResponseObject.
More of a readability thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to the line before the return.

}
catch (Exception e) {
clientResponseObj.exceptionCaught(e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: toSkip> chunk size since we have already have that data in the channel, so we basically do nothing. Can we add an explicit else and mention that in a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else would have en empty body, so instead of adding that, I just added a comment next to the else if.

if (backpressureFuture != null) {
clientResponseObj.setBackpressureFuture(backpressureFuture);
}
clientResponseObj.addBytesRead(chunkSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the skip case we are not basically reading all the bytes. We are skipping to chunkSize - (int) toSkip]. So does adding in all the bytes is valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, since "bytes read" tracks how many bytes have been read from the response, not how many have been written to the channel. Its purpose is to let us know when to stop skipping (we keep incrementing it until it passes channel.getBytesAdded() - startOffset). I added a comment about this.

@cryptoe cryptoe merged commit c483cb8 into apache:master Oct 9, 2023
81 checks passed
@cryptoe
Copy link
Contributor

cryptoe commented Oct 9, 2023

Thanks @gianm for the clarifying comments. !!

@gianm gianm deleted the fix-ff-handler-retry branch October 10, 2023 06:36
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
ektravel pushed a commit to ektravel/druid that referenced this pull request Oct 16, 2023
…rror. (apache#15084)

* Fix IndexerWorkerClient#fetchChannelData when response has data and error.

When a channel data response from a worker includes some data and then
some I/O error, then when the call is retried, we will re-read the set
of data that was read by the previous connection and add it to the
local channel again. This causes the local channel to become corrupted.
The patch fixes this case by skipping data that has already been read.
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
…rror. (apache#15084)

* Fix IndexerWorkerClient#fetchChannelData when response has data and error.

When a channel data response from a worker includes some data and then
some I/O error, then when the call is retried, we will re-read the set
of data that was read by the previous connection and add it to the
local channel again. This causes the local channel to become corrupted.
The patch fixes this case by skipping data that has already been read.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants