Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error messages for ridiculous deadlines #632

Open
klnusbaum opened this issue Jul 11, 2017 · 9 comments
Open

Better error messages for ridiculous deadlines #632

klnusbaum opened this issue Jul 11, 2017 · 9 comments

Comments

@klnusbaum
Copy link

About once a week, someone will report that our service (which uses tchannel-go) is giving them timeouts and they'll show us an error that looks something like " tchannel error ErrCodeTimeout". We investigate, and 9 times out of 10 the actually problem is that they made an RPC request with an extrodinarly short deadline (sub-10ms) or, in some cases, they try to make a request where the deadline is already expired.

I propose that we return a better error message from tchannel when a request is made with a deadline that is clearly unreasonable.

@willhug
Copy link

willhug commented Jul 11, 2017

Can we define "clearly unreasonable"? A 10ms deadline might be unreasonable for some services, but a hard requirement for others.

We should probably just add the ability to distinguish between client timeouts and server timeouts (and maybe add information about how much time a request was given). We could also short circuit requests that have already passed their deadline and provide a more specific error in those cases.

@klnusbaum
Copy link
Author

Looks like there's already a test for this in outbound.go that sends the "reasonable" deadline limit at 1 ms (since that actually get's encoded as 0 on the wire). I'll put a PR with some thoughts on how we might solve the problem of having a more descriptive error message.

@kriskowal
Copy link
Contributor

@willhug, @akshayjshah, and I have discussed (in the context of YARPC retries) the ability to configure a "min" and "max" threshold for the timeout on a request. The "max" would be used to truncate the deadline to now+max, and drop a request if now+min was past the context deadline. The idea would be to shed work off the wire for requests that are clearly impossible, as an optimization.

Would this "minimum necessary time to respond" suffice to convey the meaning of "clearly unreasonable"?

@kriskowal
Copy link
Contributor

…if that is the case, TChannel would respond with use a Busy error instead of sending or handling a request, with a message indicating that there was not enough time on the deadline.

@klnusbaum
Copy link
Author

@kriskowal so in that case, we'd want to make the change at the YARPC level?

@prashantv
Copy link
Contributor

I don't know if we should create a new error code (since that would require all other libraries to be updated to recognize the error, and I'm not aware of how older clients would handle this error code).

However, Busy doesn't seem like the right error code, Declined might be better.

I think it makes sense to add a minimum threshold directly in TChannel, so we can do the least amount of work as soon as we get the frame, avoid creating a context, and avoid timing out while reading the method name.

I'd imagine a new option that's defined as part of channel creation which we'd then compare against in handleCallReq.

@klnusbaum
Copy link
Author

@prashantv So if I'm understanding correctly, you'd like to see something that:

  1. Adds new code such that during channel creation time so minimum threshold can be specified for how small a request Deadline can be
  2. Add some code to handleCalReq that then checked this threshold for every request. If the Deadline for a request is too small, return a Declined error.

Does that sound correct? Can we have a default minimum threshold?

@prashantv
Copy link
Contributor

  1. Yep, that's right, a new threshold in ChannelOptions that defaults to 0 which essentially means no minimum deadline (which is backwards compatible and matches existing behaviour).

  2. Yep, exactly.

I don't think TChannel itself should have a default minimum threshold, we can set one in our internal wrapper library, although there are services that can respond within a few milliseconds, so the only reasonable value we can probably set is 1ms. Services that need a lower value should be able to customize it.

@klnusbaum
Copy link
Author

Alright, I'll take a stab at this soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants