Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gRPC): optimize gRPC error handling #1549

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

DMwangnima
Copy link
Contributor

@DMwangnima DMwangnima commented Sep 13, 2024

What type of PR is this?

feat

Check the PR title.

  • This PR title match the format: <type>(optional scope): <description>
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

(Optional) More detailed description for this PR(en: English/zh: Chinese).

en:

  • Optimising the gRPC client side call to streaming.Send to return ```the stream is done`` now allows you to get the real reason why the stream is being shut down
  • Optimise the error returned when a connection is closed on the gRPC client/server side by uniformly converting ConnectionError to code = 14, msg = ConnectionError.Desc status
  • Optimise the error returned by gRPC server-side calls to streaming.Recv/Send, so that when the stream is closed we can get the real reason instead of "context is canceled".
  • For the streaming call string scenario A -> B -> C, an error between A and B causes the ctx to be cancelled, making the call between B and C go wrong. Add ‘[triggered by handler side]’, '[triggered by remote service]' to the error string to better alert the user.
  • Add log in the scenario where an RstStream Frame is sent under an exception on the gRPC server side.

zh(optional):

  • kerrors 定义用户可以感知到的 Streaming Error
  • gerrors 定义 gRPC 相关的协议错误,统一继承 kerrors.ErrStreamingProtocol
  • gRPC 支持注册 HTTP2 ErrCode 与 kerrors/gerrors 错误之间的映射,让 RstStream Frame 也能传递更为详细的信息
  • gRPC Status Error 统一携带 kerrors/gerrors 中的错误,用于 errors.Is 感知与内部错误码扩展
  • 优化 gRPC client 侧调用 streaming.Send 返回the stream is done,现在可以拿到流被关闭的真正原因
  • 优化 gRPC client/server 侧连接被关闭时返回的错误,统一将ConnectionError转化为code = 14, msg = ConnectionError.Desc的 status
  • 优化 gRPC server 侧调用 streaming.Recv/Send 返回的错误,当流被关闭时可以获取到真正的原因而不是"context is canceled"
  • 对于流式调用串联场景 A -> B -> C,A B 间的错误让 ctx 被 cancel,使得 B C 间的调用出错。给错误字符串加上 "[triggered by handler side]"、"[triggered by remote service]" 来更好地提示用户
  • 为 gRPC 异常情况下发送/接收 RstStream Frame 的场景加上日志

(Optional) Which issue(s) this PR fixes:

(optional) The PR that updates user documentation:

@DMwangnima DMwangnima requested review from a team as code owners September 13, 2024 09:00
@DMwangnima DMwangnima force-pushed the feat/new_grpc_error_handling branch 2 times, most recently from 20064bb to ed1b013 Compare September 16, 2024 06:26
@DMwangnima DMwangnima changed the title feat: optimize gRPC server side error handling feat: optimize gRPC error handling Sep 16, 2024
@DMwangnima DMwangnima force-pushed the feat/new_grpc_error_handling branch 3 times, most recently from 012de39 to 8683021 Compare September 16, 2024 08:20
// Error wraps a pointer of a status proto. It implements error and Status,
// and a nil *Error should never be returned by this package.
type Error struct {
e *spb.Status
e *spb.Status
triggeredByUpstream bool
Copy link
Collaborator

@xiaost xiaost Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个有点丑,我们要不要通过 code 编码来区别呀?另外streaming里没有什么upstream的概念。。。代码里可以都叫 remote err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

更好的办法确实是 code 编码,但是对于 gRPC-python 之类的服务可能不认。
叫 upstream 主要是因为 A -> B -> C,A B 出错导致 B C 调用失败,不提示 upstream 的话很容易以为是 B C 间的问题。。

Copy link
Contributor Author

@DMwangnima DMwangnima Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

移除了,现在 http2Server 流/连接被关闭抛出的错误都会带上"[triggered by {remote service}]"后缀

@DMwangnima DMwangnima force-pushed the feat/new_grpc_error_handling branch 4 times, most recently from 62e6e92 to ba48fa1 Compare September 18, 2024 13:22
@DMwangnima DMwangnima changed the title feat: optimize gRPC error handling feat(gRPC): optimize gRPC error handling Sep 19, 2024
@DMwangnima DMwangnima force-pushed the feat/new_grpc_error_handling branch 2 times, most recently from 3087213 to 94687b7 Compare September 20, 2024 08:17
@@ -451,19 +451,23 @@ func (c *controlBuffer) get(block bool) (interface{}, error) {
select {
case <-c.ch:
case <-c.done:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done是外部传入的,如果close了的吧,而c.err没数据,会导致返回 nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.finish(errStatusConnClosing) 已经注入了 err 了(在 c.err 没数据的情况下)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finish是不是直接return err呀,这样外面不用老是加锁再取一次。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

sendRSTStreamFrameSuffix = "[send RST Stream Frame]"
)

func remoteErrMsg(msg string) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些方法实际上也没有多大的意义。。。感觉不如直接写到klog.CtxErrorf,一般用于日志的话,不要把格式化放在外层做。因为如果关闭了日志,这些逻辑都不应该被执行。

t.controlBuf.put(&cleanupStream{
streamID: streamID,
rst: true,
rstCode: http2.ErrCodeRefusedStream,
onWrite: func() {},
})
s.cancel(errMaxStreamsExceeded)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个变更是为啥?

return errStatusStreamDone
}

func (s *Stream) SetSourceService(svc string) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这是公开的方法,不应该这样直接放出来。

@@ -185,6 +185,7 @@ func (t *svrTransHandler) handleFunc(s *grpcTransport.Stream, svrTrans *SvrTrans
return
}
}
s.SetSourceService(ri.From().ServiceName())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

还是用一个更通用的方式去存储这类数据吧,不然后面要加一堆的公开接口。

@DMwangnima DMwangnima force-pushed the feat/new_grpc_error_handling branch 3 times, most recently from 7688c77 to 67c9a2d Compare September 27, 2024 01:47
@DMwangnima DMwangnima force-pushed the feat/new_grpc_error_handling branch 3 times, most recently from 963f2d1 to c7bb7cd Compare November 11, 2024 03:08
@DMwangnima DMwangnima force-pushed the feat/new_grpc_error_handling branch 8 times, most recently from 9bd9abe to 95da68d Compare November 14, 2024 01:42
@@ -0,0 +1,62 @@
/*
Copy link
Contributor Author

@DMwangnima DMwangnima Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

外部统一使用 streamx 下暴露的定义

// Error wraps a pointer of a status proto. It implements error and Status,
// and a nil *Error should never be returned by this package.
type Error struct {
e *spb.Status
// kerr is the Kitex custom error that status maps to
kerr error
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

携带 kerrors 和 gerrors 中定义的错误,用于 errors.Is 错误处理与内部错误码上报

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个预计会怎么用? fmt.Errorf %w + errors.As 的方式不就够了么,为什么还要封装?

// e.g. RegisterCustomRstCode(1000, kerrors.ErrGracefulShutdown)
// When Kitex receives a RST_STREAM frame with error code 1000, it will inject kerrors.ErrGracefulShutdown into
// *status.Error that users will conceive
func RegisterCustomRstCode(rstCode uint32, mappingErr error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

注册 HTTP2 ErrCode 与 kerrors/gerrors 错误之间的映射,使得 RstStream Frame 可以传递信息

)

// SetServiceMeta is used to inject service-related metadata
func (s *Stream) SetServiceMeta(key svcMetaKey, val interface{}) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个方法到底是对内还是对外的?对内的话,不用暴露出来,对外的话,interface里也没有,保证不了用户套了一层,我们用不了。另外key这类,不要用string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个方法是对内的,但是不暴露的话,没法在 nphttp2/server_handler.go 中直接使用。
用户即使套了也没有意义,执行 SetServiceMeta 的时候用户还没法感知 Stream

@@ -158,29 +160,54 @@ func (s *Status) Details() []interface{} {
return details
}

// WithMappingErr creates a new Status and injects Kitex mapping err
func (s *Status) WithMappingErr(kerr error) *Status {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个预计会怎么用? fmt.Errorf %w + errors.As 的方式不就够了么,为什么还要封装?

)

// IsStreamingError reports whether the given err is a streaming err
func IsStreamingError(err error) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个如果在热路径上判断会比较低效,会随着错误变多,每个都要判断一次。加个 streamingError 类型?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

// This package contains all the errors suitable for Kitex errors model.
// These errors should not be used by user directly.
// If users need to perceive these errors, the pkg/streamx/provider/grpc/gerrors package should be used.
var (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果用户不需要用的话,就不用定义到这里。哪里返回的话,定义到哪里。反正也不会拿来做逻辑,只是判断、日志,不需要export

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

内部需要拿到这个类型获取相应错误码,如果不暴露的话,得修改 *status.Error 的 Code() 方法返回值,或者加个 TypeID() 方法,内部错误码定义放在开源库里。

basic error
}

func newErrType(message string) *errType {
Copy link
Collaborator

@xiaost xiaost Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个实现成 fmt.Errorf("%w - %s", kerrors.ErrStreamingProtocol, message) 好像跟你的一样。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

主要是考虑有可能会对 errType 做扩展,用 fmt.Errorf 确实最快

@DMwangnima DMwangnima force-pushed the feat/new_grpc_error_handling branch 2 times, most recently from be89793 to 3cc884a Compare November 15, 2024 08:17
joway
joway previously approved these changes Nov 19, 2024
pkg/remote/trans/nphttp2/grpc/http_util.go Outdated Show resolved Hide resolved
joway
joway previously approved these changes Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants