grpc request may hang when error message is too large after bumping to tonic v0.12 #18039

Closed
BugenZhao opened this issue Aug 14, 2024 · 3 comments · Fixed by #19752
Labels: no-issue-activity, type/bug (Something isn't working)

Comments

@BugenZhao
Member

e2e-sink-test now consistently hangs here:

statement error test-rw-sink-upsert-avro-err-key
create sink sink_err from into_kafka with (
connector = 'kafka',
topic = 'test-rw-sink-upsert-avro-err',
properties.bootstrap.server = 'message_queue:29092',
primary_key = 'int32_field,string_field')
format upsert encode avro (
schema.registry = 'http://schemaregistry:8082');

I found that disabling SCHEMA_REGISTRY_DEBUG here makes the issue go away.

SCHEMA_REGISTRY_DEBUG: 'true'

The only difference is that there won't be backtraces from the schema registry in the error message.

failed to validate sink: config error: all request confluent registry all timeout, req path ["subjects", "test-rw-sink-upsert-avro-err-key", "versions", "latest"], urls http://schemaregistry:8082/
	confluent schema registry error 40401: Subject 'test-rw-sink-upsert-avro-err-key' not found. io.confluent.rest.exceptions.RestNotFoundException: Subject 'test-rw-sink-upsert-avro-err-key' not found.
- io.confluent.rest.exceptions.RestNotFoundException: Subject 'test-rw-sink-upsert-avro-err-key' not found.
-	at io.confluent.kafka.schemaregistry.rest.exceptions.Errors.subjectNotFoundException(Errors.java:78)
-	at io.confluent.kafka.schemaregistry.rest.resources.SubjectVersionsResource.getSchemaByVersion(SubjectVersionsResource.java:154)
-	at jdk.internal.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)

Based on this, I suspect the cause is that we always encode the ServerError in gRPC (HTTP/2) headers (#13282), and there is an outstanding issue where tonic 0.12 hangs forever when the header size exceeds some limit.

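// Serialize the error with bincode; `unwrap` panics if serialization fails.
let serialized = bincode::serialize(&source).unwrap();
let mut metadata = MetadataMap::new();
// Attach the bytes as binary metadata, which travels as an HTTP/2 header.
metadata.insert_bin(ERROR_KEY, MetadataValue::from_bytes(&serialized));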
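To get a feel for how quickly a serialized error can outgrow a header limit, here is a rough, standalone sketch (not code from this repository). It assumes the binary metadata value is base64-encoded without padding on the wire, and it uses the HTTP/2 accounting rule of name length plus value length plus 32 bytes per header field. The key name `example-error-bin` and the 16 KiB budget are hypothetical values chosen for illustration, not the real key or tonic's actual limit.

```rust
/// Length of the unpadded base64 encoding of `n` raw bytes, i.e. ceil(4n / 3).
fn base64_len_unpadded(n: usize) -> usize {
    (n * 4 + 2) / 3
}

/// Estimated size of one header entry as counted for
/// SETTINGS_MAX_HEADER_LIST_SIZE: name + value + 32 bytes of overhead.
fn estimated_header_entry_size(name: &str, raw_value_len: usize) -> usize {
    name.len() + base64_len_unpadded(raw_value_len) + 32
}

/// Returns true if the entry alone would exceed `budget` bytes.
fn exceeds_budget(name: &str, raw_value_len: usize, budget: usize) -> bool {
    estimated_header_entry_size(name, raw_value_len) > budget
}

fn main() {
    // Hypothetical values, for illustration only.
    let key = "example-error-bin";
    let budget = 16 * 1024;

    for raw_len in [1_000usize, 12_000, 20_000] {
        println!(
            "raw = {raw_len:>6} bytes, estimated header entry = {:>6} bytes, over budget: {}",
            estimated_header_entry_size(key, raw_len),
            exceeds_budget(key, raw_len, budget),
        );
    }
}
```

The point of the estimate is that base64 adds roughly a third on top of the serialized size, so an error carrying a long backtrace can cross a limit well before the raw payload itself reaches it.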

Upstream issues:

At the moment there seems to be no fix. I'll disable SCHEMA_REGISTRY_DEBUG for now as a workaround and open an issue for this.
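For reference, another possible mitigation on the sending side would be to cap the error message before it is put into the header. The following is only a sketch of that idea under my own assumptions, not what this project adopted; `truncate_message` and the byte budget are hypothetical names and values.

```rust
/// Truncate `msg` to at most `max_bytes` bytes without splitting a UTF-8
/// character. Returns the original string if it already fits.
fn truncate_message(msg: &str, max_bytes: usize) -> &str {
    if msg.len() <= max_bytes {
        return msg;
    }
    // Walk back from the budget until we land on a character boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !msg.is_char_boundary(end) {
        end -= 1;
    }
    &msg[..end]
}

fn main() {
    // Hypothetical message mimicking a short error followed by a backtrace.
    let msg = "Subject not found.\n\tat Errors.subjectNotFoundException(Errors.java:78)";
    let capped = truncate_message(msg, 18);
    println!("{capped}");
}
```

Truncating on a character boundary keeps the result valid UTF-8, at the cost of losing the tail of the message, which here would be the backtrace.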

Originally posted by @BugenZhao in #17889 (comment)

@BugenZhao BugenZhao added the type/bug Something isn't working label Aug 14, 2024
@github-actions github-actions bot added this to the release-2.0 milestone Aug 14, 2024
@BugenZhao
Member Author

With this configuration exposed, we're able to work around this issue:

hyperium/tonic#1835

Waiting for a new version to be released.

@BugenZhao
Member Author

Worked around with #18639.

@BugenZhao BugenZhao removed this from the release-2.1 milestone Sep 30, 2024
Contributor

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄
