Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP streaming - where am I going wrong? or is there a bug? #6146

Closed
duglin opened this issue Dec 4, 2019 · 43 comments
Closed

HTTP streaming - where am I going wrong? or is there a bug? #6146

duglin opened this issue Dec 4, 2019 · 43 comments
Labels
area/networking kind/question Further information is requested kind/spec Discussion of how a feature should be exposed to customers. triage/needs-user-input Issues which are waiting on a response from the reporter

Comments

@duglin
Copy link

duglin commented Dec 4, 2019

In what area(s)?

/area API
/area networking

Ask your question here:

Warning... my code could be totally wrong :-)

See: https://github.com/duglin/knstream for a sample http streaming demo I was working on. In general the code does appear to work outside of Knative. Basically, build it using make stream client and then run ./stream in one window and ./client http://localhost:8080 in another. The client will send 10 bytes, then the server will echo it back - over and over.

If you then modify Makefile so that APP_IMAGE points to your dockerhub namespace, you can then run make test and it should deploy the server to knative and make the client try to talk to it.

For me I get a 502 Bad Gateway error. In debugging, I do see the queue proxy logs show:

2019/12/04 00:46:41 http: proxy error: unexpected EOF
2019/12/04 00:46:50 http: proxy error: read tcp 127.0.0.1:44930->127.0.0.1:8080: read: connection reset by peer

but no error, or even ack of connection, in the user-container. So it's not clear to me if the reset by peer is due to the Istio gateway or the user-container, I suspect it's not the user-container because I don't see a Got a connection println in it's logs.

In the Istio ingress gateway logs, I do see this:

[2019-12-04T00:46:50.287Z] "POST / HTTP/1.1" 502 - "-" "-" 0 0 7 6 "10.186.236.79,172.30.250.92" "-" "a63c1c23-c195-46c8-9d90-79d169f75a2e" "stream-default.kndemo.us-south.containers.appdomain.cloud" "172.30.110.239:8013" outbound|81||stream-hx7pp.default.svc.cluster.local - 172.30.107.154:80 172.30.250.92:38395 - -
[2019-12-04 01:13:48.689][18][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 13,

But notice the time diff. (side note: it's odd that the timestamp formats differ)

Should this work or is my code not doing it correctly?

/cc @tcnghia @tanzeeb

@duglin duglin added the kind/question Further information is requested label Dec 4, 2019
@knative-prow-robot knative-prow-robot added area/API API objects and controllers area/networking labels Dec 4, 2019
@duglin
Copy link
Author

duglin commented Dec 4, 2019

What's also interesting is that if I use curl instead of my client to talk to the service (so everything thinks it's just doing a normal http request, not a streaming one) I get the same error, and still no ack from the service that it got an HTTP request

@duglin
Copy link
Author

duglin commented Dec 4, 2019

More info... removing the h2c name on the port makes it so that I reach the ksvc now - things fail since it's not a streaming connection, but it appears that the presence of the h2c name is causing the 502 and my inability to even reach the user-container pod.

@tanzeeb
Copy link
Contributor

tanzeeb commented Dec 4, 2019

the presence of the h2c name is causing the 502 and my inability to even reach the user-container pod

This is expected behaviour if your app isn't actually http2 (based on the slack conversation, the app was http1?)

@duglin
Copy link
Author

duglin commented Dec 4, 2019

so knative only supports http2 streaming?

@markusthoemmes
Copy link
Contributor

We do support Websockets on HTTP1. Is there any reason why you're manually hijacking the connection? I'm not familiar enough with the Websocket specification to know if there's more to it than just hijacking but if there is, maybe this just stumbles over the entire dataplane not supporting it? That's shooting in the dark though.

Can you just use Websockets?

@tanzeeb
Copy link
Contributor

tanzeeb commented Dec 4, 2019

so knative only supports http2 streaming?

Just referring to the port name, use h2c only if your app actually supports HTTP/2

You can stream with HTTP/1 with websockets (as @markusthoemmes mentioned), but I'm curious about why your ad-hoc streaming isn't working.

You mentioned that it works with vanilla kubernetes, have you confirmed that it works with Istio + vanilla kubernetes?

@vagababov
Copy link
Contributor

vagababov commented Dec 4, 2019 via email

@duglin
Copy link
Author

duglin commented Dec 4, 2019

I didn't test with kube, I tested with just the stand-alone executables to make sure the code (outside of containers) even works at all. And it does. So I was hoping that 1) containerizing didn't break it, and 2) knative didn't break it.

I just tested it with Docker and it works, so containerizing in general isn't the issue.

@tcnghia
Copy link
Contributor

tcnghia commented Dec 4, 2019

I wouldn't be surprised if the Go reverseproxy (used in activator and queue-proxy) doesn't support hijacking in general. Can you try sending a request directly to the revision's Service? That would bypass all proxies except for queue-proxy to see if the problem is there.

@duglin
Copy link
Author

duglin commented Dec 5, 2019

not 100% sure if I did it right, but I edited the Kube Services associated with the Ksvc so they had external IPs and had the client talk directly to those. I still get the same results, the server does get a connection request but then after the hijack I don't see anything else flow from the client to the server.

@duglin
Copy link
Author

duglin commented Dec 5, 2019

We do support Websockets on HTTP1. Is there any reason why you're manually hijacking the connection? I'm not familiar enough with the Websocket specification to know if there's more to it than just hijacking but if there is, maybe this just stumbles over the entire dataplane not supporting it? That's shooting in the dark though.

Can you just use Websockets?

I'll give that a try tomorrow - only reason I picked this current path is because I don't know much about websockets so I picked the easiest path for me to get to the tcp socket layer :-)

@duglin
Copy link
Author

duglin commented Dec 6, 2019

I changed my code to use a 'real' websocket impl and that does seem to work. However, during my migration to ws I noticed some error messages from the queue proxy logs that lead me to believe that while we do support streaming we might only support certain types of streaming. Meaning, the code might be looking for certain http headers (like the ws ones), and something more generic (like my hack to get to the tcp socket) won't work - even though it works for stand-alone exes and container cases like using Docker and vanilla Kube. It appears to only fail for Knative.

Is this a usecase that should work?

@vagababov
Copy link
Contributor

The only headers we look at in QP are revision/probing/tracing related, and since 1.12 golang supports reverse websocket proxying, I think.
But it does not preclude that it might be buggy or we might be doing something wrong :-)

@duglin
Copy link
Author

duglin commented Dec 6, 2019

I just updated my repo so that people can test with 'ws' and 'my hack' easily. Just use '-h' on the client when you want to use the non-ws/hacky path.

But, am I correct that both paths should work in Kn since they both work in Kube?

@duglin
Copy link
Author

duglin commented Dec 31, 2019

@tcnghia should a non-web-socket stream work?

@knative-housekeeping-robot

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

@knative-prow-robot knative-prow-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 31, 2020
@duglin
Copy link
Author

duglin commented Mar 31, 2020

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 31, 2020
@knative-housekeeping-robot

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

@knative-prow-robot knative-prow-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 29, 2020
@duglin
Copy link
Author

duglin commented Jun 29, 2020

/remove-lifecycle stale

@tcnghia should a non-web-socket stream work?

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 29, 2020
@duglin
Copy link
Author

duglin commented Jul 28, 2020

@tcnghia any comment on this one?

@markusthoemmes
Copy link
Contributor

Does a plain Golang httputil.ReverseProxy support what you're trying to do?

@brianschardt
Copy link

im running into this same exact issue, on google cloud run for Anthos.

@vagababov
Copy link
Contributor

cc @ZhiminXiang

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2020
@duglin
Copy link
Author

duglin commented Dec 16, 2020

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2020
@markusthoemmes
Copy link
Contributor

Does a plain Golang httputil.ReverseProxy support what you're trying to do?

Still wondering if that's the case.

@duglin
Copy link
Author

duglin commented Dec 16, 2020

@markusthoemmes do you mean using that in the Knative code or in the app? All I'm looking for is the ability to use low-level reads and writes on the connection w/o having to use websockets - see: https://github.com/duglin/knstream/blob/master/stream.go#L38-L47

@markusthoemmes
Copy link
Contributor

I mean with plain Golang. I'm trying to figure out if we need to change or if a Golang stdlib change might be necessary.

@duglin
Copy link
Author

duglin commented Dec 16, 2020

I'm pretty sure it's something in Knative since this app works just fine in generic kube.

@markusthoemmes
Copy link
Contributor

Could you please test if it's also fine behind a Golang proxy using httputil.ReverseProxy. Asking specifically, because I do know that the reverse proxy stuff in Golang (which we use in both queue-proxy and activator) is having some level of trouble with some read/write use-cases. See golang/go#40747 for example.

@duglin
Copy link
Author

duglin commented Dec 17, 2020

I added an httputil.ReverseProxy to my server side sample (to intercept calls to my server) and it seems to fail with generic tcp stuff. With websockets it works just fine.

@duglin
Copy link
Author

duglin commented Dec 17, 2020

would it make sense to just create our own proxy until they fix it? I don't think we need anything too complex do we?

@markusthoemmes
Copy link
Contributor

I absolutely wouldn't want to take writing that proxy lightly. It's not the most complex of things, but I don't want to be in the business of writing a performant alternative to it, unless there is a really good reason to.

TBH, this doesn't meet that bar for me, considering this is the first time it seemingly doesn't interop "as expected".

Is what you're doing covered by the HTTP spec? IIRC HTTP/1 doesn't allow reading from the body while the response is already written or something like that. I might be wrong though.

@duglin
Copy link
Author

duglin commented Dec 17, 2020

Unless I missed it, I don't write to the connection before converting it to a tcp connection.

Can't say how important it is, other than I think I've heard of other folks running into this and it seems odd that we don't support anything but websockets - especially when generic kube (and docker) does.

@markusthoemmes
Copy link
Contributor

@duglin tbh it doesn't surprise me at all, given our transports are purely HTTP focused so if you're doing things that are not captured or even disallowed by the HTTP spec, I wouldn't expect stuff to work.

@markusthoemmes
Copy link
Contributor

Seems like the HTTP spec does allow what you're doing (see golang/go#15527). Essentially, I believe you're reading/writing from the request/to the response in parallel, which Go currently doesn't seem to be supporting.

@duglin
Copy link
Author

duglin commented Dec 17, 2020

wow that's an old issue

@github-actions
Copy link

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2021
@duglin
Copy link
Author

duglin commented Mar 18, 2021

/remove-lifecycle stale

@knative-prow-robot knative-prow-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2021
@evankanderson
Copy link
Member

Looking at golang/go#15527 and reviewing this thread, it seems like it may be possible to implement your own stream-up-and-down using HTTP.

Looking at the code in https://github.com/duglin/knstream/blob/master/stream.go#L23, however, this does not appear to be a valid HTTP case. In particular, the response does not appear to be an HTTP response, but a simple echoing of the incoming strings. I'd expect a proper HTTP response to begin with a Status-Line which included the HTTP-Version, Status-Code and Reason-Phrase. Without that, I'd expect a well-behaved HTTP reverse proxy to drop the connection when the non-HTTP content was detected.

If this is an important use case, can you update knstream to return a proper HTTP reply and simply stream the body after the Status-Line and any necessary headers? (It looks like you can use "5. By the server closing the connection" to set the length of the reply.)

/triage needs-user-input

@knative-prow-robot knative-prow-robot added the triage/needs-user-input Issues which are waiting on a response from the reporter label Mar 22, 2021
@evankanderson
Copy link
Member

/kind spec
/remove-area API

@knative-prow-robot knative-prow-robot added kind/spec Discussion of how a feature should be exposed to customers. and removed area/API API objects and controllers labels Mar 22, 2021
@evankanderson
Copy link
Member

/close

Another reasonable test here would be to attempt to put the knstream application behind another forward or reverse proxy to see if it works (for example, Squid or nginx). If those other codebases are willing to pass the HTTP stream and Knative is not, then I think it's worth opening a new bug. Two hand-written binaries talking to each other which sometimes use an HTTP library and sometimes use a TCP socket are likely to be broken by most proxies, and Knative does expect to be able to perform some moderately-advanced HTTP proxying.

@knative-prow-robot
Copy link
Contributor

@evankanderson: Closing this issue.

In response to this:

/close

Another reasonable test here would be to attempt to put the knstream application behind another forward or reverse proxy to see if it works (for example, Squid or nginx). If those other codebases are willing to pass the HTTP stream and Knative is not, then I think it's worth opening a new bug. Two hand-written binaries talking to each other which sometimes use an HTTP library and sometimes use a TCP socket are likely to be broken by most proxies, and Knative does expect to be able to perform some moderately-advanced HTTP proxying.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/networking kind/question Further information is requested kind/spec Discussion of how a feature should be exposed to customers. triage/needs-user-input Issues which are waiting on a response from the reporter
Projects
None yet
Development

No branches or pull requests

9 participants