net/http/httputil: ReverseProxy read error during body copy use of closed network connection #40747
I've got something that I think could be related to this - also on Windows, but I'm getting a panic as well:

2020/11/12 15:22:28 httputil: ReverseProxy read error during body copy: read tcp 127.0.0.1:49351->127.0.0.1:8090: wsarecv: An established connection was aborted by the software in your host machine.
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0x667335]
goroutine 147 [running]:
bufio.(*Reader).Read(0xc00018e180, 0xc0002e4000, 0x2000, 0x2000, 0x2000, 0xc0002e4000, 0xc000182f20)
E:/Dev/Go/src/bufio/bufio.go:212 +0x125
io.(*LimitedReader).Read(0xc000182ec0, 0xc0002e4000, 0x2000, 0x2000, 0xc00008e140, 0x10, 0xc000307be8)
E:/Dev/Go/src/io/io.go:451 +0x6a
net/http.(*body).readLocked(0xc0000e0cc0, 0xc0002e4000, 0x2000, 0x2000, 0x2000, 0xc0002e4000, 0xc0001ac980)
E:/Dev/Go/src/net/http/transfer.go:847 +0x66
net/http.bodyLocked.Read(0xc0000e0cc0, 0xc0002e4000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
E:/Dev/Go/src/net/http/transfer.go:1046 +0x85
io.(*LimitedReader).Read(0xc0001ac960, 0xc0002e4000, 0x2000, 0x2000, 0xc000085a30, 0x409d72, 0xc000008000)
E:/Dev/Go/src/io/io.go:451 +0x6a
io/ioutil.devNull.ReadFrom(0x0, 0xae2760, 0xc0001ac960, 0x985880, 0x1, 0x273fbf68)
E:/Dev/Go/src/io/ioutil/ioutil.go:144 +0x99
io.copyBuffer(0xae3400, 0xed2d98, 0xae2760, 0xc0001ac960, 0x0, 0x0, 0x0, 0x5, 0x1, 0xc0001ff740)
E:/Dev/Go/src/io/io.go:391 +0x303
io.Copy(...)
E:/Dev/Go/src/io/io.go:364
io.CopyN(0xae3400, 0xed2d98, 0xae34e0, 0xc0000e0cc0, 0x40000, 0xc0000c8a50, 0x0, 0xc000022ac0)
E:/Dev/Go/src/io/io.go:340 +0xa1
net/http.(*body).Close(0xc0000e0cc0, 0x0, 0x0)
E:/Dev/Go/src/net/http/transfer.go:999 +0x14a
net/http.(*Request).closeBody(...)
E:/Dev/Go/src/net/http/request.go:1391
net/http.(*persistConn).writeLoop(0xc0000647e0)
E:/Dev/Go/src/net/http/transport.go:2296 +0x364
created by net/http.(*Transport).dialConn
E:/Dev/Go/src/net/http/transport.go:1648 +0xc82

This happened after a local application that communicates with this process had been running for a few weeks successfully.
This looks like memory corruption. Have you tried running your program under the race detector? See https://blog.golang.org/race-detector.
I'm hitting the very same symptoms on Go 1.15.5. I also whipped up a quick reproducer that reliably fails on my machine:
Reproducer
```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync"
	"testing"
)

const (
	bodySize          = 32 * 1024
	parallelism       = 32
	disableKeepAlives = false
)

func TestProxyEcho(t *testing.T) {
	// The server responding with the sent body.
	echoServer := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, req *http.Request) {
			body, err := ioutil.ReadAll(req.Body)
			if err != nil {
				log.Printf("error reading body: %v", err)
				http.Error(w, fmt.Sprintf("error reading body: %v", err), http.StatusInternalServerError)
				return
			}
			if _, err := w.Write(body); err != nil {
				log.Printf("error writing body: %v", err)
			}
		},
	))
	defer echoServer.Close()

	// The server proxying requests to the echo server.
	echoURL, err := url.Parse(echoServer.URL)
	if err != nil {
		t.Fatalf("Failed to parse echo URL: %v", err)
	}
	proxy := httputil.NewSingleHostReverseProxy(echoURL)
	proxyServer := httptest.NewServer(proxy)
	defer proxyServer.Close()

	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.DisableKeepAlives = disableKeepAlives
	c := &http.Client{
		Transport: transport,
	}

	body := make([]byte, bodySize)
	for i := 0; i < cap(body); i++ {
		body[i] = 42
	}

	var wg sync.WaitGroup
	wg.Add(parallelism)
	for i := 0; i < parallelism; i++ {
		go func(i int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				if err := send(c, proxyServer.URL, body); err != nil {
					t.Errorf("error during request: %v", err)
				}
			}
		}(i)
	}
	wg.Wait()
}

func send(client *http.Client, url string, body []byte) error {
	r := bytes.NewBuffer(body)
	req, err := http.NewRequest("POST", url, r)
	if err != nil {
		return fmt.Errorf("failed to create request: %w", err)
	}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("failed to execute request: %w", err)
	}
	defer resp.Body.Close()
	rec, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return fmt.Errorf("failed to read body: %w", err)
	}
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status code: %d", resp.StatusCode)
	}
	if len(rec) != len(body) {
		return fmt.Errorf("unexpected body length: %d", len(rec))
	}
	return nil
}
```
cc @FiloSottile - this looks like #34902, which you worked on. WDYT?

This also looks a lot like #15527, but I didn't find the race at a first glance. Running the code suggested by @markusthoemmes a few times reliably repros the issue, but even if run with …

The offending call seems to be:

go/src/net/http/httputil/reverseproxy.go, line 328 (at 75e16f5)

which calls:

go/src/net/http/httputil/reverseproxy.go, line 439 (at 75e16f5)

which encounters this error here:

go/src/net/http/httputil/reverseproxy.go, line 451 (at 75e16f5)

If we look at the first snippet, data is copied from … My best guess here is that while we are reading from …, … This could be caused by a persistent connection doing some work in the background, but it would require some more in-depth analysis to validate.
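For intuition, here is a minimal, self-contained sketch (not the actual net/http internals; all names are illustrative) of the failure shape described above: one goroutine copies a body while the other end closes the stream with an error mid-flight, so the copier surfaces that error, much like the wsarecv abort in the original report.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
	"sync"
)

// demo copies a large "body" in one goroutine while another goroutine
// finishes writing and then closes the pipe with an error, mimicking a
// connection aborted mid-stream. It returns the error the copier saw.
func demo() error {
	pr, pw := io.Pipe()
	var wg sync.WaitGroup
	var copyErr error
	wg.Add(2)
	go func() {
		defer wg.Done()
		// Drains successfully until the writer's error arrives.
		_, copyErr = io.Copy(io.Discard, pr)
	}()
	go func() {
		defer wg.Done()
		io.Copy(pw, strings.NewReader(strings.Repeat("x", 1<<20)))
		// Like the transport tearing down the connection under the reader.
		pw.CloseWithError(errors.New("connection aborted"))
	}()
	wg.Wait()
	return copyErr
}

func main() {
	fmt.Println("copy finished with:", demo())
}
```

Running this prints `copy finished with: connection aborted`: a non-EOF close on one side deterministically becomes the copier's error on the other.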
Not 100% sure it's related, but it does appear ReverseProxy has issues with raw TCP traffic - see knative/serving#6146 (comment).
FWIW, my reproducer above sadly still fails on Go 1.16.3, in case anybody was wondering.
I did some digging into this and confirmed #40747 (comment) (cc @empijei), but I'm not quite sure what the persistent conn is doing in the background. In the test case that @markusthoemmes submitted, the server we're reverse proxying to clearly reads the entire request and then responds, so the request should always be fully written (at least in this example) before the proxy can even start reading the response. That would mean that in …

I have a local patch which fixes this issue by ensuring we never return from …

I'm happy to continue looking into this if someone could continue pointing me in a better direction.
I chased the reproducer here for a while and, at least as I was able to exercise that code, found it to be related to the HTTP client (outbound from the proxy) competing with the HTTP server that provided the request body, via the reverse proxy, when they both attempt to drain and close the same request body. The server code doesn't handle the case where the client has read to EOF but hasn't closed the body yet, and the client code doesn't try to take the locks used by the server between its own drain and the close operations.

[sequence diagram walking through the competing drain/close, elided]

This possibility leads me to this possible patch:

[patch elided]
Since that's within the lock, it allows the similar "auto-close" of the request body in such a way that the outbound/http.Client side can't compete with it, and then once the outbound/http.Client reaches that competing read, the reader has already been safely closed. (It gives us a method to also safely run the close when we already hold the lock.)

I can think of a couple other ways to potentially fix it too. It might be possible to enhance …

That patch passes the tests in net/http, but I'll admit to feeling like the above folks, that there might be broader implications warranting a closer look from someone more familiar with this code and the HTTP specs.
Hi, any news on this one (any plans for a fix)? /cc @dmitshur @stephens2424.
Hi, any news on this issue? It's blocking me a lot.
cc @neild who is the new owner |
Gentle ping @neild?
We've started seeing this error more frequently in Cloud Foundry as well. I've attempted to look into it, but it is far out of my depth. Any guidance, suggestions, timeline, or acknowledgment from the owners would be much appreciated.
I received this error today: |
Gentle ping @neild. Any updates on this? Reproducers are still failing with latest |
Sorry for the lengthy time in getting to this. Thanks to @markusthoemmes for the reproduction case and @stephens2424 for the analysis, both were very helpful in understanding what's going on.

The basic problem here is that the HTTP/1 server implementation does not support reading from the inbound request body after beginning to write the outbound response. On writing the response, any unread portion of the inbound body is consumed (up to a limit), to avoid deadlocking clients that expect to write the complete request before reading any of the response. #15527 is a long-standing feature request to disable this behavior, allowing concurrent reads from the request and writes to the response. #15527 (comment) has some context on why the server implementation behaves the way it does.

The issue here is that …

In the reproduction case above, this manifests as a very racy failure because the server behind the proxy consumes the full request before it starts responding. I believe that what's happening is:

Lines 370 to 375 in 16cec4e
If the upstream server starts responding before it fully consumes the proxied request, the failure manifests consistently when the request is sufficiently large. A 1MiB request proxied to a server that does …

There are several ways we can address this. We can change …

In the longer run, however, there has been a long-standing feature request (#15527) to permit HTTP/1 server handlers to concurrently read a request and write a response. If we supported this, then …
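The drain-on-response behavior described above can be observed in isolation, independent of ReverseProxy. The sketch below (the helper name unreadAfterFlush is mine) shows that once a handler flushes its response, the HTTP/1 server has already consumed the unread request body, so the handler can no longer see the client's data.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// unreadAfterFlush POSTs a body of the given size to a handler that
// flushes its response *before* touching the request body, then reports
// how many body bytes the handler could still read afterwards.
func unreadAfterFlush(bodySize int) int64 {
	ch := make(chan int64, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Begin the response before reading the request body.
		w.WriteHeader(http.StatusOK)
		w.(http.Flusher).Flush()
		// The server has now drained the inbound body (up to an
		// internal limit), so nothing is left for the handler.
		n, _ := io.Copy(io.Discard, r.Body)
		ch <- n
	}))
	defer srv.Close()

	resp, err := http.Post(srv.URL, "text/plain",
		strings.NewReader(strings.Repeat("a", bodySize)))
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return <-ch
}

func main() {
	fmt.Println("bytes readable after response started:", unreadAfterFlush(1024))
}
```

With a small body this prints 0: the server's write path consumed the body, which is exactly why a ReverseProxy still streaming the inbound body upstream suddenly finds it unreadable.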
@neild gentle ping - should we use that method to fix this one?
I tried that method in the reverse proxy handler in the reproducer above and it fixes the issue. |
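For later readers, here is a minimal sketch of wiring that up, assuming Go 1.21+ where http.NewResponseController(w).EnableFullDuplex() is available; the wrapper-handler arrangement is mine, not from this thread.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// proxyEcho sends msg through a reverse proxy whose inbound connection
// has been opted into full-duplex mode, and returns the echoed body.
func proxyEcho(msg string) string {
	// Backend that echoes the request body back.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(w, r.Body)
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		panic(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	// Front server: enable full duplex on the inbound connection before
	// handing off to the proxy, so the proxy may keep reading the
	// request body while the response is being written (Go 1.21+).
	front := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := http.NewResponseController(w).EnableFullDuplex(); err != nil {
			// Fall through: the proxy still works, just without full duplex.
			fmt.Println("full duplex unavailable:", err)
		}
		proxy.ServeHTTP(w, r)
	}))
	defer front.Close()

	resp, err := http.Post(front.URL, "text/plain", strings.NewReader(msg))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(proxyEcho("hello"))
}
```

Note the caveat from the Go documentation: with full duplex enabled it becomes the handler's responsibility to fully consume the request body, and some older HTTP/1 clients may not expect interleaved reads and writes.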
Does anyone have an example of a client that breaks when full duplex is enabled? I'm wondering if certain proxies (e.g. Envoy) would work with older clients and Go servers with full duplex enabled - and I can do some testing.
@neild hi, any ideas about #40747 (comment)? |
What version of Go are you using (`go version`)?

[`go version` output elided]

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

[`go env` output elided]

What did you do?

[details elided]

What did you expect to see?

The POST request succeeds and there is only one persistent connection.

What did you see instead?

The following log appears in the proxy:

[log elided]

Many TIME_WAIT connections appear.

Everything was OK in any of the following scenarios:

[PROXY output elided]
[SERVER output elided]