Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http/httputil: ReverseProxy read error during body copy use of closed network connection #40747

Closed
ccdeville opened this issue Aug 13, 2020 · 23 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@ccdeville
Copy link

ccdeville commented Aug 13, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.2 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\Administrator\AppData\Local\go-build
set GOENV=C:\Users\Administrator\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=D:\wxgo
set GOPRIVATE=
set GOPROXY=https://goproxy.io,direct
set GOROOT=D:\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=D:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessag
e-length=0 -fdebug-prefix-map=C:\Users\ADMINI~1\AppData\Local\Temp\go-build74128
8550=/tmp/go-build -gno-record-gcc-switches

What did you do?

  • I created a proxy and a restfulapi server
  • I used postman made many post requests with body request to server through the proxy(proxy and server run in the same vm,server runed used docker,They use docker0 to connect),and server response body is larger than 4K
  • all requests use http/1.1
  • OS:Centos7.4

What did you expect to see?

post request sucessed and there is only one persistent connection

What did you see instead?

the following log appears in proxy .

httputil: ReverseProxy read error during body copy: read tcp 172.17.0.1:39810->172.17.0.2:8082: use of closed network connection

many TIME_WAIT conntecions appears .
Everything was ok in any of the the following scenarios:

  • Post request without body
  • GET request with same response
  • Response body less than 4K
  • Do not use Keep-Alive
  • Server response not write "Content-Length" in header
  • Use NGINX instead of my proxy
  • Server runed in a vm
PROXY Output
package main

import (
"flag"
"net/http"
"net/http/httputil"
"net/url"
"time"
)

var burl = flag.String("u", "", "backend address")

func buildProxy(target *url.URL) *httputil.ReverseProxy {
flushInterval := time.Duration(100 * time.Millisecond)
passHost := true
passHostHeader := &passHost
targetQuery := target.RawQuery
proxy := &httputil.ReverseProxy{
Director: func(req *http.Request) {
req.URL.Path = target.Path
req.URL.Host = target.Host
req.URL.Scheme = target.Scheme

		req.URL.RawPath = target.RawPath
		if targetQuery == "" || req.URL.RawQuery == "" {
			req.URL.RawQuery = targetQuery + req.URL.RawQuery
		} else {
			req.URL.RawQuery = targetQuery + "&" + req.URL.RawQuery
		}
		req.RequestURI = ""
		req.Proto = "HTTP/1.1"
		req.ProtoMajor = 1
		req.ProtoMinor = 1
		if passHostHeader != nil && !*passHostHeader {
			req.Host = req.URL.Host
		}
	},
	Transport:     http.DefaultTransport,
	FlushInterval: flushInterval,
}
return proxy

}

func handler(p *httputil.ReverseProxy) func(http.ResponseWriter, *http.Request) {
return func(w http.ResponseWriter, r *http.Request) {

	p.ServeHTTP(w, r)
}

}
func main() {
flag.Parse()
remote, err := url.ParseRequestURI("http://" + *burl)
if err != nil {
panic(err)
}

proxy := buildProxy(remote)
http.HandleFunc("/", handler(proxy))
err = http.ListenAndServe(":8081", nil)
if err != nil {
	panic(err)
}

}

SERVER Output
package main

import (
"fmt"
"strconv"
"github.com/gin-gonic/gin"
)

type myJson struct {
Name string json:"name"
}

func createBigString(name string) string {

var bs string
for i := 1; i <= 500; i++ {
	bs += name + "abcdefghij"

}

return bs

}
func main() {
gin.SetMode(gin.ReleaseMode)
router := gin.Default()

router.POST("/test/post", func(c *gin.Context) {
	var json myJson
	c.ShouldBindJSON(&json)
	myName := createBigString(json.Name)
	lenBody := len(myName) + 11
	fmt.Println(lenBody)
	c.Header("Content-Length", strconv.Itoa(lenBody))
	c.JSON(200, gin.H{"name": myName})
})
router.Run(":8082")

}

@ccdeville ccdeville changed the title ReverseProxy read error during body copy ReverseProxy read error during body copy use of closed network connection Aug 13, 2020
@ccdeville ccdeville changed the title ReverseProxy read error during body copy use of closed network connection net/http/httputil:ReverseProxy read error during body copy use of closed network connection Aug 13, 2020
@andybons andybons changed the title net/http/httputil:ReverseProxy read error during body copy use of closed network connection net/http/httputil: ReverseProxy read error during body copy use of closed network connection Aug 13, 2020
@dmitshur
Copy link
Contributor

/cc @bradfitz per owners.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 17, 2020
@dmitshur dmitshur added this to the Backlog milestone Aug 17, 2020
@coolaj86
Copy link

coolaj86 commented Nov 13, 2020

I've got something that I think could be related to this - also on Windows, but I'm getting a panic as well:

2020/11/12 15:22:28 httputil: ReverseProxy read error during body copy: read tcp 127.0.0.1:49351->127.0.0.1:8090: wsarecv: An established connection was aborted by the software in your host machine.
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0x667335]
goroutine 147 [running]:
bufio.(*Reader).Read(0xc00018e180, 0xc0002e4000, 0x2000, 0x2000, 0x2000, 0xc0002e4000, 0xc000182f20)
    E:/Dev/Go/src/bufio/bufio.go:212 +0x125
io.(*LimitedReader).Read(0xc000182ec0, 0xc0002e4000, 0x2000, 0x2000, 0xc00008e140, 0x10, 0xc000307be8)
    E:/Dev/Go/src/io/io.go:451 +0x6a
net/http.(*body).readLocked(0xc0000e0cc0, 0xc0002e4000, 0x2000, 0x2000, 0x2000, 0xc0002e4000, 0xc0001ac980)
    E:/Dev/Go/src/net/http/transfer.go:847 +0x66
net/http.bodyLocked.Read(0xc0000e0cc0, 0xc0002e4000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    E:/Dev/Go/src/net/http/transfer.go:1046 +0x85
io.(*LimitedReader).Read(0xc0001ac960, 0xc0002e4000, 0x2000, 0x2000, 0xc000085a30, 0x409d72, 0xc000008000)
    E:/Dev/Go/src/io/io.go:451 +0x6a
io/ioutil.devNull.ReadFrom(0x0, 0xae2760, 0xc0001ac960, 0x985880, 0x1, 0x273fbf68)
    E:/Dev/Go/src/io/ioutil/ioutil.go:144 +0x99
io.copyBuffer(0xae3400, 0xed2d98, 0xae2760, 0xc0001ac960, 0x0, 0x0, 0x0, 0x5, 0x1, 0xc0001ff740)
    E:/Dev/Go/src/io/io.go:391 +0x303
io.Copy(...)
    E:/Dev/Go/src/io/io.go:364
io.CopyN(0xae3400, 0xed2d98, 0xae34e0, 0xc0000e0cc0, 0x40000, 0xc0000c8a50, 0x0, 0xc000022ac0)
    E:/Dev/Go/src/io/io.go:340 +0xa1
net/http.(*body).Close(0xc0000e0cc0, 0x0, 0x0)
    E:/Dev/Go/src/net/http/transfer.go:999 +0x14a
net/http.(*Request).closeBody(...)
    E:/Dev/Go/src/net/http/request.go:1391
net/http.(*persistConn).writeLoop(0xc0000647e0)
    E:/Dev/Go/src/net/http/transport.go:2296 +0x364
created by net/http.(*Transport).dialConn
    E:/Dev/Go/src/net/http/transport.go:1648 +0xc82

This happened after a local application that communicates with this process had been running for a few weeks successfully.

@davecheney
Copy link
Contributor

This looks like memory corruption. Have you tried running your program under the race detector? See https://blog.golang.org/race-detector .

@markusthoemmes
Copy link

I'm hitting the very same symptoms on Golang 1.15.5.

I also whipped up a quick reproducer that reliably fails on my machine:

`go env` output

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/mthoemme/.cache/go-build"
GOENV="/home/mthoemme/.config/go/env"
GOEXE=""
GOFLAGS="-mod=vendor"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/mthoemme/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/mthoemme/go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/golang"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/mthoemme/dev/debugging/serving-EOF/queue-proxy-use-of-closed-network-connection-error/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build009849173=/tmp/go-build -gno-record-gcc-switches"

Reproducer

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"sync"
	"testing"
)

const (
	bodySize          = 32 * 1024
	parallelism       = 32
	disableKeepAlives = false
)

func TestProxyEcho(t *testing.T) {
	// The server responding with the sent body.
	echoServer := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, req *http.Request) {
			body, err := ioutil.ReadAll(req.Body)
			if err != nil {
				log.Printf("error reading body: %v", err)
				http.Error(w, fmt.Sprintf("error reading body: %v", err), http.StatusInternalServerError)
				return
			}

			if _, err := w.Write(body); err != nil {
				log.Printf("error writing body: %v", err)
			}
		},
	))
	defer echoServer.Close()

	// The server proxying requests to the echo server.
	echoURL, err := url.Parse(echoServer.URL)
	if err != nil {
		t.Fatalf("Failed to parse echo URL: %v", err)
	}
	proxy := httputil.NewSingleHostReverseProxy(echoURL)
	proxyServer := httptest.NewServer(proxy)
	defer proxyServer.Close()

	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.DisableKeepAlives = disableKeepAlives
	c := &http.Client{
		Transport: transport,
	}

	body := make([]byte, bodySize)
	for i := 0; i < cap(body); i++ {
		body[i] = 42
	}

	var wg sync.WaitGroup
	wg.Add(parallelism)
	for i := 0; i < parallelism; i++ {
		go func(i int) {
			defer wg.Done()

			for i := 0; i < 1000; i++ {
				if err := send(c, proxyServer.URL, body); err != nil {
					t.Errorf("error during request: %v", err)
				}
			}
		}(i)
	}

	wg.Wait()
}

func send(client *http.Client, url string, body []byte) error {
	r := bytes.NewBuffer(body)
	req, err := http.NewRequest("POST", url, r)
	if err != nil {
		return fmt.Errorf("failed to create request: %w", err)
	}

	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("failed to execute request: %w", err)
	}
	defer resp.Body.Close()

	rec, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return fmt.Errorf("failed to read body: %w", err)
	}

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status code: %d", resp.StatusCode)
	}

	if len(rec) != len(body) {
		return fmt.Errorf("unexpected body length: %d", len(rec))
	}

	return nil
}

@markusthoemmes
Copy link

Any chance we can get some guidance/help investigating this? @bradfitz @rsc @empijei (per OWNERS)

@empijei
Copy link
Contributor

empijei commented Dec 16, 2020

cc/ @FiloSottile this looks like #34902, which you worked on. WDYT?

This also looks a lot like #15527 but I didn't find the race at a first glance. Running the code suggested by @markusthoemmes a few times reliably repros the issue, but even if ran with -race the detector does not pick up the race so it's not a trivial one.

The offending call seems to be

err = p.copyResponse(rw, res.Body, p.flushInterval(res))

which calls

_, err := p.copyBuffer(dst, src, buf)

which encounters this error here:

nr, rerr := src.Read(buf)

If we look at the first snippet, data is copied from res.Body to rw.

My best guess here is that while we are reading from res.Body this transport.RoundTrip call might still be writing to the outgoing request.

This could be caused by a persistent connection doing some work in the background, but it would require some more in-depth analysis to validate.

@duglin
Copy link

duglin commented Dec 17, 2020

Not 100% sure it's related but it does appear the reverseproxy has issues with raw tcp stuff - see this: knative/serving#6146 (comment)

@markusthoemmes
Copy link

FWIW, my reproducer above sadly still fails on Golang 1.16.3, in case anybody was wondering.

@derekparker
Copy link
Contributor

I did some digging into this and confirmed #40747 (comment) (cc @empijei), but I'm not quite sure what the persistent conn is doing in the background. In the test case that @markusthoemmes submitted, the server we're reverse proxying to clearly reads the entire request and then responds, so the request should always be fully written (at least in this example) before the proxy can even start reading the response. That would mean that in (*persistConn).roundTrip we've gotten a response in <-writeErrCh before <-resc.

I have a local patch which fixes this issue by ensuring we never return from (*persistConn).roundTrip until we hear back from <-writeErrCh to confirm our request is complete before returning a response. This however breaks other tests and I don't know the relevant Specs/RFCs well enough to even know if this is an "ok" thing to do. Alternatively setting a time.Sleep of about 5-10 milliseconds right before p.copyResponse in the reverse proxy also fixes this issue, but it's obviously less than ideal.

I'm happy to continue looking into this if someone could continue pointing me in a better direction.

@stephens2424
Copy link

I chased the reproducer here for a while and, at least as I was able to exercise that code, found it to be related to the HTTP Client (outbound from the proxy) competing with the HTTP Server that provided the request body, via the reverse proxy, when they both attempt to drain and close the same request body. The server code doesn't handle the case where the client has read to EOF but hasn't closed the body yet, and the client code doesn't try to take the locks used by the server between its own drain and the close operations.

This image attempts to walk through it in a sequential sort of way:

Screen Shot 2021-09-23 at 4 35 56 PM

This possibility leads me to this possible patch:

diff --git a/src/net/http/server.go b/src/net/http/server.go
index 4d0ce56..4077654 100644
--- a/src/net/http/server.go
+++ b/src/net/http/server.go
@@ -1336,6 +1336,11 @@ func (cw *chunkWriter) writeHeader(p []byte) {
 				}
 			case bdy.unreadDataSizeLocked() >= maxPostHandlerReadBytes:
 				tooBig = true
+			case bdy.sawEOF:
+				err := bdy.closeLocked()
+				if err != nil {
+					w.closeAfterReply = true
+				}
 			default:
 				discard = true
 			}
diff --git a/src/net/http/transfer.go b/src/net/http/transfer.go
index 5ff89cc..3f9f0f3 100644
--- a/src/net/http/transfer.go
+++ b/src/net/http/transfer.go
@@ -972,6 +972,10 @@ func (b *body) unreadDataSizeLocked() int64 {
 func (b *body) Close() error {
 	b.mu.Lock()
 	defer b.mu.Unlock()
+	return b.closeLocked()
+}
+
+func (b *body) closeLocked() error {
 	if b.closed {
 		return nil
 	}

Since that's within the lock, it allows the similar "auto-close" of the request body in such a way that the outbound/http.Client side can't compete with it, and then once the outbound/http.Client reaches that competing read, the reader has already been safely closed. (It gives us a method to also safely run the close when we already hold the lock.)

I can think of a couple other ways to potentially fix it too. It might be possible to enhance (*body).Read to more safely return 0, nil when we know EOF has already been reached but we haven't been closed.

That patch passes the tests in net/http, but I'll admit to feeling like the above folks, that there might be broader implications warranting a closer look from someone more familiar with this code and the http specs.

@skonto
Copy link

skonto commented Jan 12, 2022

Hi any news on this one (any plans for a fix)? /cc @dmitshur @stephens2424.

@nicolasmingo
Copy link

Hi, any news on this issue ? it's blocking me a lot

@dprotaso
Copy link

cc @neild who is the new owner

@skonto
Copy link

skonto commented Jul 5, 2022

Gentle ping @neild ?

@ameowlia
Copy link
Contributor

We've started seeing this error more frequently in Cloud Foundry as well. I've attempted to look into it, but it is far out of my depth. Any guidance, suggestions, timeline, or acknowledgment from the owners would be much appreciated.

@juburr
Copy link

juburr commented Dec 6, 2022

I received this error today: ReverseProxy read error during body copy.
For me, the client (another service) was resetting the connection because it was trying to connect with HTTP, whereas the server-side of the connection was using ListenAndServeTLS.
This was a very simple fix in my case, but you may be seeing something completely different. I just thought I'd share in case it helps someone who is searching for this error message. 😄

@ReToCode
Copy link

Gentle ping @neild. Any updates on this? Reproducers are still failing with latest golang version.

@neild
Copy link
Contributor

neild commented Jan 13, 2023

Sorry for the lengthy time in getting to this.

Thanks to @markusthoemmes for the reproduction case and @stephens2424 for the analysis, both were very helpful in understanding what's going on.

The basic problem here is that the HTTP/1 server implementation does not support reading from the inbound request body after beginning to write the outbound response. On writing the response, any unread portion of the inbound body is consumed (up to a limit), to avoid deadlocking clients that expect to write the complete request before reading any of the response.

#15527 is a long-standing feature request to disable this behavior, allowing concurrent reads from the request and writes to the response.

#15527 (comment) has some context on why the server implementation behaves the way it does.

The issue here is that ReverseProxy starts writing the proxied response before it finishes reading from the proxied request.

In the reproduction case above, this manifests as a very racy failure because the server behind the proxy consumes the full request before it starts responding. I believe that what's happening is ReverseProxy's server is closing the request body (which is both the inbound and outbound body) in between the transport's two reads here (the first reading the full body, and the second expecting to just read an io.EOF):

go/src/net/http/transfer.go

Lines 370 to 375 in 16cec4e

ncopy, err = t.doBodyCopy(w, io.LimitReader(body, t.ContentLength))
if err != nil {
return err
}
var nextra int64
nextra, err = t.doBodyCopy(io.Discard, body)

If the upstream server starts responding before it fully consumes the proxied request, the failure manifests consistently when the request is sufficiently large. A 1MiB request proxied to a server that does io.Copy(w, r.Body) is a good reproduction case. (Note that in this case the proposed fix in #40747 (comment) does not work, because the body has not been read to EOF when the response headers are written.)

There are several ways we can address this.

We can change ReverseProxy to wait for the transport to finish consuming an HTTP/1 request body before it begins proxying the response. This will make ReverseProxy adhere to the requirements spelled out in the http.ResponseWriter documentation. I think this should be a safe change to make, since any cases where a user might want to concurrently send the proxied request body while receiving the response are already broken. (This does work for HTTP/2, so we'd want to take care not to break that case.) I think this is a good first step.

In the longer run, however, there has been a long-standing feature request (#15527) to permit HTTP/1 server handlers to concurrently read a request and write a response. If we supported this, then ReverseProxy could just behave as it does now. There are good reasons why this probably can't be the default server behavior, but we can support it on an opt-in basis. I've filed #57786 with a proposed API for doing so.

@geofffranks
Copy link

I saw a note that #15527 has been resolved for golang 1.21. It also looks like #57786 has been implemented to golang 1.21.

Sounds like this is likely also able to be closed @neild? Implementers would want to look at the new http.ResponseController.EnableFullDuplex() method, correct?

@skonto
Copy link

skonto commented Aug 29, 2023

@neild gentle ping should we use that method to fix this one?

@skonto
Copy link

skonto commented Aug 30, 2023

I tried that method in the reverse proxy handler in the reproducer above and it fixes the issue.
However, I think using that method is not that flexible as clients need to know about the use of it and we may have a mixed type of clients.

@dprotaso
Copy link

dprotaso commented Aug 30, 2023

Does anyone have an example of a client that breaks when full duplex is enabled? I'm wondering if certain proxies (eg. envoy) would work with older clients and go servers with full duplex enabled - and I can do some testing.

@skonto
Copy link

skonto commented Jan 8, 2024

@neild hi, any ideas about #40747 (comment)?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests