Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 [Bug]: Isolation Issue with Parallel Subtests #2871

Closed
sixcolors opened this issue Feb 21, 2024 · 14 comments
Closed

🐛 [Bug]: Isolation Issue with Parallel Subtests #2871

sixcolors opened this issue Feb 21, 2024 · 14 comments

Comments

@sixcolors
Copy link
Member

sixcolors commented Feb 21, 2024

An issue that appears throughout the tests. Running tests in parallel using the t.Parallel() function can show problems when the tests that are not isolated. When tests share a state, such as a common instance of app, running them in parallel can lead to race conditions or other unexpected behavior leading to unpredictable results.

To fix this issue [with false results in tests], we could refactor the tests and make sure that each test has its isolated state.

If we want to eliminate repetitive code for setup and teardown for each parallel test, we could consider using the testify/suite package.

For example I ran into issues with the Test_Client_UserAgent.

Example, Test_Client_UserAgent original:

func Test_Client_UserAgent(t *testing.T) {
	t.Parallel()

	ln := fasthttputil.NewInmemoryListener()

	app := New()

	app.Get("/", func(c Ctx) error {
		return c.Send(c.Request().Header.UserAgent())
	})

	go func() {
		require.NoError(t, app.Listener(ln, ListenConfig{
			DisableStartupMessage: true,
		}))
	}()

	t.Run("default", func(t *testing.T) {
		t.Parallel()
		for i := 0; i < 5; i++ {
			a := Get("http://example.com")

			a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

			code, body, errs := a.String()

			require.Equal(t, StatusOK, code)
			require.Equal(t, defaultUserAgent, body)
			require.Empty(t, errs)
		}
	})

	t.Run("custom", func(t *testing.T) {
		t.Parallel()
		for i := 0; i < 5; i++ {
			c := AcquireClient()
			c.UserAgent = "ua"

			a := c.Get("http://example.com")

			a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

			code, body, errs := a.String()

			require.Equal(t, StatusOK, code)
			require.Equal(t, "ua", body)
			require.Empty(t, errs)
			ReleaseClient(c)
		}
	})
}

Originally posted by @sixcolors in #2864 (comment)

The issues found in three test files, namely app_test.go, client_test.go, and ctx_test.go.

@efectn
Copy link
Member

efectn commented Feb 21, 2024

I've noticed a common issue that appears throughout the tests. Running tests in parallel using the t.Parallel() function can cause problems if the tests are not isolated correctly. When tests share a state, such as a common instance of app, running them in parallel can lead to race conditions. This means that one test could modify the state while another is reading it, leading to unpredictable results.

To fix this issue, we should refactor the tests and make sure that each test has its isolated state.

If we want to eliminate repetitive code for setup and teardown for each parallel test, we could consider using the testify/suite package.

For example, in the Test_Ctx_Parsers noted above, I also ran into issues with the Test_Client_UserAgent.

Example, Test_Client_UserAgent with a fix:

func Test_Client_UserAgent(t *testing.T) {
	t.Parallel()

	setupApp := func(t *testing.T) *fasthttputil.InmemoryListener {
		t.Helper()
		ln := fasthttputil.NewInmemoryListener()
		app := New()
		app.Get("/", func(c Ctx) error {
			return c.Send(c.Request().Header.UserAgent())
		})
		go func() {
			require.NoError(t, app.Listener(ln, ListenConfig{
				DisableStartupMessage: true,
			}))
		}()
		time.Sleep(100 * time.Millisecond) // wait for server to start
		return ln
	}

	t.Run("default", func(t *testing.T) {
		t.Parallel()
		ln := setupApp(t)
		for i := 0; i < 5; i++ {
			a := Get("http://example.com")

			a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

			code, body, errs := a.String()

			require.Equal(t, StatusOK, code)
			require.Equal(t, defaultUserAgent, body)
			require.Empty(t, errs)
		}
	})

	t.Run("custom", func(t *testing.T) {
		t.Parallel()
		ln := setupApp(t)
		for i := 0; i < 5; i++ {
			c := AcquireClient()
			c.UserAgent = "ua"

			a := c.Get("http://example.com")

			a.HostClient.Dial = func(_ string) (net.Conn, error) { return ln.Dial() }

			code, body, errs := a.String()

			require.Equal(t, StatusOK, code)
			require.Equal(t, "ua", body)
			require.Empty(t, errs)
			ReleaseClient(c)
		}
	})
}

Please let me know if you have any questions or concerns.

Originally posted by @sixcolors in #2864 (comment)

There are some issues with a particular pattern found in three test files, namely app_test.go, client_test.go, and ctx_test.go. However, it seems that earlydata_test.go is doing it correctly by using t.Helper(). Additionally, I noticed that the subtests in etag_test.go, session_test.go, and store_test.go are also using the subtest pattern, but they are not using shared instances as far as I can tell from a quick scan.

We should also detect the shared stuff between tests. Something may be wrong or just false positive.

Btw i agree about setupApp. Maybe we can create something like helper to reduce code duolication on tests/benchmarks

@sixcolors
Copy link
Member Author

sixcolors commented Feb 21, 2024

A similar problem was just noticed in internal/storage/memory/memory_test.go where var testStore = new is set globally in the package, and I got memory corruption issues, that randomly occurred. Since this pkg is using sync and a mutex, we should probably investigate the cause.

@ReneWerner87
Copy link
Member

we should investigate these cases and, in my opinion, not simply fix them through isolation

because normally it is a valid setup
if a server is up and running and processing multiple simultaneous requests, if something is shared there, then we should fix the problem rather than the test

@ReneWerner87
Copy link
Member

https://github.com/gofiber/fiber/blob/v2/ctx_test.go#L1376-L1467
there must be a bug somewhere in the merge branch or v3, the same test works perfectly in v2 and also in parallel

@ReneWerner87
Copy link
Member

we found the problem,
The fasthttp ctx is filled when the struct is created for the pool and is set to nil when it is packed back into the pool

If you take it out of the pool again it's bad
because it was only filled when the pool item was created and not, as with v2, when the item was configured from the pool

Will fix it tomorrow or the day after, so not isolating it has shown us a real world bug which also occurs in a real application when ctx is not completely recreated, but is reused from the pool

@ReneWerner87
Copy link
Member

v2

fiber/ctx.go

Line 181 in ddc6b23

c.fasthttp = fctx

fiber/ctx.go

Line 193 in ddc6b23

c.fasthttp = nil

v3

fiber/ctx_interface.go

Lines 446 to 452 in 26346d6

func (app *App) AcquireCtx() Ctx {
ctx, ok := app.pool.Get().(Ctx)
if !ok {
panic(errors.New("failed to type-assert to Ctx"))
}
return ctx
}

fiber/ctx_interface.go

Lines 455 to 458 in 26346d6

func (app *App) ReleaseCtx(c Ctx) {
c.release()
app.pool.Put(c)
}

c.fasthttp = nil

The problem here is that the fasthttp context is only set when new items are created for the pool, but not when items from the pool are reused, so we would have to move this to AcquireCtx

fiber/app.go

Lines 497 to 501 in 26346d6

app.pool = sync.Pool{
New: func() any {
return app.NewCtx(&fasthttp.RequestCtx{})
},
}

We should also examine this process again

fiber/ctx_interface.go

Lines 461 to 484 in 26346d6

func (c *DefaultCtx) Reset(fctx *fasthttp.RequestCtx) {
// Reset route and handler index
c.indexRoute = -1
c.indexHandler = 0
// Reset matched flag
c.matched = false
// Set paths
c.pathOriginal = c.app.getString(fctx.URI().PathOriginal())
// Attach *fasthttp.RequestCtx to ctx
c.fasthttp = fctx
// reset base uri
c.baseURI = ""
// Set method
c.method = c.app.getString(fctx.Request.Header.Method())
c.methodINT = c.app.methodInt(c.method)
// Prettify path
c.configDependentPaths()
}

@ReneWerner87
Copy link
Member

We also have to use the code of the Reset method in the AcquireCtx

@sixcolors
Copy link
Member Author

sixcolors commented Feb 22, 2024

@efectn

We should also detect the shared stuff between tests. Something may be wrong or just false positive.

@ReneWerner87

we should investigate these cases and, in my opinion, not simply fix them through isolation

because normally it is a valid setup
if a server is up and running and processing multiple simultaneous requests, if something is shared there, then we should fix the problem rather than the test

I appreciate the thoughtful discussion. My intention in raising the issue was to address what appeared to be false positives [failures] in the unit tests. I understand the importance of considering the validity of shared resources and agree that fixing the underlying problem was crucial. I didn't mean to suggest avoiding further investigation into the root cause.

What occurred to me is that the failures might be signaling a need for more granular unit testing with greater isolation, but I acknowledge that this approach could be impractical. Achieving effective coverage using more isolated unit tests could lead to requiring an extensive set of additional tests, covering such interaction scenarios with concurrency and other complexities. The current tests, though indirectly, did help identify an issue.

Thanks for your understanding. Open to further insights or discussion.

@nickajacks1
Copy link
Member

I think there are two important points to take home.

  1. Testing concurrency is important. For example, recent changes by @gaby to testing & benchmarking revealed several concurrency issues in the templates project.
  2. Having independent tests is important to prevent both false positives and false negatives. Here is a false positive example, 🚨 Test: fix race condition in parallel tests #2734 . A hypothetical false negative might be that you test that foo() creates a file called bar.txt, but that file was actually created in a separate test.

It seems to me that it might be good to at the very least comb over the existing tests to make sure that we are addressing both points. Given how many tests there are, that would probably be a lot of work.

@gaby
Copy link
Member

gaby commented Feb 23, 2024

@nickajacks1 I'm planning on starting to take a look at the benchmarks for the core and the middlewares.

Several middlewares are missing Benchmarks. We also need to add RunParallel() benchmarks. None of the benchmarks are asserting inside the loop which is what exposed the issue with the templates.

Will start tomorrow with 1-2 of them to see if anything raises, we can go from there.

@leonklingele
Copy link
Member

I have made almost all tests work with t.Parallel() in #2469 which got closed unfortunately. Maybe some new work can partly be based upon that PR.

@ReneWerner87
Copy link
Member

bug fixed with 5a0167a

@gaby
Copy link
Member

gaby commented Mar 17, 2024

This is mostly fix by #2892 I'm going to double check today how many tests are not parallel.

@ReneWerner87
Copy link
Member

ok then no need for keeping this open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants