Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use testcontainer for write tests #51594

Merged

Conversation

jdpgrailsdev
Copy link
Contributor

@jdpgrailsdev jdpgrailsdev commented Jan 16, 2025

What

  • Use MSSQL testcontainer for write integration tests

How

  • Configure write integration tests to use MSSQL testcontainer
  • Ensure that configuration is updated before each integration test
  • Support replacement of values in already deserialized configuration

Review guide

  1. BasicFunctionalityIntegrationTest.kt
  2. MSSQLConfiguration.kt
  3. MSSQLContainerHelper.kt
  4. MSSQLWriterTest.kt

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

@jdpgrailsdev jdpgrailsdev requested a review from a team as a code owner January 16, 2025 20:41
Copy link

vercel bot commented Jan 16, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 17, 2025 2:50pm

Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still confused about the parallel tests fighting each other thing - in principle, every test is operating on <random namespace>.<hardcoded stream name(s)>? So they shouldn't ever be touching the same MSSQL table. Kind of worried that randomizing the stream name is just papering over some underlying problem

otherwise lgtm 🚛


@Test
open fun testBasicWrite() {
val streamName = generateStreamName()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still necessary?

@jdpgrailsdev
Copy link
Contributor Author

I'm still confused about the parallel tests fighting each other thing - in principle, every test is operating on <random namespace>.<hardcoded stream name(s)>? So they shouldn't ever be touching the same MSSQL table. Kind of worried that randomizing the stream name is just papering over some underlying problem

otherwise lgtm 🚛

@edgao From what I discovered, I think that the issue is that the random schema is per test suite, not per test. Therefore, each test uses the same schema name AND the same stream name before this change. If that is not the intent, we should fix the base integration test classes to use a random schema name per test and not per suite.

@edgao
Copy link
Contributor

edgao commented Jan 16, 2025

ugh. There's probably some magic junit configuration that I forgot about somewhere... I'm pretty sure (85%) that s3/iceberg are creating a new instance of the test class per test method execution :/

@@ -114,9 +135,21 @@ class MSSQLDataCleaner : DestinationCleaner {
}
}

@TestInstance(TestInstance.Lifecycle.PER_CLASS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually derp, I missed this on my readthrough - why do we need this? I'm pretty sure this is the problem behind #51594 (comment)

(i.e. if you remove this, it should work as expected, b/c the default is PER_METHOD)

@jdpgrailsdev
Copy link
Contributor Author

ugh. There's probably some magic junit configuration that I forgot about somewhere... I'm pretty sure (85%) that s3/iceberg are creating a new instance of the test class per test method execution :/

@edgao I can look at the iceberg tests tomorrow to see if I missed something. It may also be that we had to use per class test instances to avoid starting and stopping the test container after each test. If that is what is causing this, I'd rather use the random stream name that incur the cost of spin up/spin down of the container on each test.

@edgao
Copy link
Contributor

edgao commented Jan 16, 2025

the pattern we've used in other tests is to actually share the test container across test classes, via some singleton object. roughly

object SharedTestContainer {
  private val isStarted: AtomicBoolean
  fun start() { ... }
}

class TestClass1 {
  @BeforeAll fun setup() { SharedTestContainer.start() }
}

class TestClass2 {
  @BeforeAll fun setup() { SharedTestContainer.start() }
}

e.g. https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-s3-data-lake/src/test-integration/kotlin/io/airbyte/integrations/destination/s3_data_lake/NessieTestContainers.kt. IIRC the old postgres/mysql destinations also do something along these lines

(we don't need to explicitly call testContainer.shutdown(), because testcontainers spawn a ryuk container, which kills the testcontainer automatically when it detects the JVM process exit)

@jdpgrailsdev
Copy link
Contributor Author

jdpgrailsdev commented Jan 17, 2025

the pattern we've used in other tests is to actually share the test container across test classes, via some singleton object. roughly

object SharedTestContainer {
  private val isStarted: AtomicBoolean
  fun start() { ... }
}

class TestClass1 {
  @BeforeAll fun setup() { SharedTestContainer.start() }
}

class TestClass2 {
  @BeforeAll fun setup() { SharedTestContainer.start() }
}

e.g. https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-s3-data-lake/src/test-integration/kotlin/io/airbyte/integrations/destination/s3_data_lake/NessieTestContainers.kt. IIRC the old postgres/mysql destinations also do something along these lines

(we don't need to explicitly call testContainer.shutdown(), because testcontainers spawn a ryuk container, which kills the testcontainer automatically when it detects the JVM process exit)

@edgao That doesn't work in Kotlin. You cannot use BeforeAll/AfterAll on non-static methods. The only way for that to work is to mark the test instances as per class so that it can use the non-static methods. From looking at the Iceberg tests, it looks like the tests were also marked as JvmStatic to work around this, which is a bit of a hack. I can make that change to this PR and back out the random stream name change to see if that works.

@jdpgrailsdev jdpgrailsdev force-pushed the jonathan/mssql-integration-tests branch from 6e27019 to 46c0e54 Compare January 17, 2025 14:18
@jdpgrailsdev
Copy link
Contributor Author

@edgao I am able to make the tests pass without the random stream name by using the @JvmStatic + companion object trick. That being said, this is definitely a bit of an anti-pattern to put the before all/after all stuff in a companion object just to work around this issue. I don't see why we need to use the same stream name for all tests. Randomly generating one per test would allow us to make this more Kotlin-y and avoid this hack.

@edgao
Copy link
Contributor

edgao commented Jan 17, 2025

medium strength opinion: I don't see this as a hack. JvmStatic is annoying, yes, but that's more a function of us using junit in kotlin than anything? If we were on kotest, it would be a perfectly normal beforeSpec block. (I think? haven't actually used kotest myself)

also, I think our tests are already sufficiently complicated... Randomizing the namespace was annoying, but probably unavoidable - I'd like to avoid adding even more dynamicly-generated stuff when possible

and more broadly:

  • we all agree starting+stopping testcontainers is expensive
  • we all agree starting a testcontainer per method is probably a nonstarter
  • starting a testcontainer once per gradle run instead of once per test class feels like the logical progression?
    • in particular - it removes the incentive to put all our tests in a single class for the sake of reducing infra overhead

@jdpgrailsdev
Copy link
Contributor Author

medium strength opinion: I don't see this as a hack. JvmStatic is annoying, yes, but that's more a function of us using junit in kotlin than anything? If we were on kotest, it would be a perfectly normal beforeSpec block. (I think? haven't actually used kotest myself)

also, I think our tests are already sufficiently complicated... Randomizing the namespace was annoying, but probably unavoidable - I'd like to avoid adding even more dynamicly-generated stuff when possible

and more broadly:

  • we all agree starting+stopping testcontainers is expensive

  • we all agree starting a testcontainer per method is probably a nonstarter

  • starting a testcontainer once per gradle run instead of once per test class feels like the logical progression?

    • in particular - it removes the incentive to put all our tests in a single class for the sake of reducing infra overhead

@edgao I guess I was referring to the use of a companion object to work around this as a hack, which it is -- its to make this work with Java-based JUnit. That being said, I don't feel strongly about it. I'd rather have us agree upon what level should be dynamically generated. It is currently the namespace and it sounds like we don't believe that it needs to be done at the name level of the stream. I'm not sure that I am sold on that yet, but given that it works now that I have introduced the companion object and static designation, we can pause this discussion for now. Thanks for the review/feedback.

@jdpgrailsdev jdpgrailsdev merged commit f5becbb into move/destination-mssql-v2 Jan 17, 2025
24 of 31 checks passed
@jdpgrailsdev jdpgrailsdev deleted the jonathan/mssql-integration-tests branch January 17, 2025 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues CDK Connector Development Kit connectors/destination/mssql-v2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants