Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add guard against invalid end timestamps #3363

Merged
merged 5 commits into from
Oct 17, 2023

Conversation

JonasKunz
Copy link
Contributor

@JonasKunz JonasKunz commented Oct 17, 2023

What does this PR do?

Recently our windows tests have been failing:

2023-10-17T06:17:03.0410452Z 2023-10-17 06:16:57,817 [main] ERROR co.elastic.apm.agent.bci.IndyBootstrap - Advice threw an exception, this should never happen!
2023-10-17T06:17:03.0410818Z java.lang.AssertionError: [$.duration: must have a minimum value of 0]
2023-10-17T06:17:03.0411159Z JSON schema path = /apm-server-schema/current/transaction.json
2023-10-17T06:17:03.0411173Z 
2023-10-17T06:17:03.0411256Z {
2023-10-17T06:17:03.0411382Z   "timestamp" : 1697523417814567,
2023-10-17T06:17:03.0411487Z   "name" : "unnamed",
2023-10-17T06:17:03.0411602Z   "id" : "ddba5acdad68e7ab",
2023-10-17T06:17:03.0411791Z   "trace_id" : "054914ce0a91fb98a7e41a5ee516a021",
2023-10-17T06:17:03.0411943Z   "type" : "custom",
2023-10-17T06:17:03.0412102Z   "duration" : -5.49,
2023-10-17T06:17:03.0412207Z   "outcome" : "success",
2023-10-17T06:17:03.0412302Z   "context" : {
2023-10-17T06:17:03.0412465Z     "service" : {
2023-10-17T06:17:03.0412568Z       "framework" : {
2023-10-17T06:17:03.0412675Z         "name" : "API"
2023-10-17T06:17:03.0412761Z       },
2023-10-17T06:17:03.0412867Z       "version" : null
2023-10-17T06:17:03.0412952Z     },
2023-10-17T06:17:03.0413046Z     "tags" : { }
2023-10-17T06:17:03.0413134Z   },
2023-10-17T06:17:03.0413231Z   "span_count" : {
2023-10-17T06:17:03.0413337Z     "dropped" : 0,
2023-10-17T06:17:03.0413429Z     "started" : 0
2023-10-17T06:17:03.0413508Z   },
2023-10-17T06:17:03.0413637Z   "dropped_spans_stats" : [ ],
2023-10-17T06:17:03.0413738Z   "sample_rate" : 1.0,
2023-10-17T06:17:03.0413843Z   "sampled" : true
2023-10-17T06:17:03.0413923Z }
2023-10-17T06:17:03.0414510Z 	at co.elastic.apm.agent.MockReporter.verifyJsonSchema(MockReporter.java:406) ~[test-classes/:?]
2023-10-17T06:17:03.0415099Z 	at co.elastic.apm.agent.MockReporter.verifyJsonSchemas(MockReporter.java:398) ~[test-classes/:?]
2023-10-17T06:17:03.0415732Z 	at co.elastic.apm.agent.MockReporter.verifyTransactionSchema(MockReporter.java:338) ~[test-classes/:?]
2023-10-17T06:17:03.0416228Z 	at co.elastic.apm.agent.MockReporter.report(MockReporter.java:230) ~[test-classes/:?]
2023-10-17T06:17:03.0416769Z 	at co.elastic.apm.agent.impl.ElasticApmTracer.endTransaction(ElasticApmTracer.java:514) ~[classes/:?]
2023-10-17T06:17:03.0417282Z 	at co.elastic.apm.agent.impl.transaction.Transaction.afterEnd(Transaction.java:285) ~[classes/:?]
2023-10-17T06:17:03.0417763Z 	at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:580) ~[classes/:?]
2023-10-17T06:17:03.0418251Z 	at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:557) ~[classes/:?]
2023-10-17T06:17:03.0419074Z 	at co.elastic.apm.agent.pluginapi.AbstractSpanInstrumentation$EndInstrumentation$AdviceClass.end(AbstractSpanInstrumentation.java:223) ~[classes/:?]
2023-10-17T06:17:03.0419479Z 	at co.elastic.apm.api.AbstractSpanImpl.end(AbstractSpanImpl.java:126) ~[classes/:?]
2023-10-17T06:17:03.0420454Z 	at co.elastic.apm.api.BlockingQueueContextPropagationTest.testAsyncSpanDelegation(BlockingQueueContextPropagationTest.java:118) ~[test-classes/:?]

The problematic test manually sets the start-timestamp, but uses the agent built-in clock via end() for the end timestamp. Because those two methods could use two different "bases" for converting System.nanoTime() to micros, this potentially causes an end-timestamp which is lower than the start timestamp.

I assume we only observed this issue on windows because the precision of *nix System.currentTimeMillis() is higher and therefore makes this issue much less likely.

This PR adds a sanitization to the end-timestamp, as it is possible via manual instrumentation to give invalid timestamps.

Checklist

  • This is an enhancement of existing features, or a new feature in existing plugins
    • I have updated CHANGELOG.asciidoc
    • I have added tests that prove my fix is effective or that my feature works
    • Added an API method or config option? Document in which version this will be introduced
    • I have made corresponding changes to the documentation

@JonasKunz JonasKunz added ci:agent-integration Enables agent integration tests in build pipeline ci:windows Enables Windows build & tests labels Oct 17, 2023
@JonasKunz JonasKunz marked this pull request as ready for review October 17, 2023 11:13
@github-actions
Copy link

/test

@JonasKunz JonasKunz requested a review from a team October 17, 2023 11:55
Copy link
Member

@SylvainJuge SylvainJuge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch !

@JonasKunz JonasKunz added ready-to-merge and removed ci:windows Enables Windows build & tests ready-to-merge labels Oct 17, 2023
@JonasKunz
Copy link
Contributor Author

@SylvainJuge I've removed the test-on-windows label, because there are no other, new issues which are unrelated to this PR:
We are seeing several failures & flakes due to timeouts. I assume something must have changed with the underlying test infrastructure. I don't want this to block this PR, therefore I removed the label.

I'll come back to the windows tests next week and see if they are still having problems.

@JonasKunz JonasKunz merged commit 45657a0 into elastic:main Oct 17, 2023
13 of 18 checks passed
@JonasKunz JonasKunz deleted the timestamp-fix branch October 17, 2023 15:02
v1v added a commit to v1v/apm-agent-java that referenced this pull request Oct 27, 2023
* 'main' of https://github.com/elastic/apm-agent-java: (35 commits)
  buildkite: opentelemetry+elastic agent overhead benchmark on a weekly basis (elastic#3371)
  ci(action): remove unrequired action (elastic#3387)
  Bump org.springframework.boot:spring-boot-dependencies from 2.7.16 to 3.1.5 (elastic#3380)
  ci(action): archive benchmarks.jar (elastic#3386)
  remove noise (elastic#3385)
  Process otel benchmark (elastic#3384)
  Update tests for Jedis 5 (elastic#3382)
  added entry to community plugins (elastic#3383)
  added capturing s3 otel attributes from lamba S3Event (elastic#3364)
  Bump version.log4j from 2.12.4 to 2.21.0 (elastic#3366)
  Fix aws sdk instrumentation (elastic#3373)
  action: remove leftover from Jenkins (elastic#3368)
  Bump org.apache.logging.log4j:log4j-bom from 2.20.0 to 2.21.0 (elastic#3367)
  Bump io.micrometer:micrometer-core from 1.11.4 to 1.11.5 (elastic#3360)
  Add guard against invalid end timestamps (elastic#3363)
  Added otel to dependabot config, upgraded dependencies (elastic#3359)
  Bump version.byte-buddy from 1.14.8 to 1.14.9 (elastic#3361)
  Fix off-by-one error in tests which effectively disabled recycling validation (elastic#3357)
  Fix multiple spans being created for HTTPUrlConnection HEAD requests (elastic#3353)
  Bump org.ow2.asm:asm-tree from 9.5 to 9.6 (elastic#3349)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-java ci:agent-integration Enables agent integration tests in build pipeline ready-to-merge
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants