-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add scala future test case using multiple threads #1274
Conversation
Sometimes fails because current tracer != the tracer passed as parameter. Also sometimes num is different some the transaction name.
💔 Build Failed
Expand to view the summary
Build stats
Steps errorsExpand to view the steps failures
|
I will have a look later today :) Looks interesting! You are indeed correct the current scenario is only tested with 1 transaction. Although I've tested this with a dummy-application as well that has multiple transactions going on at the same time: https://github.com/milanvdm/scala-elastic-apm/tree/master/src/main/scala/me/milan/main/future |
def startFuture(num: Int): Future[(Transaction, Int)] = { | ||
Future { | ||
println(s"thread=${Thread.currentThread().getId}, trace=${tracer.currentTransaction()} before num=$num") | ||
val transaction = tracer.startRootTransaction(getClass.getClassLoader).withName("Transaction" + num).activate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This transaction is never deactivated on the thread and can therefore leak into other operations.
A more sensible test scenario would be to start a transaction before doing the multi-threaded operations and check if the transaction successfully propagates to the futures executed in different threads. I've tried this out and it even works when commenting out the instrumentations in apm-agent-plugins/apm-scala-concurrent-plugin/src/main/resources/META-INF/services/co.elastic.apm.agent.bci.ElasticApmInstrumentation
. That's because Scala relies on the Java concurrent framework that we already instrument.
The instrumentations in this module are only needed when the context gets lost along the way which seems to be the case for some AsyncHttpClient callbacks.
val transaction = tracer.startRootTransaction(getClass.getClassLoader).withName("Transaction").activate()
val futures = (1 to 10).map(x => Future {
Thread.sleep(10)
println(s"thread=${Thread.currentThread().getId}, trace=${tracer.currentTransaction()}")
val x = transaction == tracer.currentTransaction()
assertEquals(transaction, tracer.currentTransaction())
(transaction, x)
})
val future = Future.sequence(futures)
val result = Await.result(future, 10.seconds).toList
transaction.deactivate().end()
assertEquals(result.forall(x => x._2), true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This transaction is never deactivated on the thread and can therefore leak into other operations.
But if I deactivate it, it won't propagate... right?
How do I "activate" the transaction inside the future, so that it's passed on to the next .map() operation, but while deactivating it so that it's not passed on to unrelated operation that happens on the same thread?
Looks like your example is using a single transaction. What I'm trying to simulate is a "http server" scenario, which has multiple transactions corresponding to http requests concurrently. If I take your example and make it run that 10 times in different threads I get the same issue.
https://gist.github.com/henrikno/71e54b7c2d43e63936fd33fdd45411bb
Interestingly it happens less often with ForkJoinPool than with FixedThreadPool.
If the test is wrong, could you help explain how it should be used with multiple "requests"/threads handing different transactions concurrently (each of which will typically spawn nested futures to do the actual work)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test in the linked gist looks good. I had to increase the number of threads though. Seems like it does expose some edge cases where the context isn't properly transferred.
To make investigation easier, we'd ideally want a test that's a bit simpler but that still fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ill spend some time on this tomorrow 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixbarny I am not sure how the elastic-agent handles the following case for JavaFutures:
- Create multiple transactions (could be on different or the same threads) - https://gist.github.com/henrikno/71e54b7c2d43e63936fd33fdd45411bb#file-test-scala-L41
- After you have multiple transactions active, create a new
Future
- https://gist.github.com/henrikno/71e54b7c2d43e63936fd33fdd45411bb#file-test-scala-L43
In this case, the problem seems to be, how does the Future
from (2) know which transaction to use?
That's how I currently understand the issue show-cased on the Gist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can only be one transaction active on a thread at a time. I think that's maintained as each Future is either executed in its own thread concurrently or steals time from the main thread and executes synchronously.
how does the Future from (2) know which transaction to use?
The map
schedules the Future
for execution (by calling eventually Executor#execute
). Whichever transaction is active on the current thread when the Future
is scheduled is going to be restored when the Future
is executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixbarny That map
is not actually being called on any Future
. Therefore, it will actually create a new Future
and use whatever Transaction
is active at that point.
I think that is the exact problem here. Transactions are started in parallel but are not followed up on correctly to pass the correct context.
So somehow, we need to have Transactions created inside of a Future, actually be linked to that Future and continue the logic with map
, flatMap
on that Future.
I'm not sure if the above not already happens though, say: Future(startTransaction()).flatMap(logic)
maybe always runs on the same thread and use the same transaction. But Im not sure about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my tests, I have commented out the scala-specific Future
instrumentation. The tests do fail more often when doing that but when using a ForkJoinPool
, it works a bit more reliably.
Calling map
ends up calling Executor#execute
and executing a Future
ends up calling Runnable#run
. These methods are instrumented by the java-concurrent plugin which captures and restores the context. Something must be going wrong in some cases there.
So somehow, we need to have Transactions created inside of a Future, actually be linked to that Future and continue the logic with map, flatMap on that Future.
When transactions are activated, they are attached to the current thread. If on that thread, a future gets scheduled for execution via map
, the transaction gets associated with that future and will be restored when the future is executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixbarny So since the Future that creates the transaction will be the same Runnable that executes the underlying Futures using that transaction, the context should be passed along correctly?
Thanks for the information as always :)
@felixbarny I've spent some time investigating this issue again :) Kamon's implementation has some nice descriptions on how and why they instrument certain methods to make the Future context propagation work correctly. Seems like currently the part of https://github.com/kamon-io/Kamon/blob/master/instrumentation/kamon-scala-future/src/main/scala-2.13/kamon/instrumentation/futures/scala/FutureChainingInstrumentation.scala#L39-L49 is implemented in the I tried implementing these parts but Im stuck on https://github.com/kamon-io/Kamon/blob/master/instrumentation/kamon-scala-future/src/main/scala-2.13/kamon/instrumentation/futures/scala/FutureChainingInstrumentation.scala#L56 where it cleans a context on a certain match. I cannot figure out how that should be done in the |
That's going to be tricky as the Elastic agent allows to be attached at runtime. At that point, the type initializer of Future has most likely already been executed. Also, the Kamon agent adds interfaces/mixins such as Maybe an alternative could be to avoid context propagation by checking if the future is (as in reference equality ( |
@felixbarny Been debugging this a bit more again :) I have some strange behavior that I have a question around: The following code is instrumenting a specific method:
But in the logs I get the following:
So even after calling |
You'll need to call |
@felixbarny Im not sure how the In the following test:
Since there are only rootTransactions started, shouldnt both methods always return the same transaction (as no spans are ever created as a child of the root-transaction)? The test is failing due to this. It is activating the correct context at the right times. So |
Seems you are activating multiple transactions that belong to different traces on the same thread. That's an illegal state but it seems we currently don't guard against that. What happens when calling apm-agent-java/apm-agent-core/src/main/java/co/elastic/apm/agent/impl/ElasticApmTracer.java Lines 252 to 255 in 48d697e
That means if you activate multiple transactions, the first activated one will be returned. |
@felixbarny It indeed looks like the spans are correctly passed across threads and being linked to the correct As you mention, the difference is with the So here I am a bit lost. I've got a solution that correctly activates and deactivates spans across different threads correctly for |
I didn't read through everything, but I hope I can help: If you run multiple futures on a thread pool, you must make sure any activated transaction is also deactivated before the thread is returned to the pool to run the next task. Otherwise, the next time you |
@henrikno a lot has changed since this was opened. Is this still relevant? Would you want to follow up on this, or should we close it for now? |
Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as |
Hi! This issue has been stale for a while and we're going to close it as part of our cleanup procedure. We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team. Feel free to re-open this issue if you think it should stay open. Thank you for your contribution! |
What does this PR do?
I wanted to give the scala futures implementation a spin but, hit a snag when using it with multiple threads.
It seems like the transaction is not activated/deactivated at the right time in some cases.
The existing test cases only use a single transaction so it wouldn't catch it.
Sometimes it fails because current tracer != the tracer passed as parameter.
Also sometimes
num
is different some the transaction name.Example output:
thread=35, trace='Transaction7' 00-852e1e7ca03813413465fc7c21b4ab5e-42a231f21779d2aa-01 (4cd1e659) start transaction num=20
num
should be the same as the number in the transaction name. It looks like the transaction from one future is still active when starting a different unrelated future.The test sometimes passes. The Thread.sleep is not necessary for it to fail but makes it happen more often.
Not intended to be merged as-is, just to show the issue.
Checklist