Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invoke blocks and do not time out if active fails and no server left #70

Open
mathieucarbou opened this issue Dec 13, 2016 · 3 comments
Assignees
Labels

Comments

@mathieucarbou
Copy link
Member

branch: https://github.com/mathieucarbou/terracotta-platform/tree/issue-171

test: HATest.failover()

Test content:

    // connect passive
    stripeControl.startOneServer();
    stripeControl.waitForRunningPassivesInStandby();
[...]
    // kill active - passive should take the active role
    stripeControl.terminateActive();
[...]

main thread is blocking in tearDown(), before closing client's connection, we send a message to the management entity (ManagementAgentService.pushNotification).

Full thread dump

"main@1" prio=5 tid=0x1 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.Object.wait(Object.java:502)
	  at org.terracotta.passthrough.PassthroughConnectionState.sendNormal(PassthroughConnectionState.java:53)
	  at org.terracotta.passthrough.PassthroughConnection.invokeAndWait(PassthroughConnection.java:154)
	  at org.terracotta.passthrough.PassthroughConnection.invokeActionAndWaitForAcks(PassthroughConnection.java:146)
	  at org.terracotta.passthrough.PassthroughInvocationBuilder.invoke(PassthroughInvocationBuilder.java:113)
	  at org.terracotta.voltron.proxy.client.VoltronProxyInvocationHandler.invoke(VoltronProxyInvocationHandler.java:103)
	  at com.sun.proxy.$Proxy18.pushNotification(Unknown Source:-1)
	  at org.terracotta.management.entity.management.client.ManagementAgentService.pushNotification(ManagementAgentService.java:202)
	  at org.terracotta.management.entity.sample.client.management.Management.close(Management.java:117)
	  at org.terracotta.management.entity.sample.client.CacheFactory.close(CacheFactory.java:96)
	  at org.terracotta.management.entity.sample.AbstractTest$$Lambda$78.1930903395.accept(Unknown Source:-1)
	  at java.util.ArrayList.forEach(ArrayList.java:1249)
	  at org.terracotta.management.entity.sample.AbstractTest.closeNodes(AbstractTest.java:170)
	  at org.terracotta.management.entity.sample.AbstractTest.tearDown(AbstractTest.java:130)
	  at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
	  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	  at java.lang.reflect.Method.invoke(Method.java:498)
	  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
	  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	  at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	  at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	  at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	  at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
	  at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
	  at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

"ManagementScheduler-2@2516" daemon prio=5 tid=0x1d nid=NA waiting
  java.lang.Thread.State: WAITING
	  at sun.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	  at java.lang.Thread.run(Thread.java:745)

"pool-7-thread-1@2375" prio=5 tid=0x19 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at sun.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	  at java.lang.Thread.run(Thread.java:745)

"pool-5-thread-1@2295" prio=5 tid=0x17 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at sun.misc.Unsafe.park(Unsafe.java:-1)
	  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
	  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	  at java.lang.Thread.run(Thread.java:745)

"Pseudo-connection 6@2497" prio=5 tid=0x1a nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.Object.wait(Object.java:502)
	  at org.terracotta.passthrough.PassthroughConnection.getNextClientMessage(PassthroughConnection.java:222)
	  at org.terracotta.passthrough.PassthroughConnection.handleNextMessage(PassthroughConnection.java:205)
	  at org.terracotta.passthrough.PassthroughConnection.runClientThread(PassthroughConnection.java:199)
	  at org.terracotta.passthrough.PassthroughConnection.access$000(PassthroughConnection.java:48)
	  at org.terracotta.passthrough.PassthroughConnection$1.run(PassthroughConnection.java:91)
	  at java.lang.Thread.run(Thread.java:745)

"Client connection 5@2374" prio=5 tid=0x18 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.Object.wait(Object.java:502)
	  at org.terracotta.passthrough.PassthroughConnection.getNextClientMessage(PassthroughConnection.java:222)
	  at org.terracotta.passthrough.PassthroughConnection.handleNextMessage(PassthroughConnection.java:205)
	  at org.terracotta.passthrough.PassthroughConnection.runClientThread(PassthroughConnection.java:199)
	  at org.terracotta.passthrough.PassthroughConnection.access$000(PassthroughConnection.java:48)
	  at org.terracotta.passthrough.PassthroughConnection$1.run(PassthroughConnection.java:91)
	  at java.lang.Thread.run(Thread.java:745)

"Client connection 4@2293" prio=5 tid=0x16 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.Object.wait(Object.java:502)
	  at org.terracotta.passthrough.PassthroughConnection.getNextClientMessage(PassthroughConnection.java:222)
	  at org.terracotta.passthrough.PassthroughConnection.handleNextMessage(PassthroughConnection.java:205)
	  at org.terracotta.passthrough.PassthroughConnection.runClientThread(PassthroughConnection.java:199)
	  at org.terracotta.passthrough.PassthroughConnection.access$000(PassthroughConnection.java:48)
	  at org.terracotta.passthrough.PassthroughConnection$1.run(PassthroughConnection.java:91)
	  at java.lang.Thread.run(Thread.java:745)

"Client connection 3@1993" prio=5 tid=0x14 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.Object.wait(Object.java:502)
	  at org.terracotta.passthrough.PassthroughConnection.getNextClientMessage(PassthroughConnection.java:222)
	  at org.terracotta.passthrough.PassthroughConnection.handleNextMessage(PassthroughConnection.java:205)
	  at org.terracotta.passthrough.PassthroughConnection.runClientThread(PassthroughConnection.java:199)
	  at org.terracotta.passthrough.PassthroughConnection.access$000(PassthroughConnection.java:48)
	  at org.terracotta.passthrough.PassthroughConnection$1.run(PassthroughConnection.java:91)
	  at java.lang.Thread.run(Thread.java:745)

"Thread-1@1725" prio=5 tid=0xd nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.Object.wait(Object.java:502)
	  at org.terracotta.passthrough.PassthroughServerCrasher.waitNextRequest(PassthroughServerCrasher.java:57)
	  at org.terracotta.passthrough.PassthroughServerCrasher$1.run(PassthroughServerCrasher.java:32)

"PassthroughTimerThread@2498" prio=5 tid=0x1b nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at org.terracotta.passthrough.PassthroughTimerThread.run(PassthroughTimerThread.java:49)

"PassthroughTimerThread@1978" prio=5 tid=0x12 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at org.terracotta.passthrough.PassthroughTimerThread.run(PassthroughTimerThread.java:49)

"PassthroughTimerThread@1789" prio=5 tid=0xf nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at org.terracotta.passthrough.PassthroughTimerThread.run(PassthroughTimerThread.java:49)

"Finalizer@2587" daemon prio=8 tid=0x3 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
	  at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
	  at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler@2588" daemon prio=10 tid=0x2 nid=NA waiting
  java.lang.Thread.State: WAITING
	  at java.lang.Object.wait(Object.java:-1)
	  at java.lang.Object.wait(Object.java:502)
	  at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
	  at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"Signal Dispatcher@2586" daemon prio=9 tid=0x4 nid=NA runnable
  java.lang.Thread.State: RUNNABLE

It is possible that this is a bug on our site, like #63, but I do not know if passthrough is hiding a stack trace. I do not see any stacktrace in the console at least.

@mathieucarbou
Copy link
Member Author

mathieucarbou commented Dec 13, 2016

UPDATE: I have more details.

So the call to terminateActive() do the promotion of the passive, which then triggers a call to IStripeMonitoring.addNode()

Out code is throwing an exception in the addNode().

So Junit is recording the exception thrown by terminateActive().

Then, junit is calling tearDown(), which calling clients to close(). They send a message to management entity (which blocks because no active is there).

So all happens in the same (main) thread.

So in this case, the failover starts, but the passive is failing becoming active.

I would be able to see the Junit exception if the tearDown wouldn't block into assthroughConnectionState.sendNormal. Is there a way to make it non blocking or timeouting ?

@mathieucarbou mathieucarbou changed the title Cannot invoke anymore after terminating active Cannot invoke anymore if passive fails to become active Dec 14, 2016
@mathieucarbou
Copy link
Member Author

We put a workaround in our junit tearDown method. We execute all or calls through a runnable passed to this method:

  private void interruptVoltron(Runnable runnable) {
    CountDownLatch closed = new CountDownLatch(1);
    Thread thread = new Thread(() -> {
      try {
        runnable.run();
      } catch (RuntimeException e) {
        Throwable t = e.getCause();
        while (t != null && !(t instanceof InterruptedException)) {
          t = t.getCause();
        }
        if (t == null) {
          throw e;
        }
      }
      closed.countDown();
    });
    thread.start();
    try {
      if (!closed.await(2, TimeUnit.SECONDS)) {
        thread.interrupt();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

@myronkscott
Copy link
Member

that's good.

@mathieucarbou mathieucarbou changed the title Cannot invoke anymore if passive fails to become active Cannot invoke anymore if active fails and no server left Dec 16, 2016
@mathieucarbou mathieucarbou changed the title Cannot invoke anymore if active fails and no server left Invoke blocks and do not time out if active fails and no server left Dec 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants