Eclipselink 2.7.x Tracing metadata - bug in method org.eclipse.persistence.internal.helper.ConcurrencyManager.putThreadAsWaitingToAcquireLockForWriting(Thread, String) - We end up with metadata on threads wanting to acquire write locks #2173
Comments
This issue is now documented under a service request. Thanks a lot.
Side note: The 2.7.15 tag appears to address a defect related to the Additionally, here's the direct link to the specific commit:
I am having a hard time following the changes to the method.
I would instead have expected a call to lastCacheKeyWeNeededToWaitToAcquire.putCurrentThreadAsWaitingToAcquireLockForReading, as I explained in point 7.
Also, the new piece of code that calls toWaitOnLock.lock() and later toWaitOnLock.unlock() looks to me like a radical change to the method implementation. I cannot judge whether that code is actually correct. All I can say is that I was expecting the method implementation to look more like this:
public Map acquireLocksForClone(Object objectForClone, ClassDescriptor descriptor, CacheKey cacheKey, AbstractSession cloningSession) {
    // TRK-19750 - basic variable initialization to be able to do the
    // determineIfReleaseDeferredLockAppearsToBeDeadLocked
    final Date whileStartDate = new Date();
    final Thread currentThread = Thread.currentThread();
    DeferredLockManager lockManager = ConcurrencyManager.getDeferredLockManager(currentThread);
    HackingEclipseReadLockManager readLockManager = ConcurrencyManager.getReadLockManager(currentThread);
    boolean successful = false;
    IdentityHashMap lockedObjects = new IdentityHashMap();
    IdentityHashMap refreshedObjects = new IdentityHashMap();
    CacheKey lastCacheKeyWeNeededToWaitToAcquire = null;
    try {
        // if the descriptor has indirection for all mappings then wait as there will be no deadlock risks
        CacheKey toWaitOn = acquireLockAndRelatedLocks(objectForClone, lockedObjects, refreshedObjects, cacheKey, descriptor, cloningSession);
        int tries = 0;
        while (toWaitOn != null) { // loop until we've tried too many times.
            for (Iterator lockedList = lockedObjects.values().iterator(); lockedList.hasNext();) {
                ((CacheKey) lockedList.next()).releaseReadLock();
                lockedList.remove();
            }
            // TRK-19750 - populate the static hash map
            // of the concurrency manager that we use for creating the massive log dump
            // to indicate that the current thread is now stuck trying to acquire some arbitrary
            // cache key for writing
            lastCacheKeyWeNeededToWaitToAcquire = toWaitOn;
            lastCacheKeyWeNeededToWaitToAcquire.putCurrentThreadAsWaitingToAcquireLockForReading(
                    "org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(Object, ClassDescriptor, CacheKey, AbstractSession)");
            // TRK-19750 - Since we know this is one of those methods that can appear in the deadlocks,
            // with threads frozen here forever inside the wait that used to have no timeout,
            // we will now always check for how long the current thread is stuck in this while loop going nowhere
            // using the exact same approach we have been adding to the concurrency manager
            HackingEclipseHelperUtil.SINGLETON.determineIfReleaseDeferredLockAppearsToBeDeadLocked(toWaitOn,
                    whileStartDate, lockManager, readLockManager,
                    WRITE_LOCK_MANAGER_IS_WILLING_TO_ALLOW_INTERRUPTED_EXCEPTION_TO_BE_FIRED_UP_IF_CONFIGURATION_WOULD_ALLOW_ID_TRUE);
            synchronized (toWaitOn) {
                try {
                    if (toWaitOn.isAcquired()) { // last minute check to ensure it is still locked.
                        // TRK-19750
                        // this is the famous WriteLockManager.acquireLocksForClone(WriteLockManager.java:92)
                        // being one of the spots where threads trying to build objects can get stuck forever
                        // commenting out wait without a timeout
                        // if the thread that has the lock with write permissions is in a deadlock
                        // then we are not coming out
                        // toWaitOn.wait(); // wait for lock on object to be released
                        // TRK-19750
                        // wait with a timeout like everywhere else; waiting without a timeout is always wrong
                        toWaitOn.wait(10000L);
                    }
                } catch (InterruptedException ex) {
                    // Ignore exception thread should continue.
                }
            }
            Object waitObject = toWaitOn.getObject();
            // Object may be null for loss of identity.
            if (waitObject != null) {
                cloningSession.checkAndRefreshInvalidObject(waitObject, toWaitOn, cloningSession.getDescriptor(waitObject));
                refreshedObjects.put(waitObject, waitObject);
            }
            toWaitOn = acquireLockAndRelatedLocks(objectForClone, lockedObjects, refreshedObjects, cacheKey, descriptor, cloningSession);
            if ((toWaitOn != null) && ((++tries) > MAXTRIES)) {
                // If we've tried too many times abort.
                throw ConcurrencyException.maxTriesLockOnCloneExceded(objectForClone);
            }
        }
        successful = true; // successfully acquired all locks
    } catch (InterruptedException exception) {
        // TRK-19750 - if determineIfReleaseDeferredLockAppearsToBeDeadLocked is blowing up a thread stuck for too
        // long
        // run the logic of freeing up locks acquired by the thread
        // NOTE: we would be tempted to do this commented code below
        // cacheKey.releaseAllLocksAquiredByThread(lockManager, readLockManager);
        // throw ConcurrencyException.waitFailureOnClientSession(exception);
        // Instead what we do is just mimic the vanilla behavior we have for the interrupted exception inside of the
        // wait
        // we must assume this is correct behavior
        throw ConcurrencyException.maxTriesLockOnCloneExceded(objectForClone);
    } finally {
        // TRK-19750 - remove from the static hash map
        // of the concurrency manager that we use for creating the massive log dump
        // any information we may have added that this thread was struggling to acquire any particular
        // cache key. The current thread is out of the wait to acquire logic now so we can consider the thread
        // as no longer being stuck
        if (lastCacheKeyWeNeededToWaitToAcquire != null) {
            lastCacheKeyWeNeededToWaitToAcquire.removeCurrentThreadNoLongerWaitingToAcquireLockForReading();
        }
        if (!successful) { // did not acquire locks but we are exiting
            for (Iterator lockedList = lockedObjects.values().iterator(); lockedList.hasNext();) {
                ((CacheKey) lockedList.next()).releaseReadLock();
                lockedList.remove();
            }
        }
    }
    return lockedObjects;
}
I would assume the changes in the: are correct, but I have a hard time seeing that this is the case.
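The bounded wait used in the proposed method above can be looked at in isolation. The following is only a minimal sketch of that idea, not EclipseLink code: DemoCacheKey, awaitRelease, sliceMillis and maxWaitMillis are hypothetical names introduced for illustration. It waits in short slices, rechecks the condition after every wake-up, and gives up once an overall deadline has passed, so the caller gets a chance to log a dump or abort instead of blocking forever.

```java
/** Hypothetical stand-in for a cache key; this is not the EclipseLink CacheKey class. */
final class DemoCacheKey {
    private boolean acquired;

    synchronized void acquire() {
        acquired = true;
    }

    synchronized void release() {
        acquired = false;
        notifyAll(); // wake every thread parked in awaitRelease
    }

    /**
     * Waits in bounded slices until the key is released or maxWaitMillis has elapsed.
     * Returns true if the key was observed released, false on timeout or interruption.
     */
    synchronized boolean awaitRelease(long sliceMillis, long maxWaitMillis) {
        final long deadlineNanos = System.nanoTime() + maxWaitMillis * 1_000_000L;
        try {
            while (acquired) {
                long remainingMillis = (deadlineNanos - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false; // deadline passed: let the caller decide what to do
                }
                // Never call wait(0): that would block without any timeout.
                wait(Math.max(1L, Math.min(sliceMillis, remainingMillis)));
            }
            return true;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            return false;
        }
    }
}
```

A caller that receives false can release whatever it already holds and report which key it was stuck on, which is the role the deadlock check plays in the method above.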
Sorry, but is this issue still valid? Because methods and
look OK to me.
Hi, yes, the issue is still valid.
org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(Object, ClassDescriptor, CacheKey, AbstractSession)
lastCacheKeyWeNeededToWaitToAcquire.putCurrentThreadAsWaitingToAcquireLockForReading(
"org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(Object, ClassDescriptor, CacheKey, AbstractSession)");
Even if the two individual methods you pointed to above are not themselves wrong, the implementation of the code in WriteLockManager is incorrect. What the
org.eclipse.persistence.internal.helper.WriteLockManager.acquireLockAndRelatedLocks(Object, Map, Map, CacheKey, ClassDescriptor, AbstractSession)
And this method is all about getting read locks, not write locks. So comment 5 is very pertinent. So this code is not valid:
Moreover, that code is also losing the metadata of the cacheKey that was returned by the acquireLockAndRelatedLocks method. This metadata is crucial for identifying which cache key we are unable to acquire for reading, causing us to get stuck.
To be clear, the WriteLockManager does not only acquire locks for reading. There are scenarios, such as in the massive dump I am analyzing, where the locks the WriteLockManager is trying to acquire are indeed for writing. Here is a snippet of a stack trace that illustrates this point:
at org.eclipse.persistence.internal.helper.WriteLockManager.addCacheKeyToMapWriteLockManagerToCacheKeysThatCouldNotBeAcquired(WriteLockManager.java:739)
at org.eclipse.persistence.internal.helper.WriteLockManager.acquireRequiredLocksInternal(WriteLockManager.java:470)
at org.eclipse.persistence.internal.helper.WriteLockManager.acquireRequiredLocks(WriteLockManager.java:335)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.acquireWriteLocks(UnitOfWorkImpl.java:1712)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitTransactionAfterWriteChanges(UnitOfWorkImpl.java:1677)
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.issueSQLbeforeCompletion(RepeatableWriteUnitOfWork.java:366)
at org.eclipse.persistence.transaction.AbstractSynchronizationListener.beforeCompletion(AbstractSynchronizationListener.java:160)
at org.eclipse.persistence.transaction.JTASynchronizationListener.beforeCompletion(JTASynchronizationListener.java:70)
at weblogic.transaction.internal.ServerSCInfo.callBeforeCompletion(ServerSCInfo.java:1425)
at weblogic.transaction.internal.ServerSCInfo.doBeforeCompletion(ServerSCInfo.java:1394)
at weblogic.transaction.internal.ServerSCInfo.callBeforeCompletionsForTier(ServerSCInfo.java:1362)
at weblogic.transaction.internal.ServerSCInfo.callBeforeCompletions(ServerSCInfo.java:1333)
at weblogic.transaction.internal.ServerSCInfo.startPrePrepareAndChain(ServerSCInfo.java:117)
at weblogic.transaction.internal.ServerTransactionImpl.localPrePrepareAndChain(ServerTransactionImpl.java:1677)
at weblogic.transaction.internal.ServerTransactionImpl.globalPrePrepare(ServerTransactionImpl.java:2540)
at weblogic.transaction.internal.ServerTransactionImpl.internalCommit(ServerTransactionImpl.java:303)
at weblogic.transaction.internal.ServerTransactionImpl.commit(ServerTransactionImpl.java:270)
In the above stack trace, we see WebLogic issuing the before-completion event, and the changes in the L1 cache are being ported to the L2 cache. Here, EclipseLink does try to acquire cache keys for writing to bring in these changes. That being said, this is not what the WriteLockManager is doing in the
Hello, please check my PR #2304 against the master branch.
Bug Report: Issue with putThreadAsWaitingToAcquireLockForWriting Method in EclipseLink 2.7.x
Problem Description:
The putThreadAsWaitingToAcquireLockForWriting method in EclipseLink 2.7.x is incorrect.
Affected Method:
org.eclipse.persistence.internal.helper.ConcurrencyManager.putThreadAsWaitingToAcquireLockForWriting(Thread, String)
Notice how this method is putting the trace metadata and then immediately removing it right after.
This is how the method looks in the in-house modified 2.6.4 version.
Missing Removal Method:
Efficiency Concerns:
org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone
) to construct the trace string is inefficient.
stackTraceElement.getClassName() + "." + stackTraceElement.getMethodName() + "(...)"
should either be cached in a static variable or computed only once.
Here is a preview of what the method currently looks like:
We are currently analyzing a new massive dump. It shows a completely new deadlock pattern that we are investigating.
It is different from the MergeManager deadlock, for which we have a separate open issue.
The reason for this deadlock is still being investigated.
In this massive dump we spotted a thread with this stack trace:
As you can see in the stack trace above, the very thread that is generating the massive dump was supposed to have created some tracing metadata.
The metadata was nowhere to be found.
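Missing metadata of this kind is what one would expect if the trace entry is added and removed back to back, as reported for the affected method. The sketch below only illustrates that effect and one possible alternative; WaitTraceRegistry, snapshot, traceThenRemoveImmediately and traceAroundWait are hypothetical names, not EclipseLink API, and the Runnable stands in for whatever blocking wait the real code performs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracing registry; the names do not come from EclipseLink. */
final class WaitTraceRegistry {
    private final Map<Thread, String> waitingForWrite = new ConcurrentHashMap<>();

    /** What a dump taken right now would report for the thread (null if nothing is recorded). */
    String snapshot(Thread thread) {
        return waitingForWrite.get(thread);
    }

    /** Put and remove back to back: the entry never survives into the wait. */
    void traceThenRemoveImmediately(Thread thread, String method, Runnable blockingWait) {
        waitingForWrite.put(thread, method);
        waitingForWrite.remove(thread);
        blockingWait.run(); // a dump taken during this wait sees no entry
    }

    /** Put before the wait and remove in finally, once the wait is over. */
    void traceAroundWait(Thread thread, String method, Runnable blockingWait) {
        waitingForWrite.put(thread, method);
        try {
            blockingWait.run(); // a dump taken during this wait sees the entry
        } finally {
            waitingForWrite.remove(thread); // cleaned up even if the wait throws
        }
    }
}
```

With the second variant the entry is visible for exactly as long as the thread is blocked, which is the window a dump needs.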
7. The Metadata Information Might Also Be Incorrect - Write Lock Manager Is Trying to Acquire a READ Lock Key, Not a WRITE Lock Key:
Another bug here pertains to the nature of the metadata. As I explained earlier, the implementation of the org.eclipse.persistence.internal.helper.ConcurrencyManager.putThreadAsWaitingToAcquireLockForWriting(Thread, String) method is flawed. However, there's a second point to consider.
In the old manipulated 2.6.4 code, the metadata we were associating with the WriteLock manager was related to its attempt to acquire a CacheKey for writing. However, in the original code, we used the method putCurrentThreadAsWaitingToAcquireLockForReading.
I believe the old manipulated code is technically correct in distinguishing between READ lock and WRITE lock metadata information. Why? Let's examine the method:
As we can see from the method above, we are using acquireReadLockNoWait. This choice aligns with more accurate metadata.
Addressing this issue will enhance performance and assist in our ongoing analysis of production issues. Thank you for your attention to this matter.
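One way to keep the READ/WRITE distinction argued for above is to record the lock mode together with the waiting thread. The sketch below is illustrative only: LockWaitMetadata, Mode and describe are hypothetical names, not EclipseLink API. It also keeps the trace string in a static constant, in line with the efficiency concern about rebuilding it from a stack trace element on every call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical metadata holder that keeps read waits and write waits apart. */
final class LockWaitMetadata {
    enum Mode { READ, WRITE }

    // Built once instead of being derived from a stack trace element on every call.
    static final String ACQUIRE_LOCKS_FOR_CLONE =
            "org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(Object, ClassDescriptor, CacheKey, AbstractSession)";

    private final Map<Thread, String> waitingToRead = new ConcurrentHashMap<>();
    private final Map<Thread, String> waitingToWrite = new ConcurrentHashMap<>();

    private Map<Thread, String> mapFor(Mode mode) {
        return mode == Mode.READ ? waitingToRead : waitingToWrite;
    }

    void put(Mode mode, Thread thread, String method) {
        mapFor(mode).put(thread, method);
    }

    void remove(Mode mode, Thread thread) {
        mapFor(mode).remove(thread);
    }

    /** One line a dump could print for the thread, or "none" if it is not recorded as waiting. */
    String describe(Thread thread) {
        String read = waitingToRead.get(thread);
        if (read != null) {
            return "READ: " + read;
        }
        String write = waitingToWrite.get(thread);
        return write != null ? "WRITE: " + write : "none";
    }
}
```

A thread blocked after acquireReadLockNoWait fails would then be reported under READ, while a thread blocked during a commit that needs write locks would be reported under WRITE.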
Note 1:
We will provide Oracle with the in-house manipulated 2.6.x source code. This will allow them to compare the implementation of metadata acquisition and calls to the following method:
against the current 2.7.x version.
Note 2:
I am also attaching a snippet of the manipulated 2.6.x classes for WriteLockManager and ConcurrencyManager.java. This snippet will facilitate the analysis of the mentioned methods.
Please note that not all ongoing fixes in the 2.7.x version have been back-merged into our old, patched 2.6.x code. Back-merging recent developments from 2.7.x to 2.6.x is currently a low-priority task. Nevertheless, the attached code is relevant in the context of this defect.
WriteLockManager_2_6_4.txt
ConcurrencyManager_2_6_4.txt