Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eclipselink 2.7.x Tracing metadata - bug in method org.eclipse.persistence.internal.helper.ConcurrencyManager.putThreadAsWaitingToAcquireLockForWriting(Thread, String) - We end up with metadata on threads wanting to acquire write locks #2173

Open
99sono opened this issue Jun 13, 2024 · 5 comments · May be fixed by #2304

Comments

@99sono
Copy link

99sono commented Jun 13, 2024

Bug Report: Issue with putThreadAsWaitingToAcquireLockForWriting Method in EclipseLink 2.7.x

  1. Problem Description:

    • The implementation of the putThreadAsWaitingToAcquireLockForWriting method in EclipseLink 2.7.x is incorrect.
    • The method adds trace metadata about a thread wanting to acquire a write lock but immediately removes that same trace information.
    • Additionally, the library lacks the corresponding method for removing a thread no longer waiting to acquire a lock for writing.
  2. Affected Method:

    • org.eclipse.persistence.internal.helper.ConcurrencyManager.putThreadAsWaitingToAcquireLockForWriting(Thread, String)
     /**
      * Normally this mehtod should only be called from withing the concurrency manager.
      * However the write lock manager while it is building clones also does some while loop waiting
      * to try to acquire a cache key this acquiring logic is not being managed directly inside of the wait manager.
      *
      */
     public void putThreadAsWaitingToAcquireLockForWriting(Thread thread, String methodName) {
         THREADS_TO_WAIT_ON_ACQUIRE_NAME_OF_METHOD_CREATING_TRACE.put(thread, methodName);
         THREADS_TO_WAIT_ON_ACQUIRE_NAME_OF_METHOD_CREATING_TRACE.remove(thread);
     }

    Notice how this method is putting the trace metadata and then immediately removing it right after.

    This how this method is looking like in the inhoud modified 2.6.4 version.

     /**
      * Normally this mehtod should only be called from withing the concurrency manager. However the write lock manager
      * while it is building clones also does some while loop waiting to try to acquire a cache key this acquiring logic
      * is not being managed directly inside of the wait manager.
      *
      * @param methodNamCreatingTheTrace
      *            metadata to help us debug trace leaking. If we start blowing up threads we do not want the traces
      *            created by the current thread to remain.
      */
     public void putCurrentThreadAsWaitingToAcquireLockForWriting(String methodNamCreatingTheTrace) {
         Thread currentThread = Thread.currentThread();
         MAP_THREAD_TO_WAIT_ON_ACQUIRE.put(currentThread, this);
         MAP_THREAD_TO_WAIT_ON_ACQUIRE_NAME_OF_METHOD_CREATING_TRACE.put(currentThread, methodNamCreatingTheTrace);
     }
  3. Missing Removal Method:

    • The library lacks the following method for removing a thread no longer waiting to acquire a lock for writing:
      public void removeCurrentThreadNoLongerWaitingToAcquireLockForWriting() {
          Thread currentThread = Thread.currentThread();
          MAP_THREAD_TO_WAIT_ON_ACQUIRE.remove(currentThread);
          MAP_THREAD_TO_WAIT_ON_ACQUIRE_NAME_OF_METHOD_CREATING_TRACE.remove(currentThread);
      }
  4. Efficiency Concerns:

    • The current approach used by callers (such as org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone) to construct the trace string is inefficient.
    • The string stackTraceElement.getClassName() + "." + stackTraceElement.getMethodName() + "(...)" should either be cached in a static variable or computed only once.

Here is a preview of what the method currently looks like:

                // of the concurrency manager that we use for creating the massive log dump
                // to indicate that the current thread is now stuck trying to acquire some arbitrary
                // cache key for writing
                StackTraceElement stackTraceElement = Thread.currentThread().getStackTrace()[1];
                lastCacheKeyWeNeededToWaitToAcquire = toWaitOn;
                lastCacheKeyWeNeededToWaitToAcquire.putThreadAsWaitingToAcquireLockForWriting(currentThread, stackTraceElement.getClassName() + "." + stackTraceElement.getMethodName() + "(...)");
  1. Proposed Solution:
    • Revert to the original implementation used in our old Inhouse Patch Fixed EclipseLink 2.6.x version for adding and removing thread trace information.
    • Consider optimizing the trace string construction to improve efficiency.
  lastCacheKeyWeNeededToWaitToAcquire.putCurrentThreadAsWaitingToAcquireLockForReading(
                        "org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(Object, ClassDescriptor, CacheKey, AbstractSession)");
             
  1. How We spotted this bug:
    • We are currently analyzing a new massive dump. This is a completely new pattern of deadlock we are investigating.
      Different from the MergeManager deadlock for which we have a different open issue.
      The current reason for the deadlock is still being investigated.

    • In this massive dump we had spotted a thread that had this stack trace

[ACTIVE] ExecuteThread: '117' for queue: 'weblogic.kernel.Default (self-tuning)'" 
   java.lang.Thread.State: RUNNABLE
        at [email protected]/java.util.concurrent.ConcurrentHashMap.hashCode(ConcurrentHashMap.java:1302)
        at [email protected]/java.util.concurrent.ConcurrentHashMap.hashCode(ConcurrentHashMap.java:1302)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeVariable(ThreadSnapshot.java:651)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeVariable(ThreadSnapshot.java:584)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeFields(ThreadSnapshot.java:836)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeChildren(ThreadSnapshot.java:735)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeVariable(ThreadSnapshot.java:662)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeVariable(ThreadSnapshot.java:584)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeFields(ThreadSnapshot.java:836)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeChildren(ThreadSnapshot.java:735)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeVariable(ThreadSnapshot.java:662)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeVariable(ThreadSnapshot.java:584)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeFields(ThreadSnapshot.java:836)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeChildren(ThreadSnapshot.java:735)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeVariable(ThreadSnapshot.java:662)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeVariable(ThreadSnapshot.java:584)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeFields(ThreadSnapshot.java:836)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeChildren(ThreadSnapshot.java:735)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeVariable(ThreadSnapshot.java:662)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.writeVariable(ThreadSnapshot.java:584)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.access$100(ThreadSnapshot.java:40)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot$1$1.variable(ThreadSnapshot.java:408)
        at com.intergral.fusionreactor.plugin.jvmti.api.JvmtiApi.listVariables(Native Method)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot$1.frame(ThreadSnapshot.java:322)
        at com.intergral.fusionreactor.plugin.jvmti.api.JvmtiApi.listStack(Native Method)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshot.log(ThreadSnapshot.java:258)
        at com.intergral.fusionreactor.plugin.jvmti.thread.snapshot.ThreadSnapshotHandler.createThreadSnapshot(ThreadSnapshotHandler.java:383)
        at com.intergral.fusionreactor.plugin.logtracker.LogTrackerComponent.log(LogTrackerComponent.java:171)
        at com.intergral.fusionreactor.plugin.logtracker.LogTrackerImpl.fireLogEvent(LogTrackerImpl.java:58)
        at com.intergral.fusionreactor.plugin.logtracker.LogTrackerImpl.error(LogTrackerImpl.java:24)
        at java.mixins.fusionreactor.api.log.LogUtils.trackLogError(LogUtils.java:41)
        at com.intergral.fusionreactor.agent.pointcuts.logtracker.SLF4JPointCut$LoggingLevels$1.doLog(SLF4JPointCut.java:206)
        at com.intergral.fusionreactor.agent.pointcuts.logtracker.SLF4JPointCut.getAdvice(SLF4JPointCut.java:179)
        at com.intergral.fusionreactor.agent.advice.factory.AdviceFactory.getAdvice(AdviceFactory.java:64)
        at com.intergral.fusionreactor.agent.advice.factory.AdviceFactory.invoke(AdviceFactory.java:42)
        at org.apache.logging.slf4j.Log4jLogger.error(Log4jLogger.java)
        at com.whatever.eclipselink.Slf4jSessionLog.basicLog(Slf4jSessionLog.java:134)
        at com.whatever..eclipselink.Slf4jSessionLog.log(Slf4jSessionLog.java:109)
        at org.eclipse.persistence.logging.AbstractSessionLog.log(AbstractSessionLog.java:579)
		
		
		NOTE: This is where it gets interesting:
		This thread WriteLockManager.acquireLocksForClone has no tracing metadata in the 
		quote :Concurrency manager - Page 02 start - information about threads waiting to acquire (write/deferred) cache keys 
		This is because of the incorret implementation of the org.eclipse.persistence.internal.helper.ConcurrencyManager.putThreadAsWaitingToAcquireLockForWriting(Thread, String)
		
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.currentThreadIsStuckForSomeTimeProduceTinyLogMessage(ConcurrencyUtil.java:567)
        at org.eclipse.persistence.internal.helper.ConcurrencyUtil.determineIfReleaseDeferredLockAppearsToBeDeadLocked(ConcurrencyUtil.java:235)
        at org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(WriteLockManager.java:171)
		
		
		
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1056)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1014)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(UnitOfWorkIdentityMapAccessor.java:211)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getFromIdentityMap(UnitOfWorkIdentityMapAccessor.java:139)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:4093)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:4016)
        at org.eclipse.persistence.mappings.CollectionMapping.buildElementUnitOfWorkClone(CollectionMapping.java:313)
        at org.eclipse.persistence.mappings.CollectionMapping.buildElementClone(CollectionMapping.java:326)
        at org.eclipse.persistence.internal.queries.ContainerPolicy.addNextValueFromIteratorInto(ContainerPolicy.java:217)
        at org.eclipse.persistence.mappings.CollectionMapping.buildCloneForPartObject(CollectionMapping.java:228)
        at org.eclipse.persistence.internal.indirection.UnitOfWorkQueryValueHolder.buildCloneFor(UnitOfWorkQueryValueHolder.java:62)
        at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiateImpl(UnitOfWorkValueHolder.java:177)
        at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiate(UnitOfWorkValueHolder.java:238)
        at org.eclipse.persistence.internal.indirection.DatabaseValueHolder.getValue(DatabaseValueHolder.java:97)
        at org.eclipse.persistence.indirection.IndirectList.buildDelegate(IndirectList.java:275)
        at org.eclipse.persistence.indirection.IndirectList.getDelegate(IndirectList.java:458)
        at org.eclipse.persistence.indirection.IndirectList.toArray(IndirectList.java:864)
        at [email protected]/java.util.Collections$UnmodifiableCollection.toArray(Collections.java:1037)
        at [email protected]/java.util.ArrayList.addAll(ArrayList.java:702)
        at com.whatever.entity.WhateverEntity.getSomeWhateverOneToManyField(WhateverEntity.java:616)
		
		... rest of the stack trace is censored. 

As you can see in the stack trace above , the thread itself that is generative the massive dump was supposed to have create some tracing metadata.
The metadata was nowhere to be found.

7. The Metadata Information Might Also Be Incorrect - Write Lock Manager Is Trying to Acquire a READ Lock Key, Not a WRITE Lock Key:

  • Another bug here pertains to the nature of the metadata. As I explained earlier, the implementation of the org.eclipse.persistence.internal.helper.ConcurrencyManager.putThreadAsWaitingToAcquireLockForWriting(Thread, String) method is flawed. However, there's a second point to consider.

  • In the old manipulated 2.6.4 code, the metadata we were associating with the WriteLock manager was related to its attempt to acquire a CacheKey for writing. However, in the original code, we used the method putCurrentThreadAsWaitingToAcquireLockForReading.

  • I believe the old manipulated code is technically correct in distinguishing between READ lock and WRITE lock metadata information. Why? Let's examine the method:

public CacheKey acquireLockAndRelatedLocks(Object objectForClone, Map lockedObjects, Map refreshedObjects, CacheKey cacheKey, ClassDescriptor descriptor, AbstractSession cloningSession) {
    if (!refreshedObjects.containsKey(objectForClone) && cloningSession.isConsideredInvalid(objectForClone, cacheKey, descriptor)) {
        return cacheKey;
    }
    // Attempt to get a read-lock; null is returned if it cannot be read-locked.
    if (cacheKey.acquireReadLockNoWait()) {
        if (cacheKey.getObject() == null) {
            // This will be the case for deleted objects, NoIdentityMap, and aggregates.
            lockedObjects.put(objectForClone, cacheKey);
        } else {
            objectForClone = cacheKey.getObject();
            if (lockedObjects.containsKey(objectForClone)) {
                // This check is for loss of identity; the original check in checkAndLockObject() will short-circuit in the usual case.
                cacheKey.releaseReadLock();
                return null;
            }
            // Store the locked cache key for release later.
            lockedObjects.put(objectForClone, cacheKey);
        }
        return traverseRelatedLocks(objectForClone, lockedObjects, refreshedObjects, descriptor, cloningSession);
    } else {
        // Return the cache key that could not be locked.
        return cacheKey;
    }
}

As we can see from the method above, we are using acquireReadLockNoWait. This choice aligns with more accurate metadata.

  1. Additional Notes:
    Addressing this issue will enhance performance and assist in our ongoing analysis of production issues. Thank you for your attention to this matter.

Note 1:
We will provide Oracle with the in-house manipulated 2.6.x source code. This will allow them to compare the implementation of metadata acquisition and calls to the following method:

org.eclipse.persistence.internal.helper.ConcurrencyManager.removeCurrentThreadNoLongerWaitingToAcquireLockForWriting()

against the current 2.7.x version.

Note 2:
I am also attaching a snippet of the manipulated 2.6.x classes for WriteLockManager and ConcurrencyManager.java. This snippet will facilitate the analysis of the mentioned methods.

Please note that not all ongoing fixes in the 2.7.x version have been back-merged into our old, patched 2.6.x code. Back-merging recent developments from 2.7.x to 2.6.x is currently a low-priority task. Nevertheless, the attached code is relevant in the context of this defect.

WriteLockManager_2_6_4.txt
ConcurrencyManager_2_6_4.txt

@99sono
Copy link
Author

99sono commented Jun 21, 2024

This issue is now documented under a service request. Thanks a lot.

@99sono
Copy link
Author

99sono commented Jul 3, 2024

Side note: The 2.7.15 tag appears to address a defect related to the ConcurrencyManager. You can find the relevant changes in the following commits:

  1. Update ConcurrencyManager implementation to be in sync with master
  2. Pull request #1347

Additionally, here's the direct link to the specific commit:
Commit details

  1. The code in the Write Lock manager looks to have been modified:
    [2.7.15 write lock manager with modified acquireLocksForClone] (https://github.com/eclipse-ee4j/eclipselink/blob/2.7.15/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/helper/WriteLockManager.java)

I am some hard time following the changes on the method.
the changes look bigger than i anticipated.
I see that the method continues to do a:

putThreadAsWaitingToAcquireLockForWriting

Whereas I would have expected a

lastCacheKeyWeNeededToWaitToAcquire.putCurrentThreadAsWaitingToAcquireLockForReading

As I had explained in point 7.

Also the new piece of code doing:

 toWaitOnLock.lock()

And later

 toWaitOnLock.unlock();

Looks like a radical change to me in the method implementation.

I cannot judge if that code is actually correct or not.

All I can see I was expecting the method implementation to look more like this:

 public Map acquireLocksForClone(Object objectForClone, ClassDescriptor descriptor, CacheKey cacheKey, AbstractSession cloningSession) {
        // TRK-19750 - basic variable initialization to be able to do the
        // determineIfReleaseDeferredLockAppearsToBeDeadLocked
        final Date whileStartDate = new Date();
        final Thread currentThread = Thread.currentThread();
        DeferredLockManager lockManager = ConcurrencyManager.getDeferredLockManager(currentThread);
        HackingEclipseReadLockManager readLockManager = ConcurrencyManager.getReadLockManager(currentThread);

        boolean successful = false;
        IdentityHashMap lockedObjects = new IdentityHashMap();
        IdentityHashMap refreshedObjects = new IdentityHashMap();

        CacheKey lastCacheKeyWeNeededToWaitToAcquire = null;
        try {
            // if the descriptor has indirection for all mappings then wait as there will be no deadlock risks
            CacheKey toWaitOn = acquireLockAndRelatedLocks(objectForClone, lockedObjects, refreshedObjects, cacheKey, descriptor, cloningSession);
            int tries = 0;
            while (toWaitOn != null) {// loop until we've tried too many times.

                for (Iterator lockedList = lockedObjects.values().iterator(); lockedList.hasNext();) {
                    ((CacheKey)lockedList.next()).releaseReadLock();
                    lockedList.remove();
                }

                // TRK-19750 - populate the static hash map
                // of the concurrenyc manager that we use for creating the massive log dump
                // to indicate that the current thread is now stuck trying to acquire some arbitrary
                // cache key for writing
                lastCacheKeyWeNeededToWaitToAcquire = toWaitOn;
                lastCacheKeyWeNeededToWaitToAcquire.putCurrentThreadAsWaitingToAcquireLockForReading(
                        "org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(Object, ClassDescriptor, CacheKey, AbstractSession)");

                // TRK-19750 - Since we know this one of those methods that can appear in the dead locks
                // we threads frozen here forever inside of the wait that used to have no timeout
                // we will now always check for how long the current thread is stuck in this while loop going nowhere
                // using the exact same approach we have been adding to the concurrency manager
                HackingEclipseHelperUtil.SINGLETON.determineIfReleaseDeferredLockAppearsToBeDeadLocked(toWaitOn,
                        whileStartDate, lockManager, readLockManager,
                        WRITE_LOCK_MANAGER_IS_WILLING_TO_ALLOW_INTERRUPTED_EXCEPTION_TO_BE_FIRED_UP_IF_CONFIGURATION_WOULD_ALLOW_ID_TRUE);

                synchronized (toWaitOn) {
                    try {
                        if (toWaitOn.isAcquired()) {//last minute check to insure it is still locked.
                            // TRK-19750
                            // this is the famous WriteLockManager.acquireLocksForClone(WriteLockManager.java:92)
                            // being one of the spots where threads trying build objects can get stuck forever
                            // commenting out wait without a timeout
                            // if the thread that has the lock with write permissions is in a dead lock
                            // then we are not coming out
                            // toWaitOn.wait();// wait for lock on object to be released

                            // TRK-19750
                            // at wait with timout like everywhere else waiting without timeout is always wrong
                            toWaitOn.wait(10000l);
                        }
                    } catch (InterruptedException ex) {
                        // Ignore exception thread should continue.
                    }
                }
                Object waitObject = toWaitOn.getObject();
                // Object may be null for loss of identity.
                if (waitObject != null) {
                    cloningSession.checkAndRefreshInvalidObject(waitObject, toWaitOn, cloningSession.getDescriptor(waitObject));
                    refreshedObjects.put(waitObject, waitObject);
                }
                toWaitOn = acquireLockAndRelatedLocks(objectForClone, lockedObjects, refreshedObjects, cacheKey, descriptor, cloningSession);
                if ((toWaitOn != null) && ((++tries) > MAXTRIES)) {
                    // If we've tried too many times abort.
                    throw ConcurrencyException.maxTriesLockOnCloneExceded(objectForClone);
                }
            }
            successful = true;//successfully acquired all locks
        } catch (InterruptedException exception) {
            // TRK-19750 - if determineIfReleaseDeferredLockAppearsToBeDeadLocked is blowing up a thread stuck for too
            // long
            // run the lock of freeing up locks acquired by the thread
            // NOTE: we would be tempted to do this commented code bellow
            // cacheKey.releaseAllLocksAquiredByThread(lockManager, readLockManager);
            // throw ConcurrencyException.waitFailureOnClientSession(exception);
            // Instead what we do is just mimic the vanila behavior we have for the interrupted exception inside of the
            // wait
            // we must assume this is correct behavior
            throw ConcurrencyException.maxTriesLockOnCloneExceded(objectForClone);
        } finally {
            // TRK-19750 - remove from the static hash map
            // of the concurrency manager that we use for creating the massive log dump
            // any information we may have added that this thread was strugling to acquire any particular
            // cache key. The current thread is out of the wait to acquire logic now so we can consider the thread
            // as not no longer being stuck
            if (lastCacheKeyWeNeededToWaitToAcquire != null) {
                lastCacheKeyWeNeededToWaitToAcquire.removeCurrentThreadNoLongerWaitingToAcquireLockForReading();
            }

            if (!successful) {//did not acquire locks but we are exiting
                for (Iterator lockedList = lockedObjects.values().iterator(); lockedList.hasNext();) {
                    ((CacheKey)lockedList.next()).releaseReadLock();
                    lockedList.remove();
                }
            }
        }
        return lockedObjects;
    }

I would assume the changes in the:
https://github.com/eclipse-ee4j/eclipselink/blob/2.7.15/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/helper/WriteLockManager.java

Are correct but I have some hard time seeing that is the case.

@rfelcman
Copy link
Contributor

Sorry but is it this issue still valid? Because methods
public void putThreadAsWaitingToAcquireLockForWriting(Thread thread, String methodName)
https://github.com/eclipse-ee4j/eclipselink/blob/2.7/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/helper/ConcurrencyManager.java#L915

and

public void removeThreadNoLongerWaitingToAcquireLockForWriting(Thread thread)
https://github.com/eclipse-ee4j/eclipselink/blob/2.7/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/helper/ConcurrencyManager.java#L924

looks OK to me.

@99sono
Copy link
Author

99sono commented Oct 29, 2024

Hi, yes, the issue is still valid.
Please see the feedback on the

org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(Object, ClassDescriptor, CacheKey, AbstractSession)
  1. Proposed Solution:
    • Revert to the original implementation used in our old in-house patched version of EclipseLink 2.6.x for adding and removing thread trace information.
    • Consider optimizing the trace string construction to improve efficiency.
lastCacheKeyWeNeededToWaitToAcquire.putCurrentThreadAsWaitingToAcquireLockForReading(
    "org.eclipse.persistence.internal.helper.WriteLockManager.acquireLocksForClone(Object, ClassDescriptor, CacheKey, AbstractSession)");

Even if the two individual methods you pointed to above are not themselves wrong, the implementation of the code in WriteLockManager is incorrect. What the acquireLocksForClone method is trying to do is not acquiring locks for writing. The code that actually does lock acquisition called upon by acquireLocksForClone is this method of the WriteLockManager:

org.eclipse.persistence.internal.helper.WriteLockManager.acquireLockAndRelatedLocks(Object, Map, Map, CacheKey, ClassDescriptor, AbstractSession)

And this method is all about getting read locks, not write locks. So comment 5 is very pertinent.

So this code is not valid:
https://github.com/eclipse-ee4j/eclipselink/blob/2.7/foundation/org.eclipse.persistence.core/src/org/eclipse/persistence/internal/helper/WriteLockManager.java#L173

Moreover, that code is also losing the metadata of the cacheKey that was returned by the acquireLockAndRelatedLocks method. This metadata is crucial for identifying which cache key we are unable to acquire for reading, causing us to get stuck.
The trace on the method where we are stuck trying to get a cache key for reading could mention the method name and probably add a suffix, "failed to acquire cachkeKey:" + toWaitOn

To be clear, the WriteLockManager does not only acquire locks for reading. There are scenarios, such as in the massive dump I am analyzing, where the locks the WriteLockManager is trying to acquire are indeed for writing. Here is a snippet of a stack trace that illustrates this point:

        at org.eclipse.persistence.internal.helper.WriteLockManager.addCacheKeyToMapWriteLockManagerToCacheKeysThatCouldNotBeAcquired(WriteLockManager.java:739)
        at org.eclipse.persistence.internal.helper.WriteLockManager.acquireRequiredLocksInternal(WriteLockManager.java:470)
        at org.eclipse.persistence.internal.helper.WriteLockManager.acquireRequiredLocks(WriteLockManager.java:335)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.acquireWriteLocks(UnitOfWorkImpl.java:1712)
        at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitTransactionAfterWriteChanges(UnitOfWorkImpl.java:1677)
        at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.issueSQLbeforeCompletion(RepeatableWriteUnitOfWork.java:366)
        at org.eclipse.persistence.transaction.AbstractSynchronizationListener.beforeCompletion(AbstractSynchronizationListener.java:160)
        at org.eclipse.persistence.transaction.JTASynchronizationListener.beforeCompletion(JTASynchronizationListener.java:70)
        at weblogic.transaction.internal.ServerSCInfo.callBeforeCompletion(ServerSCInfo.java:1425)
        at weblogic.transaction.internal.ServerSCInfo.doBeforeCompletion(ServerSCInfo.java:1394)
        at weblogic.transaction.internal.ServerSCInfo.callBeforeCompletionsForTier(ServerSCInfo.java:1362)
        at weblogic.transaction.internal.ServerSCInfo.callBeforeCompletions(ServerSCInfo.java:1333)
        at weblogic.transaction.internal.ServerSCInfo.startPrePrepareAndChain(ServerSCInfo.java:117)
        at weblogic.transaction.internal.ServerTransactionImpl.localPrePrepareAndChain(ServerTransactionImpl.java:1677)
        at weblogic.transaction.internal.ServerTransactionImpl.globalPrePrepare(ServerTransactionImpl.java:2540)
        at weblogic.transaction.internal.ServerTransactionImpl.internalCommit(ServerTransactionImpl.java:303)
        at weblogic.transaction.internal.ServerTransactionImpl.commit(ServerTransactionImpl.java:270)

In the above stack trace, we see WebLogic issuing the event before completion, and the changes in the L1 cache are being ported to the L2 cache. Here, EclipseLink does try to acquire cache keys for writing to bring in these changes.

That being said, this is not what the WriteLockManager is doing in the acquireLocksForClone method I previously mentioned. Therefore, the right trace should be placed for the right use case.

@rfelcman
Copy link
Contributor

Hello please check my PR #2304 against master branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants