Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-31181 Avoid wsStore unnecessary write locks #18256

Conversation

rpastrana
Copy link
Member

@rpastrana rpastrana commented Jan 25, 2024

  • Delays createStore write request and avoids if store pre-exists

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

@rpastrana
Copy link
Member Author

Simple stress test success:

image

@rpastrana rpastrana requested a review from jakesmith January 25, 2024 21:50
@rpastrana
Copy link
Member Author

@jakesmith how impactful do you think this change is? and do you see any value in doing something similar in the set() method?

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana - please see comments

@@ -57,6 +59,11 @@ bool CDALIKVStore::createStore(const char * apptype, const char * storename, con
if (maxvalsize != 0)
apptree->setPropInt(DALI_KVSTORE_MAXVALSIZE_ATT, maxvalsize);

Owned<IRemoteConnection> conn = querySDS().connect(DALI_KVSTORE_PATH, myProcessSession(), RTM_LOCK_WRITE, SDS_LOCK_TIMEOUT_KVSTORE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this introduces a potential problem (via a race condition).
Multiple clients could reach this point, having all performed the read lock at line 28, and found the 'Store' not to exist. Then all perform this write lock (in sequence), and all perform the addPropTree, resulting in multiple ambiguous copies of the same named store.

The code will need to recheck after the write (exclusive) lock is established I think.
Also, it would be better to keep the existing connection (not release it), and use changeMode to switch from RTM_LOCK_READ to RTM_LOCK_WRITE, e.g.:
initially establish the read lock, check for the store's existence, then do a:

conn->changeMode(RTM_LOCK_WRITE);

@rpastrana rpastrana requested a review from jakesmith January 26, 2024 20:37
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana - please see comments.

@@ -57,6 +58,15 @@ bool CDALIKVStore::createStore(const char * apptype, const char * storename, con
if (maxvalsize != 0)
apptree->setPropInt(DALI_KVSTORE_MAXVALSIZE_ATT, maxvalsize);

conn->changeMode(RTM_LOCK_WRITE, SDS_LOCK_TIMEOUT_KVSTORE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized there's a potential problem with this approach.
If 1 client is at this point, then this changeMode will succeed, however if >1 client is here, then they all hold a read lock, and no client will be able to obtain an [exclusive] write lock because others will still have read locks.

The implementation will need to relinquish the read lock before attempting to get the exclusive lock, e.g.:

    // clear read lock 1st, then establish a write lock and recheck
    conn.clear();
    conn.setown(querySDS().connect(DALI_KVSTORE_PATH, myProcessSession(), RTM_LOCK_WRITE, SDS_LOCK_TIMEOUT_KVSTORE));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I'd probably perform this rechecking before the 'apptree' creation code, i.e. immediately after line 39.

@rpastrana rpastrana requested a review from jakesmith February 5, 2024 20:55
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana - looks good. Some trivial comments, but mergeable as is.

LOG(MCuserInfo,"DALI Keystore createStore(): '%s' entry already exists", storename);
return false;
Owned<IPropertyTree> root = conn->getRoot();
if (root->hasProp(xpath.str()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial/no need to change: linking root isn't necessary, i.e. condition could be:

if (conn->queryRoot()->hasProp(xpath))

LOG(MCuserInfo,"DALI Keystore createStore(): '%s' entry already exists", storename);
return false;
}

root->addPropTree("Store", LINK(apptree));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial/no need to change: trivially more efficient to do apptree.getClear() here.

{
LOG(MCuserInfo,"DALI Keystore createStore(): '%s' entry already exists", storename);
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#18256 (comment)

I'd still probably move this to immediately after the write lock has established, i.e. before creating 'apptree' which will not be needed if this check establishes it's already there.

@ghalliday
Copy link
Member

@rpastrana please squash (and address jake's final comments if you want)

- Delays createStore write request and avoids if store pre-exists

Signed-off-by: Rodrigo Pastrana <[email protected]>
@rpastrana rpastrana force-pushed the HPCC-31181-WsStoreAvoidWriteLockIfNotNeeded branch from ba70136 to 4ebbb38 Compare February 9, 2024 00:07
@rpastrana
Copy link
Member Author

rpastrana commented Feb 9, 2024

made code block shift suggested by Jake and squashed.
Logic tested via soapui
@ghalliday

@ghalliday
Copy link
Member

@rpastrana please can you add some detail to the jira about the potential impact (how likely, how often it would occur)

@rpastrana
Copy link
Member Author

@rpastrana please can you add some detail to the jira about the potential impact (how likely, how often it would occur)

Jira updated with requested information. If this information impacts your decision to merge, I suggest it becomes an explicit requirement for all jiras/pull requests.

For my curiosity, what decision is influenced by the info requested?

@ghalliday
Copy link
Member

Thanks for the extra information.
It is more of a general policy - adding information to the jira about bugs so that users can know what is being fixed, and reviewers can review them.
In this case if 1% of the time it needs a write lock it is obviously useful. If 100% then it is not. I don't know the code, or the context the code is called in so it is very helpful to have that information.

So a comment saying something like "99% of the time the store entry already exists, so avoid the write lock in this common case" is all I would be looking for.

@ghalliday ghalliday merged commit 2853a22 into hpcc-systems:candidate-9.4.x Feb 9, 2024
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants