Fix Sitelists changes propagation to local workqueue #12245
base: master
Conversation
With our latest commit, @mapellidario and I confirmed that it all works now. Here [1] is a diff between the local WorkQueueElements for the workflow.
Todor, even though the proposed changes look good to me, I struggle to understand where the updates should go. Originally I followed Alan's guidelines about the pklFile, and now you're changing them to load from the local WorkQueue CouchDB database. I would prefer Alan to provide proper guidelines on how the load/upload mechanism should work.
Please find a few concerns and requests along the code.
@@ -287,7 +287,8 @@ def updateElementsByWorkflow(self, workload, updateParams, status=None):
         # Update all workload parameters based on the full reqArgs dictionary
         workload.updateWorkloadArgs(updateParams)
         # Commit the changes of the current workload object to the database:
-        workload.saveCouchUrl(workload.specUrl())
+        metadata = {'name': wfName}
+        workload.saveCouch(self.hostWithAuth, self.db.name, metadata=metadata)
Isn't this method used by global workqueue as well? It might be that making this change for the agent will actually break global workqueue.
In addition, I think this is the first time I see this update being done with metadata (and isn't it redundant with workload.updateWorkloadArgs?)
Yes (it is used by global), no (it won't break anything), and no (it is not redundant).
@@ -325,6 +326,41 @@ def deleteWQElementsByWorkflow(self, workflowNames):
             deleted += len(ids)
         return deleted

+    def getWQElementsByWorkflow(self, workflowNames, inboxFlag=False):
Other than the workflow name, can't we have a method that is very similar to what is already implemented for getWorkflowNames()?

def getWorkflowNames(self, inboxFlag=False):

This implementation seems overloaded, also with the additional data structures created and grouped by database.
I have changed the data structure returned, so it is now just a list of WQEs.
And no, we cannot do just what is done in getWorkflowNames, because it would return only the WQE ids, while we need the full content of the WQE - hence the rest of the function, beyond the call to the CouchDB view elementsByWorkflow.
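To illustrate the distinction with a minimal, self-contained sketch (the stub database, the document shapes, and the helper below are illustrative assumptions, not the real WMCore API): a getWorkflowNames-style view query yields only document ids, so returning full WorkQueueElement content requires resolving each id into its document.

```python
# Hypothetical sketch: why ids from a view are not enough when the caller
# needs the full element content. All names and shapes are illustrative.

# Stub CouchDB database: document id -> full document
fake_db = {
    "doc1": {"_id": "doc1", "RequestName": "wf_A", "Status": "Available"},
    "doc2": {"_id": "doc2", "RequestName": "wf_A", "Status": "Running"},
}

# Stub result of a view query (what a getWorkflowNames-style call returns):
# only the ids, no element content.
view_rows = [{"id": "doc1"}, {"id": "doc2"}]

def get_wq_elements(rows, db):
    """Resolve view rows into full element documents."""
    return [db[row["id"]] for row in rows]

elements = get_wq_elements(view_rows, fake_db)
print([e["Status"] for e in elements])  # ['Available', 'Running']
```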
I don't understand why you need the full element content.
Can you execute an in-place update as performed by this method:
def updateElements(self, *elementIds, **updatedParams):
Additionally, if we have unit tests for this module, please create a new one for this new method.
We are not using this method for updating any WQE, but rather for listing its content, like this: [1]
In [1]: sitelistpoller.localWQ.getWQElementsByWorkflow('dmapelli_ReReco_RunBlockWhite_Nvidia_test_v1_250124_095017_1803')
Out[1]:
[{'Inputs': {},
'ProcessedInputs': [],
'RejectedInputs': [],
'PileupData': {},
'ParentData': {},
'ParentFlag': False,
'Jobs': 0,
'WMSpec': None,
'SiteWhitelist': [],
'SiteBlacklist': [],
'Dbs': None,
'Task': None,
'ParentQueueId': None,
'Priority': 0,
'SubscriptionId': None,
'Status': None,
'EventsWritten': 0,
'FilesProcessed': 0,
'PercentComplete': 0,
'PercentSuccess': 0,
'RequestName': None,
'TaskName': None,
'TeamName': None,
'StartPolicy': {},
'EndPolicy': {},
'ACDC': {},
'ChildQueueUrl': None,
'ParentQueueUrl': None,
'WMBSUrl': None,
'NumberOfLumis': 0,
'NumberOfEvents': 0,
'NumberOfFiles': 0,
'NumOfFilesAdded': 0,
'Mask': None,
'OpenForNewData': False,
'TimestampFoundNewData': 1738241180,
'NoInputUpdate': False,
'NoPileupUpdate': False,
'CreationTime': 1738241180,
'_id': '3c36d2561fce25529b81e2045327b944',
'_rev': '7-90374a9e21a25a0843c0000ce6b29642',
'thunker_encoded_json': True,
'type': 'WMCore.WorkQueue.DataStructs.WorkQueueElement.WorkQueueElement',
'WMCore.WorkQueue.DataStructs.WorkQueueElement.WorkQueueElement': {'Inputs': {'/HLTPhysicsIsolatedBunch/Run2016H-v1/RAW#92049f6e-92f9-11e6-b150-001e67abf228': ['T2_CH_CERN_P5',
'T2_CH_CERN_HLT',
'T2_CH_CERN']},
'ParentFlag': False,
'ParentData': {},
'NumberOfLumis': 3,
'NumberOfFiles': 3,
'NumberOfEvents': 3008,
'Jobs': 8,
'OpenForNewData': False,
'NoInputUpdate': False,
'NoPileupUpdate': False,
'Status': 'Running',
'RequestName': 'dmapelli_ReReco_RunBlockWhite_Nvidia_test_v1_250124_095017_1803',
'TaskName': 'DataProcessing',
'Dbs': 'https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader',
'SiteWhitelist': ['T1_DE_KIT'],
'SiteBlacklist': ['T2_FR_IPHC',
'T2_GR_Ioannina',
'T2_HU_Budapest',
'T2_IN_TIFR',
'T2_IT_Bari',
'T2_IT_Legnaro',
'T2_IT_Pisa',
'T2_IT_Rome',
'T2_KR_KISTI'],
'StartPolicy': 'Block',
'EndPolicy': {'policyName': 'SingleShot'},
'Priority': 190003,
'PileupData': {},
'ProcessedInputs': [],
'RejectedInputs': [],
'ParentQueueId': '3c36d2561fce25529b81e2045327b944',
'SubscriptionId': 844,
'EventsWritten': 0,
'FilesProcessed': 0,
'PercentComplete': 0,
'PercentSuccess': 0,
'TeamName': 'testbed-vocms0192',
'ACDC': {},
'ChildQueueUrl': None,
'ParentQueueUrl': None,
'WMBSUrl': None,
'NumOfFilesAdded': 0,
'Mask': None,
'TimestampFoundNewData': 1737716401,
'CreationTime': 1737716401},
'updatetime': 1737716402.0188694,
'timestamp': 1737716402.0188694}]
        ids.append(entry["id"])

    for id in ids:
        wqeList[dbName].append(couchdb.get(uri=f"/workqueue/_design/WorkQueue/_rewrite/element/{id}"))
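As an aside, the per-id GET loop above issues one HTTP round trip per element. A possible alternative (sketched here with a stubbed client; the real `couchdb` wrapper in this PR may expose a different interface) is CouchDB's standard bulk read, POST /{db}/_all_docs?include_docs=true with a {"keys": [...]} body, which returns all documents in a single request.

```python
# Hedged sketch: bulk-fetching documents by id in one request instead of
# one GET per id. StubCouch stands in for the PR's couchdb client object.
import json

class StubCouch:
    """Stand-in client; models only the bulk _all_docs response shape."""
    def __init__(self, docs):
        self._docs = docs  # id -> document

    def post(self, uri, data):
        keys = json.loads(data)["keys"]
        return {"rows": [{"id": k, "doc": self._docs[k]} for k in keys]}

couchdb = StubCouch({"a": {"_id": "a"}, "b": {"_id": "b"}})
ids = ["a", "b"]
rows = couchdb.post("/workqueue/_all_docs?include_docs=true",
                    data=json.dumps({"keys": ids}))["rows"]
wqeList = [row["doc"] for row in rows]
print(len(wqeList))  # 2
```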
If I understand this line correctly, here you are actually fetching the workqueue elements given a doc id. Is that correct? If so, why are we doing this? Can't we call the updateElements() method when we are actually updating the elements' properties (of course, outside of this method)?
Yes, you understood the line properly. And again, we are not using this method for updates; this is a get* method... neither set* nor update*.
     wHelper = WMWorkloadHelper()
-    wHelper.load(pklFileName)
+    wHelper.load(specUrl)
Now we seem to have a mix of workload objects from different sources:
- pklFileName, which I believe lives in the filesystem (probably under WorkQueueManager/cache)
- specUrl, which comes from the CouchDB document.
We should pick one of those and use it consistently. This will avoid confusion, and we can also make sure that we are comparing the current site lists against the correct workload spec file, hence avoiding unnecessary updates.
Well, to me it was never explained why we should switch to loading a .pkl file in the first place, since the initial idea and also, if I am not wrong, Valentin's previous implementation of this loaded the spec from couch. Only after the PR swapping the functions to use my updateWorkloadArgs was it changed to loading the spec from pkl, without any explanation why. And as for consistency: the same mechanism happens at central services, and we do it quite safely from the database. I do not see a reason to change this approach here at the agent.
Something more: the actual reason indeed is that we are not agnostic to the way the workload object is created.
- If we load from CouchDB, then the method workload.specUrl() returns:
  http://127.0.0.1:5984/workqueue/dmapelli_ReReco_RunBlockWhite_Nvidia_test_v1_250124_095017_1803/spec
- If we load from .pkl, then the method workload.specUrl() returns:
  /data/srv/wmagent/2.3.8/install/WorkQueueManager/cache/dmapelli_ReReco_RunBlockWhite_Nvidia_test_v1_250124_095017_1803/WMSandbox/WMWorkload.pkl
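The divergence described above can be sketched with a stub (the class below is not the real WMWorkloadHelper, and the shortened pkl path is a made-up placeholder; the stub only models the observation that specUrl() reflects whichever source load() was given):

```python
# Illustrative stub only: models the observed behaviour that specUrl()
# returns the source the workload was loaded from.
class FakeWorkloadHelper:
    def __init__(self):
        self._url = None

    def load(self, source):
        # the real helper records where the spec came from
        self._url = source

    def specUrl(self):
        return self._url

# hypothetical sources, one per load path
couchSpec = "http://127.0.0.1:5984/workqueue/some_workflow/spec"
pklSpec = "/data/srv/wmagent/install/WorkQueueManager/cache/some_workflow/WMSandbox/WMWorkload.pkl"

fromCouch = FakeWorkloadHelper()
fromCouch.load(couchSpec)
fromPkl = FakeWorkloadHelper()
fromPkl.load(pklSpec)

print(fromCouch.specUrl().startswith("http://"))     # True
print(fromPkl.specUrl().endswith("WMWorkload.pkl"))  # True
```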
BTW, we may need to investigate that, because on the other hand I could not reproduce the behaviour just a minute ago, which was strange indeed. But in any case we'd better stay consistent across central services and WMAgent and stick to equivalent methods for obtaining the workflow spec, which would be guaranteed to always produce the spec URI related to CouchDB rather than the filesystem.
     try:
         # update local WorkQueue first
         params = {'SiteWhitelist': siteWhiteList, 'SiteBlacklist': siteBlackList}
-        self.localWQ.updateElementsByWorkflow(wHelper, params, status=['Available'])
+        self.localWQ.updateElementsByWorkflow(wHelper, params, status=['Available', 'Negotiating', 'Acquired', 'Running'])
If the element is acquired or running, there is no reason to update the site lists, as data has been already materialized in the WMBS tables and it will no longer change the behavior of jobs.
The Negotiating state might be better to keep, though, just in case global workqueue did not manage to update those (I think we do not update Negotiating elements in global).
Which just brings up the question: should we not update WMBS as well? This is something I have been expecting to come up for a while now. Looking at it from another angle: there is no way for a WQE to stay in status Available at the agent for very long (it would be a fairly short period of time, if any at all). So if we change nothing in terms of future jobs to be materialized in condor by updating the WQEs in status Acquired and Running (because being blocked by the already created WMBS records is exactly what that means), then we accomplish next to nothing here by changing any of the already fetched WQEs, unless we continue down to condor and propagate the site list changes at least as far as the idle jobs. Having only the short window for updating WQEs in status Available (meaning between the time of fetching a WQE from the global WQ and creating all the records in WMBS) would completely destroy the efficiency of this functionality.
So if we need to dig deeper and propagate the site list change to the lower layers of the system, I am all up for it. Count me in. The next dev issue is just around the corner... Just say a word ... ;)
Yes, this feature is not going to be useful for short workflows in general, as they usually get fully acquired in a very short time window.
When we discussed this feature, we had 2 main alternatives, and they required different levels of effort. We decided to take the first one, which is not very intrusive (at the job level), and see how it is used and build up experience with it. If there is a need, we can expand this feature to also cover the second alternative.
These options are described in the meta issue: #8323
So for now, let us propagate the site list changes only to WQEs that have not yet been materialized in WMBS/jobs - as we have intended for this feature since the beginning.
Even in this case, I'd remove only Running from this list of statuses, and keep Negotiating and Acquired.
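The status gate being debated can be sketched as follows (a hedged illustration: the element dictionaries and the filter helper are made up for this example, while the real update goes through updateElementsByWorkflow with a status argument):

```python
# Hypothetical sketch: only elements whose Status is in the allowed list
# receive the new site lists; Running elements are skipped because their
# work is already materialized in WMBS.
def filter_updatable(elements, allowed=("Available", "Negotiating", "Acquired")):
    return [e for e in elements if e.get("Status") in allowed]

wqes = [
    {"_id": "a", "Status": "Available"},
    {"_id": "b", "Status": "Running"},   # already materialized, skipped
    {"_id": "c", "Status": "Acquired"},
]
print([e["_id"] for e in filter_updatable(wqes)])  # ['a', 'c']
```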
I think we need to zoom out a bit.
Can you please remind me what are the WQE statuses that we update in global workqueue?
According to this documentation: https://cms-wmcore.docs.cern.ch/training/data_flow/
in local queue, we have:
- workqueue_inbox, reflecting documents being replicated from global (hence they go through Negotiating, Acquired, Running, etc. statuses)
- workqueue, which gets created when an agent acquires work from GQ to LQ.
In LQ, are we updating these elements in the workqueue or the workqueue_inbox database?
Can you please remind me what are the WQE statuses that we update in global workqueue?
If you are asking about the WQE statuses in global workqueue concerning this feature, here it is:
self.gq_service.updateElementsByWorkflow(workload, reqArgs, status=['Available', 'Negotiating', 'Acquired'])
In LQ, are we updating these elements in workqueue or workqueue_inbox database?
In LQ we are updating the elements in the workqueue database. It happens here:

self.localWQ.updateElementsByWorkflow(wHelper, params, status=['Available'])
And the self.localWQ object is created here:
WMCore/src/python/WMComponent/WorkflowUpdater/SiteListPoller.py, lines 44 to 46 in d48389c:

self.localCouchUrl = config.WorkQueueManager.couchurl
self.localWQ = WorkQueue(self.localCouchUrl,
                         config.WorkQueueManager.dbname)
While the actual database name comes from the agent configuration and it is, as I said above, workqueue:
(WMAgent.venv3) cmst1@vocms0192:WMCore $ grep WorkQueueManager.dbname $WMA_CONFIG_FILE
config.WorkQueueManager.dbname = 'workqueue'
Fixes #12244
Status
Ready
Description
With the current PR we:
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
#12123
External dependencies / deployment changes
None