Cannot update CS after finding dead CS worker #7939

Open
chaen opened this issue Dec 9, 2024 · 0 comments

When updating the CS from the WebApp, we occasionally get this error:

ERROR: ERROR: AutoMerge failed: Could not AutoMerge. Could not retrieve original committer's version
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/concurrent/futures/thread.py", line 83, in _worker
    work_item.run()
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/private/Service.py", line 349, in _processInThread
    result = self._processProposal(trid, proposalTuple, handlerObj)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/private/Service.py", line 536, in _processProposal
    result = self._executeAction(trid, proposalTuple, handlerObj)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/private/Service.py", line 556, in _executeAction
    response = handlerObj._rh_executeAction(proposalTuple)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/RequestHandler.py", line 120, in _rh_executeAction
    retVal = self.__doRPC(actionTuple[1])
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/RequestHandler.py", line 251, in __doRPC
    return self.__RPCCallFunction(method, args)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/DISET/RequestHandler.py", line 292, in __RPCCallFunction
    uReturnValue = oMethod(*args)
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/ConfigurationSystem/Service/ConfigurationHandler.py", line 71, in export_commitNewData
    return gServiceInterface.updateConfiguration(sData, credDict["username"])
  File "/opt/dirac/versions/v11.0.52-1733134524/Linux-x86_64/lib/python3.11/site-packages/DIRAC/ConfigurationSystem/private/ServiceInterfaceBase.py", line 219, in updateConfiguration
    return S_ERROR(f"AutoMerge failed: {result['Message']}")

This happens because the CS cannot find a matching backup in:

def __getCfgBackups(self, basePath, date="", subPath=""):

This method finds the latest backup by looking for the zip files whose names contain the date found in the client's DIRAC/Configuration/Version.
This version is distributed to the client by the CS, so there is normally no reason for it to be wrong.
The exception is when a CS worker (a "slave" in the logs) is found dead.
In that case, a new version is generated:

@400000006756768d1d106ff4.s-57426-2024-12-09 04:46:50 UTC Configuration/Server [140072925378112] WARN: Found dead slave dips://speen.nikhef.nl:9135/Configuration/Server
@400000006756768d1d106ff4.s:57428:2024-12-09 04:46:51 UTC Configuration/Server [140072925378112] INFO: Generated new version 2024-12-09 04:46:51.020183

But this version is never actually committed (and we do not want it to be), so no backup file corresponds to that date.
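
To make the failure mode concrete, here is a minimal, self-contained sketch of a date-based backup lookup. It is not the DIRAC implementation: the helper names (`version_to_token`, `find_backups`), the `cs.<token>.zip` file naming, and the "committed" version string are all assumptions made for illustration. Only the regenerated version string is taken from the log above.

```python
import tempfile
from pathlib import Path


def version_to_token(version: str) -> str:
    """Hypothetical helper: 'YYYY-MM-DD HH:MM:SS.ffffff' -> 'YYYYMMDDHHMMSS'."""
    date_part = version.split(".")[0]  # drop the microseconds
    return "".join(ch for ch in date_part if ch.isdigit())


def find_backups(base_path: Path, version: str) -> list[Path]:
    """Return the zip files under base_path whose name contains the version date."""
    token = version_to_token(version)
    return sorted(p for p in base_path.rglob("*.zip") if token in p.name)


def demo() -> tuple[list[str], list[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)

        # A version that was really committed, so a backup was written for it.
        # (Made-up timestamp and file name, for illustration only.)
        committed = "2024-12-08 10:15:00.123456"
        (base / f"cs.{version_to_token(committed)}.zip").touch()

        # The version generated after the dead worker was found (from the log).
        # Nothing was committed at that time, so no backup carries this date.
        regenerated = "2024-12-09 04:46:51.020183"

        found = [p.name for p in find_backups(base, committed)]
        missing = [p.name for p in find_backups(base, regenerated)]
        return found, missing


if __name__ == "__main__":
    found, missing = demo()
    print("committed version   ->", found)
    print("regenerated version ->", missing)
```

In this sketch the lookup for the regenerated version returns an empty list, which mirrors the "Could not retrieve original committer's version" error: the client holds a version for which no backup was ever written.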

fstagni changed the title from "Cannot update CS after finding dead CS slave" to "Cannot update CS after finding dead CS worker" on Dec 19, 2024.