This job failed with a transient DB error, so we asked the user to re-run it. It's the bottom job on this page (link):
The user reports:
the failed job was completed successfully...however, its status remains failed ... and that leads to the failure of the notebook action with the following error message: "generate_study_population_hospitalisation_4 failed on a previous run and must be re-run". Just in case, I tried the run_all but it failed as well (see here). My understanding is that a job needs to be run with the failed cohort extractor action and notebook action together at the same time. Am I right?
Why does the user believe the job completed successfully? There's no evidence of that. Perhaps they misinterpreted our instruction that it was safe to re-run?
Or perhaps they did re-run it, and something else has gone wrong?
When we've understood this, consider whether there are any small UI tweaks or terminology changes we could make to improve intelligibility.
The "completed" job was a reference to this job, whose status I believe @evansd fettled following some unspecified internal error that meant the job didn't actually fail but was reported as having failed.
My hypothesis is that the user didn't realise they needed to re-run generate_study_population_hospitalisation_4 themselves. Presumably this is because, the previous time they experienced a failure, our fix meant that the action that had appeared to fail turned out not to have failed. This time, by contrast, they need to start a new job.
Ah, in fact I now realise we actually advised them incorrectly:
Following this, one step of the job failed with a database connection error after some time. Several jobs from other projects failed at the same time, and both TPP and the OpenSAFELY tech team are investigating the root cause and implementing mitigations.
The failed job was reset by us to run again and completed successfully