Use this guide to troubleshoot issues and resolve errors that may occur when notarial-api or notarial-worker is deployed.
Connect to the database:
kubectl run -it --rm --env PGPASSWORD='<PASSWORD>' --env PAGER= --image=postgres:16 --restart=Never postgres-client -- psql -h <ENDPOINT_URL> -U master -d notarial
Replace <PASSWORD> with the database password and <ENDPOINT_URL> with the database endpoint URL.
pgboss is used to manage queueing jobs. On application start, pgboss will automatically create necessary tables in the database.
The jobs table, pgboss.job, is where all current jobs are stored. Jobs remain here until they are completed or failed, then they are moved to pgboss.archive.
The jobs table has the following columns:
Column | Type | Collation | Nullable | Default
--------------+-----------------------------+-----------+----------+-----------------------------
id | uuid | | not null | gen_random_uuid()
name | text | | not null |
priority | integer | | not null | 0
data | jsonb | | |
state | pgboss.job_state | | not null | 'created'::pgboss.job_state
retrylimit | integer | | not null | 0
retrycount | integer | | not null | 0
retrydelay | integer | | not null | 0
retrybackoff | boolean | | not null | false
startafter | timestamp with time zone | | not null | now()
startedon | timestamp with time zone | | |
singletonkey | text | | |
singletonon | timestamp without time zone | | |
expirein | interval | | not null | '00:15:00'::interval
createdon | timestamp with time zone | | not null | now()
completedon | timestamp with time zone | | |
keepuntil | timestamp with time zone | | not null | now() + '14 days'::interval
on_complete | boolean | | not null | false
output | jsonb | | |
Columns/values to note are:
- name: the name of the job. It can be one of SES_PROCESS, NOTIFY_PROCESS, SES_SEND, NOTIFY_SEND. More detail can be found in worker/src/README.md. PgBoss will also create some of its own.
- state: the state of the job. Read more about the states in the pgboss documentation.
  - created: the job has been created
  - failed: the job has failed
  - completed: the job has been completed (successfully)
  - active: the job is currently being processed
- data: the data associated with the job.
- output: the output of the job. This will contain the reference number, or the error message if the job has failed.
- keepuntil: the time until the job will be kept in the table, as long as the state is not failed.
- startafter: the time the job will start processing.
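For a quick overview of the queue before digging into individual jobs, a count per job name and state can help. This is a minimal sketch that uses only the name and state columns listed above; the alias jobs is illustrative.

```sql
-- Count jobs by name and state to see where work is piling up.
-- Rows whose name is not one of SES_PROCESS, NOTIFY_PROCESS, SES_SEND
-- or NOTIFY_SEND are jobs that pgboss created for itself.
select name, state, count(*) as jobs
from pgboss.job
group by name, state
order by name, state;
```

A large number of rows in the failed state for a single job name is usually the best place to start investigating.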
To find jobs that have failed, run the following query:
select id, output from pgboss.job where state = 'failed';
If the retrylimit has not been hit (retrylimit > retrycount) and the retrylimit is not 0, the job will be automatically retried.
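The retry rule above can be checked per job. The following sketch restates that rule as a computed column; will_retry is an illustrative alias, not a pgboss column.

```sql
-- List failed jobs alongside their retry counters.
-- will_retry restates the rule above: retrylimit is not 0
-- and retrylimit > retrycount.
select id,
       name,
       retrycount,
       retrylimit,
       (retrylimit > 0 and retrylimit > retrycount) as will_retry
from pgboss.job
where state = 'failed';
```

Jobs where will_retry is false will need one of the manual steps described in this guide.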
It is recommended you run every query in a transaction, so that you can abort the changes if they are incorrect.
begin;
-- First run a query to print the current state of the job you are trying to change
select data from pgboss.job where id = '<id>';
update pgboss.job
set state = 'retry',
completedon = null,
retrycount = 0,
startafter = now()
where id = '<id>';
-- Run the query again, to see if you've made the correct changes
select state, retrycount, startafter, data from pgboss.job where id = '<id>';
-- Run the following query to commit the changes
-- commit;
-- Run the following to abort the changes
-- rollback;
The following queries will assume that you are running them in a transaction.
If the data is incorrect, you can update it in the database. All data is stored as jsonb, so you can use PostgreSQL's jsonb functions to update it.
update pgboss.job
set data = jsonb_set(
data,
'{data, keyToChange}',
'"<NEW_ANSWER>"'
)
where id = '<id>';
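Before writing a change like this, it may be useful to preview what jsonb_set would produce without modifying the row. This sketch reuses the same placeholder path and value as the update; current_value and proposed_data are illustrative aliases.

```sql
-- Preview only: nothing is written by this query.
-- current_value shows what is stored at the path today,
-- proposed_data shows the full document after the change.
select data #> '{data, keyToChange}' as current_value,
       jsonb_set(
         data,
         '{data, keyToChange}',
         '"<NEW_ANSWER>"'
       ) as proposed_data
from pgboss.job
where id = '<id>';
```

If current_value is null, the path probably does not match the structure of the stored data and should be checked before running the update.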
You may find it easier to copy the data to a text editor, make the changes, and then update the data in the database.
update pgboss.job
set data = '<NEW_DATA>'
where id = '<id>';
If a job has failed, and you want to retry it, you can update the startafter column to now, and reset the retrycount to 0.
update pgboss.job
set state = 'retry',
completedon = null,
retrycount = 0,
startafter = now()
-- output = null
where id = '<id>';
You may also want to update output to null, to clear the error message.
If the job does not seem to be retrying, or it is easier to create a new job, you can do so by running the following query:
insert into pgboss.job (name, data)
values ('NOTIFY_PROCESS', '{"answers": {}, "metadata": {}, "reference": "123456"}');
Alternatively, you can copy the data from the failed job, and create a new job with the same data.
insert into pgboss.job (name, data)
SELECT name, data
from pgboss.job where id = '<id>';
Ensure you archive or delete the failed job if you decide to create a new one.
If a job has been moved to the archive, and you want to retry it, you can move it back to the jobs table.
insert into pgboss.job (name, data)
SELECT name, data
from pgboss.archive where id = '<id>';
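To find the id of an archived job in the first place, the archive can be searched in the same way as the jobs table. This sketch assumes pgboss.archive has the same name, state, completedon and output columns as pgboss.job; check the table definition with \d pgboss.archive if the query fails.

```sql
-- Assumption: pgboss.archive shares these columns with pgboss.job.
-- Lists the most recently finished archived jobs that failed.
select id, name, state, completedon, output
from pgboss.archive
where name = 'SES_PROCESS'
  and state = 'failed'
order by completedon desc
limit 20;
```

Change the name filter to NOTIFY_PROCESS, SES_SEND or NOTIFY_SEND as needed.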
The following errors are logged by the API. They may be inserted into the database described above, but some errors are to do with failure to send to the queue. Below is a summary of the errors that may be logged by the API. Some errors may have steps to remediate.
Error type | Error code | Comment |
---|---|---|
WEBHOOK | VALIDATION | POST from forms-worker (or forms runner) failed validation. |
SES | PROCESS_VALIDATION | POST to /forms/emails/staff failed validation |
SES | MISSING_ANSWER | Expected answer is missing. Check the queue database first, then the notarial database |
SES | UNKNOWN | The error has been identified as relating to SES, but logs must be checked for further details |
NOTIFY | PROCESS_VALIDATION | POST to /forms/emails/user failed validation |
NOTIFY | UNKNOWN | The error has been identified as relating to NOTIFY (or UserService) |
QUEUE | SES_PROCESS_ERROR | Inserting into queue failed |
QUEUE | NOTIFY_PROCESS_ERROR | Inserting into queue failed |
GENERIC | UNKNOWN | |
GENERIC | RATE_LIMIT | Rate limit exceeded. If required, this can be changed by adjusting RATE_LIMIT env var |
GENERIC | UNKNOWN_SERVICE_APPLICATION | An email address could not be found for the specified post. This implies that a user should not be able to apply for a service at this post |
For the API, you can generally fix issues in one of the following ways:
- Update the data in the database and retry the job
- Fix the code, redeploy, and let the job be retried. You may need to reset the retry limits via the database
The QUEUE errors (SES_PROCESS_ERROR and NOTIFY_PROCESS_ERROR) are logged only; they are not thrown and do not cause an HTTP error. They should be rare, and are likely down to a database issue.
The error will appear like so:
{"level":50,"time":1713362276177,"pid":34641,"service":"Submit","reference":"DG19_IJVV6","err":{"type":"ApplicationError","message":"unable to queue NOTIFY_PROCESS_ERROR","stack":"QUEUE: unable to queue NOTIFY_PROCESS_ERROR","httpStatusCode":500,"code":"NOTIFY_PROCESS_ERROR","isOperational":true,"exposeToClient":true,"name":"QUEUE"},"errorCode":"SUBMIT_FORM_ERROR","msg":"NOTARIAL_API_ERROR User's data did not queue correctly. Responding with reference number since their data is safe."}
The user's data failed to queue for further processing steps.
In the /queue database, find the job and set it to the failed state, and set the retrylimit and retrycount to 0. This is to prevent the user's data from being archived and deleted.
update pgboss.job
set state = 'failed',
retrylimit = 0,
retrycount = 0
where output->>'reference' = 'DG19_IJVV6';
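Because an update keyed on a reference could match more than one row, it may be worth previewing the matching rows first. This sketch uses the same where clause and example reference as the update above.

```sql
-- Preview the rows the update would touch; nothing is written.
select id, name, state, retrylimit, retrycount, keepuntil
from pgboss.job
where output->>'reference' = 'DG19_IJVV6';
```

If no rows are returned, the reference may not be stored in output for this job, and the job will need to be located another way (for example by id).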
If this is a code-based issue, redeploy the API with the fix, then retry the job by updating the entry or by creating a new job with the same data.
update pgboss.job
set state = 'created',
retrylimit = 50,
retrycount = 0,
completedon = null
where output->>'reference' = 'DG19_IJVV6';
The forms-worker or forms runner attempted to POST to /forms, but validation failed. Only basic validation is applied, so this error should be rare.
Check the /queue database first. If data can be easily amended, amend it on the /queue database and resend it. More details can be found in the forms-queue troubleshooting guide.
Only basic validation is applied, so this error should be rare.
Check the /notarial database
-- get all failed
select * from pgboss.job where name = 'SES_PROCESS' and state = 'failed';
-- get by job id
select * from pgboss.job where id = '<id>';
-- get by reference number (GOV.UK Pay reference number / notify reference number)
select * from pgboss.job where data->'metadata'->>'reference' = '<reference>';
Expected answer is missing. These are required fields, and embassies/consulates need this information. Work backwards in this case.
- Investigate the payload in the notarial database with the name SES_PROCESS first, to see if that data is present but has been incorrectly parsed.
- If the data was not present, check the /queue database.
If data was present in the queue database, but not in the notarial database, you will need to manually insert the data into the notarial database.
The form JSON may have changed, meaning the remapper is out of date (or the JSON has been updated in error). Data can be manually added or changed, so it's compatible with the remapper, or a fix to the code can be made.
You will need to append a field object to the jsonb data column in the notarial database.
- Check the logs or entry output for which field is missing. It will appear as "Missing answer for <key>".
- Check the value that the remapper is expecting.
-- To append a field to the fields array
update pgboss.job
set data = jsonb_set(
data,
'{fields}',
data->'fields' || '{
"key": "jurats",
"type": "list",
"answer": "Yes",
"category": "oath"
}'
)
where id = '<id>';
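After appending, the new field can be checked by expanding the fields array and filtering on the key. This is a minimal sketch; the alias f(field) is illustrative and the key jurats is the example value used above.

```sql
-- Expand data->'fields' into one row per array element
-- and return only the element(s) with the expected key.
select f.field
from pgboss.job,
     jsonb_array_elements(data->'fields') as f(field)
where id = '<id>'
  and f.field->>'key' = 'jurats';
```

Exactly one row should be returned. More than one row suggests the field was appended twice, in which case roll back the transaction and try again.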
If you need to edit a field, it may be easier to copy the data to a text editor, make the changes, and then update the data in the database.
Only basic validation is applied, so this error should be rare. The job can be edited manually.
There is an issue adding data to the database. Investigate RDS.
The following are thrown errors, and may be inserted in the database. They have been parsed as ApplicationErrors to simplify debugging.
Error type | Error code | Comment |
---|---|---|
FILE | ORIGIN_NOT_ALLOWED | Attempted to fetch a file from an origin that is not the document upload API |
FILE | NOT_FOUND | document upload API returned a 404 |
FILE | URL_INVALID | The URL is invalid |
FILE | UNKNOWN | An error was identified as relating to file code, but logs must be checked for further details |
CONSUMER | START_FAILED | Connection to database/queue could not be established |
The following are logged errors and related to sending requests. These errors are caught, logged, and rethrown with minimal parsing to preserve as much data as possible.
Error code | Comment |
---|---|
NOTIFY_PROCESS_REQUEST_ERROR | Error sending request to notarial-api |
NOTIFY_PROCESS_RESPONSE_ERROR | notarial-api responded with an error |
NOTIFY_PROCESS_AGGREGATE_ERROR | Multiple errors occurred whilst making the request to notarial-api |
NOTIFY_PROCESS_UNKNOWN_ERROR | An error was identified as relating to GOV.UK Notify |
NOTIFY_SEND_REQUEST_ERROR | Error sending request to GOV.UK Notify |
NOTIFY_SEND_RESPONSE_ERROR | GOV.UK Notify responded with an error |
NOTIFY_SEND_AGGREGATE_ERROR | Multiple errors occurred whilst making the request to GOV.UK Notify |
NOTIFY_SEND_UNKNOWN_ERROR | An error was identified as relating to GOV.UK Notify |
NOTIFY_FAILURE_CHECK_REQUEST_ERROR | An error occurred while trying to retrieve a list of email send failures from GOV.UK Notify |
SES_PROCESS_REQUEST_ERROR | Error sending request to notarial-api |
SES_PROCESS_RESPONSE_ERROR | notarial-api responded with an error |
SES_PROCESS_AGGREGATE_ERROR | Multiple errors occurred whilst making the request to notarial-api |
SES_PROCESS_UNKNOWN_ERROR | An error was identified as relating to SES |
SES_SEND_SES_EXCEPTION | SES responded with an error |
SES_SEND_ON_COMPLETE | The completion handler failed with an error |