[release-2.8] add the document for manually running the cronjob (#533)
* add the document for manually executing the job

* format the doc

* apply review updates

* add link to the troubleshooting doc

---------

Signed-off-by: myan <[email protected]>
Co-authored-by: myan <[email protected]>
openshift-cherrypick-robot and yanmxa authored Jul 10, 2023
1 parent c7858a1 commit c35b08f
Showing 2 changed files with 123 additions and 1 deletion.
120 changes: 119 additions & 1 deletion doc/how_global_hub_works.md
@@ -47,4 +47,122 @@ In the long run, the desired state is full compliance. If daily variations as ca
Note:
- multicluster global hub operator controls the life cycle of the multicluster global hub manager and global hub agent
- kafka and database can run on the Global cluster or outside it
- ACM hub also runs on the multicluster global hub cluster, but does not participate in the daily functioning of the global hub.

### Running the Summarization Process manually

Before starting, note that the nightly summary consists of two subtasks:
- Insert that day's cluster policy data from the [Materialized View](https://www.postgresql.org/docs/current/rules-materializedviews.html) `local_compliance_view_<yyyy_MM_dd>` into `history.local_compliance`.
- Update the `compliance` status and the policy flip `frequency` for that day in `history.local_compliance`, based on `event.local_policies`.

#### Execution steps

1. Connect to the database

You can use clients such as pgAdmin, TablePlus, etc., to connect to the Global Hub database and execute the SQL statements in the following steps. If your PostgreSQL database was installed through [this script](../operator/config/samples/storage/deploy_postgres.sh), you can connect to the database on the cluster directly with the following command.
```bash
kubectl exec -it $(kubectl get pods -n hoh-postgres -l postgres-operator.crunchydata.com/role=master -o jsonpath='{.items..metadata.name}') -n hoh-postgres -c database -- psql -d hoh
```
2. Determine the date that needs to be reprocessed, taking `2023-07-06` as an example

   If the dashboard shows no compliance information for `2023-07-06`, look for the job failure record for the following day, `2023-07-07`, in `history.local_compliance_job_log`. This identifies `2023-07-06` as the date for which the summary process must be run manually.
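   The failure record can be found by querying the log table directly; since the exact columns of `history.local_compliance_job_log` may vary by schema version, a plain `SELECT *` is the safest way to inspect it:
   ```sql
   -- inspect recent summarization runs and their error details
   SELECT * FROM history.local_compliance_job_log;
   ```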

3. Check whether the Materialized View `history.local_compliance_view_2023_07_06` exists
```sql
select * from history.local_compliance_view_2023_07_06;
```
- If the view exists, load the view records into `history.local_compliance`
```sql
CREATE OR REPLACE FUNCTION history.insert_local_compliance_job(
view_date text
)
RETURNS void AS $$
BEGIN
EXECUTE format('
INSERT INTO history.local_compliance (policy_id, cluster_id, leaf_hub_name, compliance, compliance_date)
(
SELECT policy_id, cluster_id, leaf_hub_name, compliance, %2$L
FROM history.local_compliance_view_%1$s
ORDER BY policy_id, cluster_id
)
ON CONFLICT (policy_id, cluster_id, compliance_date) DO NOTHING',
view_date, view_date);
END;
$$ LANGUAGE plpgsql;
-- exec the insert func for that day '2023_07_06'
SELECT history.insert_local_compliance_job('2023_07_06');
```

- If the view does not exist, inherit the history compliance records of the previous day, `2023_07_05`
```sql
CREATE OR REPLACE PROCEDURE history.inherit_local_compliance_job(
prev_date TEXT,
curr_date TEXT
)
LANGUAGE plpgsql
AS $$
BEGIN
EXECUTE format('
INSERT INTO history.local_compliance (policy_id, cluster_id, leaf_hub_name, compliance_date, compliance, compliance_changed_frequency)
SELECT
policy_id,
cluster_id,
leaf_hub_name,
%1$L,
compliance,
compliance_changed_frequency
FROM
history.local_compliance
WHERE
compliance_date = %2$L
ON CONFLICT (policy_id, cluster_id, compliance_date) DO NOTHING
', curr_date, prev_date);
END;
$$;
-- call the procedure to inherit the data of '2023_07_05' and generate the records for '2023_07_06'
CALL history.inherit_local_compliance_job('2023_07_05', '2023_07_06');
```

4. Update the `compliance` and `frequency` information of that day to `history.local_compliance`
```sql
CREATE OR REPLACE FUNCTION history.update_local_compliance_job(start_date_param text, end_date_param text)
RETURNS void AS $$
BEGIN
EXECUTE format('
INSERT INTO history.local_compliance (policy_id, cluster_id, leaf_hub_name, compliance_date, compliance, compliance_changed_frequency)
WITH compliance_aggregate AS (
SELECT cluster_id, policy_id, leaf_hub_name,
CASE
WHEN bool_and(compliance = ''compliant'') THEN ''compliant''
ELSE ''non_compliant''
END::local_status.compliance_type AS aggregated_compliance
FROM event.local_policies
WHERE created_at BETWEEN %1$L::date AND %2$L::date
GROUP BY cluster_id, policy_id, leaf_hub_name
)
SELECT policy_id, cluster_id, leaf_hub_name, %1$L, aggregated_compliance,
(SELECT COUNT(*) FROM (
SELECT created_at, compliance,
LAG(compliance) OVER (PARTITION BY cluster_id, policy_id ORDER BY created_at ASC) AS prev_compliance
FROM event.local_policies lp
WHERE (lp.created_at BETWEEN %1$L::date AND %2$L::date)
AND lp.cluster_id = ca.cluster_id AND lp.policy_id = ca.policy_id
ORDER BY created_at ASC
) AS subquery WHERE compliance <> prev_compliance) AS compliance_changed_frequency
FROM compliance_aggregate ca
ORDER BY cluster_id, policy_id
ON CONFLICT (policy_id, cluster_id, compliance_date)
DO UPDATE SET
compliance = EXCLUDED.compliance,
compliance_changed_frequency = EXCLUDED.compliance_changed_frequency',
start_date_param, end_date_param);
END;
$$ LANGUAGE plpgsql;
-- call the function to update records starting at '2023_07_06' and ending at '2023_07_07'
SELECT history.update_local_compliance_job('2023_07_06', '2023_07_07');
```
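   For intuition, the flip count computed by the `LAG` window above is simply the number of adjacent compliance changes in a cluster/policy's chronological event list. A minimal Python sketch of that logic (illustrative only, not part of the recovery procedure):
   ```python
   def compliance_changed_frequency(events):
       """Count adjacent compliance flips, mirroring the LAG(...) comparison in SQL.

       events: iterable of (created_at, compliance) tuples for one cluster/policy pair.
       """
       # sort by created_at, then compare each status with its predecessor
       statuses = [compliance for _, compliance in sorted(events)]
       return sum(1 for prev, curr in zip(statuses, statuses[1:]) if prev != curr)

   events = [
       ("2023-07-06T01:00", "compliant"),
       ("2023-07-06T05:00", "non_compliant"),
       ("2023-07-06T09:00", "non_compliant"),
       ("2023-07-06T13:00", "compliant"),
   ]
   print(compliance_changed_frequency(events))  # 2 flips
   ```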
5. Once the above steps complete successfully, you will find that day's records in `history.local_compliance`, and you can safely delete the Materialized View `history.local_compliance_view_2023_07_06`.
```sql
DROP MATERIALIZED VIEW IF EXISTS history.local_compliance_view_2023_07_06;
```
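   Before dropping the view, a quick count can confirm that the backfill landed (assuming `compliance_date` is a standard `date` column, as the functions above suggest):
   ```sql
   -- expect a non-zero count for the recovered day
   SELECT count(*) FROM history.local_compliance WHERE compliance_date = '2023-07-06';
   ```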
4 changes: 4 additions & 0 deletions doc/troubleshooting.md
@@ -113,3 +113,7 @@ Whereas `pg_restore` is used to restore a PostgreSQL database from an archive cr
```shell
pg_restore -h another.host.com -p 5432 -U postgres -d hoh hoh-$(date +%d-%m-%y_%H-%M).tar
```

## Cronjob Compliance Data Restore
### Generate the missed compliance data
The Grafana datasource is mainly the table `history.local_compliance`, whose records are generated by a summarization routine that kicks off at 00:00:00 every night. Usually there is no need to run the summary manually, but if the compliance job fails unexpectedly, you can log in to the database and run the whole summary process by hand to recover the missing data. Follow [Running the Summarization Process manually](./how_global_hub_works.md#running-the-summarization-process-manually) to do so.
