Skip to content
This repository has been archived by the owner on Jul 24, 2021. It is now read-only.

Report Revision Retention Refactor #619

Closed
perigrin opened this issue Jan 23, 2019 · 9 comments · Fixed by #653
Closed

Report Revision Retention Refactor #619

perigrin opened this issue Jan 23, 2019 · 9 comments · Fixed by #653
Assignees
Labels
api device reports Involves data coming from reporters enhancement extends current functionality

Comments

@perigrin
Copy link
Contributor

perigrin commented Jan 23, 2019

Currently the API saves all reports that are submitted, this is far more often than we need. Going forward we only need to record reports in the following circumstances:

  1. The most recent report
  2. First report in a current phase
  3. Last report in a current phase
  4. Reports that FAIL
  5. First PASS report after a failure.

Since currently Conch has no concept of Phases we only need to save the first report, most recent report, failed reports, and the first report after a failure.

Additionally we'll need to write a script that purges the data base of all reports not matching the above criteria.

This obsoletes #551 & #555.

This is different from #460 in that we still want to run validations, we just don't need to store the result if the validation state doesn't fall into one of the other categories we care about retaining.

@perigrin perigrin added enhancement extends current functionality api device reports Involves data coming from reporters labels Jan 23, 2019
@sungo
Copy link
Contributor

sungo commented Jan 23, 2019

i disagree with the final paragraph because of circumstance 1. When a new device report comes in, we always run validations and we always save the results because it is "the most recent report". The question is what we do with the report that came in prior.

That last-but-one report now must be evaluated to see if it falls into a bucket we want to save. This will require both a way to potentially delete a report and its attendant validation results but also a new hook inside the device report processing code that allows us to look back one report and make decisions.

Another possible path is to keep the current endpoint exactly the way it is and process the history separately. The script you mention could just be the code we just always run. I'd like to talk about it maybe not being a sidecar script but that's separate topic.

@karenetheridge
Copy link
Contributor

I updated the description for points 2 and 3 to show that we do want to keep the first and last reports (given that we can assume just one phase for now).

@karenetheridge karenetheridge self-assigned this Jan 23, 2019
@sungo
Copy link
Contributor

sungo commented Jan 23, 2019

I think we can track the clean up process over in existing GHI #461

@karenetheridge
Copy link
Contributor

What about the existing duplication checks? At present, if the report exactly matches (minus some always-changing fields like report_id) the previous report for the device, we immediately return http 200 with the previous validation_state, without processing the report. Do we keep doing this check, or always process the report now?

@sungo
Copy link
Contributor

sungo commented Jan 23, 2019

In my opinion, we no longer need that check. We will always process the incoming report as "the most recent".

@perigrin
Copy link
Contributor Author

report_id never changes because as I discovered yesterday it was removed from the reports back in October.

+1 for removing that check though

@karenetheridge
Copy link
Contributor

karenetheridge commented Feb 1, 2019

What about the last pass before a failure?

Check out this truth table, where we fetch the previous and previous-previous
reports for a device (in the same phase as the current report):

prevprev    previous   current     delete previous?
dne         dne        FAIL        0   (previous report does not exist)
dne         dne        PASS        0   (previous report does not exist)
dne         FAIL       FAIL        0   (keep first)
dne         FAIL       PASS        0   (keep first)
dne         PASS       FAIL        0   (keep first)
dne         PASS       PASS        0   (keep first)
FAIL        FAIL       FAIL        0   (keep reports that fail)
FAIL        FAIL       PASS        0   (keep reports that fail)
FAIL        PASS       FAIL        0   (keep first pass after a failure)
FAIL        PASS       PASS        0   (keep first pass after a failure)
PASS        FAIL       FAIL        0   (keep reports that fail)
PASS        FAIL       PASS        0   (keep reports that fail)
PASS        PASS       FAIL        ??  last pass before a failure?
PASS        PASS       PASS        1

If we also retain the last pass before a failure, then the decision is really
easy -- we only keepdelete a report if it is surrounded by a pass on both sides (not
a failure or a does-not-exist).

@karenetheridge
Copy link
Contributor

karenetheridge commented Feb 1, 2019

implementation complication, perhaps of interest to no one but myself: The truth table I provided above cannot be applied in realtime, because the previous-previous report may have already been deleted when the previous report was received. Consequently we are no longer sure if it ever existed. We need to keep at least 2 reports in a row and consider if the n-2th report is eligible for deletion, by looking at the last three reports' history. <sour face>

or, we add a column to device_report where we cache some retention information. say, a nullable boolean "retain?", where if it is true, we know we can't delete it and don't have to bother looking back in history. otherwise, if it is null, it might be eligible for deletion and then we apply our truth table logic.

@perigrin
Copy link
Contributor Author

perigrin commented Feb 7, 2019

or, we add a column to device_report where we cache some retention information. say, a nullable boolean "retain?", where if it is true, we know we can't delete it and don't have to bother looking back in history. otherwise, if it is null, it might be eligible for deletion and then we apply our truth table logic.

I'm actually more than okay with this as a solution. It means moving forward if there are more reports that need to be kept (for whatever one-off reasons) we can simply flag them as retained.

karenetheridge added a commit that referenced this issue Feb 12, 2019
…e older passing reports

We do not keep a device report if it follows a passing result and also is followed by a passing
result. This means that we consider *previous* reports for deletion, never the most recent one.

- remove duplicate device report detection logic (see #460, #492): we now once again process
  all device reports as they come in.
- reorder validation_status_enum for nicer queries
- create a command-line script to thin out historical reports.

closes #619, #461.
karenetheridge added a commit that referenced this issue Feb 13, 2019
…e older passing reports

We do not keep a device report if it follows a passing result and also is followed by a passing
result. This means that we consider *previous* reports for deletion, never the most recent one.

- remove duplicate device report detection logic (see #460, #492): we now once again process
  all device reports as they come in.
- reorder validation_status_enum for nicer queries
- create a command-line script to thin out historical reports.

closes #619, #461.
karenetheridge added a commit that referenced this issue Feb 19, 2019
…e older passing reports

We do not keep a device report if it follows a passing result and also is followed by a passing
result. This means that we consider *previous* reports for deletion, never the most recent one.

- remove duplicate device report detection logic (see #460, #492): we now once again process
  all device reports as they come in.
- reorder validation_status_enum for nicer queries
- create a command-line script to thin out historical reports.

closes #619, #461.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api device reports Involves data coming from reporters enhancement extends current functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants