Explore output-specific small number suppression #759
I like this idea!

In many cases, the redaction can happen directly after the measures files are generated. My usual approach uses this function to first suppress low numbers and then further redact values if the total number of suppressed values is <=5. This reduces secondary disclosure risk while keeping as many true values as possible. It doesn't protect against small-number differences between months, because I have generally used it to measure events occurring within each month. If you were measuring "ever had a vaccine" each month, you would also need to protect against differencing of the cumulative values.

Because this gets tricky once you start thinking about secondary disclosure, the prevailing way to redact measures has been to first redact numbers <=5 and then round to the nearest 5 (or 7 in the vaccine report). This protects against primary disclosure and provides some protection against secondary disclosure (including differencing of cumulative counts between months).

On decile chart redaction: this is one case where the measures file is grouped by a high-cardinality variable (and is therefore quite likely to have small counts) that is later aggregated, so redacting early can reduce utility much more. It's not that we don't care about low counts here; rather, it only makes sense to produce a decile chart by practice if there are enough events! So a couple of extra checks would be needed if the plan was to automate it.
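A minimal sketch of the two approaches described above, assuming the counts being redacted are the parts of a single published total (for example, one month's numerators by category) and reading "the total number of suppressed values is <=5" as the sum of the suppressed counts. The function names and the `threshold`/`base` parameters are hypothetical illustrations, not an existing OpenSAFELY API or the function linked in the comment.

```python
from typing import List, Optional, Sequence


def suppress_with_secondary(
    counts: Sequence[int], threshold: int = 5
) -> List[Optional[int]]:
    """Redact small counts, then redact more if the redacted total is still small.

    Assumes `counts` are the parts of a published total, so the sum of the
    redacted counts could be recovered by subtraction.
    """
    # Primary suppression: every count at or below the threshold is redacted.
    redacted = {i for i, c in enumerate(counts) if c <= threshold}

    # Secondary suppression: while the redacted counts sum to <= threshold,
    # also redact the smallest count that is still visible.
    while redacted and sum(counts[i] for i in redacted) <= threshold:
        visible = [i for i in range(len(counts)) if i not in redacted]
        if not visible:
            break  # everything is already redacted
        redacted.add(min(visible, key=lambda i: counts[i]))

    return [None if i in redacted else c for i, c in enumerate(counts)]


def redact_and_round(count: int, threshold: int = 5, base: int = 5) -> Optional[int]:
    """Redact counts <= threshold, then round the rest to the nearest `base`."""
    if count <= threshold:
        return None
    # Integer round-half-up; avoids Python's round-half-to-even for even bases.
    return base * ((count + base // 2) // base)


print(suppress_with_secondary([3, 10, 50, 100]))  # [None, None, 50, 100]
print(redact_and_round(13), redact_and_round(11, base=7))  # 15 14
```

In the first call, redacting the 3 alone would leave it recoverable from the total, so the next smallest count (10) is redacted as well. The rounding approach needs no knowledge of other cells, which is part of why it also blunts differencing of cumulative counts between months.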
Current small number suppression may be overly stringent in some cases.
For decile charts, for example, we only care that there are at least n practices (or other group-by variable) per decile, rather than each practice having at least n events.
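A minimal sketch of a per-decile check, assuming that "at least n practices per decile" means a month's decile chart is only produced when at least 10 × n practices contribute a value. `decile_cut_points`, `decile_chart_data` and `min_practices_per_decile` are hypothetical names, and the default of 5 is an illustrative assumption. `statistics.quantiles` is used with its default "exclusive" method, which may differ from the percentile method used by the existing decile charts.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def decile_cut_points(
    practice_values: Sequence[float], min_practices_per_decile: int = 5
) -> Optional[List[float]]:
    """Return the 9 decile cut points, or None if too few practices contribute."""
    # Ten deciles, each backed by at least `min_practices_per_decile` practices,
    # needs at least 10 * min_practices_per_decile practices in total.
    if len(practice_values) < 10 * min_practices_per_decile:
        return None
    # statistics.quantiles sorts its input and returns n - 1 cut points.
    return statistics.quantiles(practice_values, n=10)


def decile_chart_data(
    values_by_month: Dict[str, Sequence[float]], min_practices_per_decile: int = 5
) -> Dict[str, Optional[List[float]]]:
    """Apply the per-decile check month by month; suppressed months map to None."""
    return {
        month: decile_cut_points(values, min_practices_per_decile)
        for month, values in values_by_month.items()
    }


chart = decile_chart_data({"2021-01": [i / 100 for i in range(60)], "2021-02": [0.1, 0.2]})
print(chart["2021-02"])  # None
```

Unlike per-practice suppression, no individual practice value is redacted here; a whole month is dropped when too few practices contribute. The extra check raised in the comment above, that there are enough events overall, would be a separate condition.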
Louis has prototyped some decile chart suppression code here.
This could be an opportunity to experiment with a reusable redaction action that would run between the generation of the measures files and their use.
The reusable action could incorporate #559, #560 and #561.