Add correction for padding multiplier in "Verteilung TRL" #5
Is the tool by mh- able to see whether the padding change was really applied to the whole daily package? The upload numbers for today seem to be quite high. If that is a real increase, I am happy ;-) |
No, the new multiplier of 5 was applied during the day, so for more accurate values you would have to use the hourly key packages. 2 or 3 of these still used 10. |
It was changed at 11 AM CEST (9 AM UTC; package 9; vide infra)
Good news! I checked this twice and also uploaded the hourly packages. The number of users seems to be correct: sum of hourly packages: daily package:
hourly package 6:
hourly package 7:
hourly package 9:
hourly package 11:
hourly package 14:
hourly package 15:
hourly package 16:
hourly package 19:
|
I assume hourly package 19 has only 3 users: one user with 13 keys (1.7-19.6), one user with 12 keys (1.7-19.6; no key for 24.6), and one user with 6 keys (1.7-26.6). Or 4 users if no hole is allowed: |
This also affects package 11: one user with an "Invalid Transmission Risk Profile". So, it might be 2 users less for yesterday. |
I think you are right, but: |
You are absolutely right. My claim was just based on the inspection of the hourly packages. I don't see any way to improve the estimated numbers for yesterday. Hopefully, we do not see these multiplier changes too frequently. |
For package 11 I get 7 users if holes are allowed and 8 users if they are not. |
Due to a padding multiplier change from 10 to 5 yesterday, the reported numbers of the daily package were incorrect. These values have been manually corrected by an analysis of the hourly packages.
I am not sure if this is the correct place, but you may have seen the Spiegel interview with Mr. Spahn (here, paywall). He says:
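The manual correction described here can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `corrected_daily_keys` and all key counts are made up, and each hourly package is assumed to be known together with the multiplier that was active when it was built.

```python
def corrected_daily_keys(hourly_packages):
    """Sum the de-padded key counts of all hourly packages of one day.

    `hourly_packages` is a list of (raw_key_count, padding_multiplier)
    tuples. Each package is divided by its own multiplier, so a change
    of the multiplier during the day does not distort the daily total.
    """
    return sum(raw / multiplier for raw, multiplier in hourly_packages)


# Made-up numbers: two packages still padded with 10, three with 5.
hourly = [(130, 10), (60, 10), (35, 5), (70, 5), (25, 5)]

corrected = corrected_daily_keys(hourly)
# Dividing the whole day by the new multiplier overestimates the total.
naive = sum(raw for raw, _ in hourly) / 5

print(corrected, naive)
```

With these invented counts the naive estimate is noticeably higher than the per-package one, which is the kind of distortion the daily package showed.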
Do you think people would go through the trouble of calling the hotline and then not submit, or is there an issue with the padding factor calculation that leads to a result that is off by a factor of two? |
My guess is that it's in the first day, since no one knows how the packet from "2020-06-23" is actually padded. |
@kai-truempler: Thanks for sharing this. I totally agree and I would rather expect people not to call the hotline in case of a positive test (stigma, time, effort, etc.).
@janpf: This might be an issue. However, I want to point out that every day there is a significant number of keys which are not parsed (vide infra). Thus, I would expect that the estimates by
|
Oh, absolutely true, I forgot about those "keys not parsed". What might be beneficial: on my dashboard I just switched to an hourly analysis, as suggested above by @mh-. This way I'm currently at a total of 218 users and thereby off by a factor of 1.37 @kai-truempler ;)
Absolutely. |
By parsing all keys you can get a minimum number of infected persons. If I count the minimum number of users that submitted keys, I get roughly 250 (23.6. - 02.07.). |
And I'm back down to 188 as the parser just got updated: mh-/diagnosis-keys@104388c |
Ok, maybe I could change the strategy, now that "old Android apps" cannot submit Diagnosis Keys anymore. For example, just counting the number of users is very simple now: it only requires counting all keys with TRL 6, because every user will submit exactly one key with that TRL (and, of course, dividing by the padding multiplier). The harder part is counting the number of keys per user, something that I wanted to do in order to find out whether keys can be linked together (violating the "non-linkability-across-multiple-day" promise). So what exactly do you want from the parser? |
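The counting strategy from this comment can be sketched as follows. It is a minimal illustration, not the parser's actual code: the function name `estimate_users` and the representation of a package as a plain list of TRL integers are assumptions made for the example.

```python
from collections import Counter


def estimate_users(trl_values, padding_multiplier, marker_trl=6):
    """Estimate the number of submitting users in one package.

    Assumes every user submits exactly one key with `marker_trl` and
    that the package contains `padding_multiplier` keys for every
    real key (the real key plus its padded fake keys).
    """
    if padding_multiplier < 1:
        raise ValueError("padding_multiplier must be >= 1")
    marker_count = Counter(trl_values)[marker_trl]
    return marker_count / padding_multiplier


# Made-up package: 10 keys with TRL 6 among others, multiplier 5.
example = [6] * 10 + [8] * 15 + [1] * 25
print(estimate_users(example, padding_multiplier=5))
```

The estimate is only as good as the multiplier passed in, so a package built with a different multiplier has to be evaluated separately.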
Great idea counting the "6"s! Update: did change it and now we're back up to ~200. |
I added the option |
Just looking at the example you provided there, you can still at least provide the minimum user count. It could in fact be more users who each transmitted only a few unconnected days, but if the package contains more "1"s or "6"s than one user can have, there must be at least two users. |
Yes, in the example with the 14 keys, there must have been between 2 and 14 users. This is a wide range, though. |
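The bounds discussed here can be written down as a small sketch. The function `user_count_bounds`, the `max_per_user` mapping, and the TRL values in the example are all hypothetical; the only rule taken from the thread is that a single user cannot have more than a certain number of keys with a given TRL.

```python
import math
from collections import Counter


def user_count_bounds(trl_values, max_per_user):
    """Return (minimum, maximum) number of users behind de-padded keys.

    `max_per_user` maps a TRL to the largest number of keys with that
    TRL a single user can submit (an assumption supplied by the caller).
    The maximum is the worst case where every key has its own user.
    """
    if not trl_values:
        return (0, 0)
    counts = Counter(trl_values)
    minimum = 1
    for trl, limit in max_per_user.items():
        minimum = max(minimum, math.ceil(counts[trl] / limit))
    return (minimum, len(trl_values))


# Made-up package with 14 keys, two of them with TRL 6.
keys = [6, 6] + [8] * 4 + [5] * 2 + [3] * 2 + [1] * 4
print(user_count_bounds(keys, max_per_user={6: 1}))
```

For this invented package the range is as wide as the one mentioned above; adding limits for further TRLs to `max_per_user` can only raise the minimum.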
Ok, sure, but in almost all cases it will be the minimum number or very close to it. Which is good enough for the kind of analytics most are looking for. |
Note: If you download the hourly or daily packages again, you will notice that their content changes. |
I have made an Excel sheet and did a manual examination of the keys. https://github.com/Tho-Mat/corona-stuff/blob/master/%C3%BCberblick.xlsx |
Is there any information on why they would do this?
Just by counting "6"s I get 231 with the "new" packages for 23./24. and 226 with the old ones.
Update: I noticed you're doing a "per-key"-padding analysis, while I'm on a "per-package"-basis. That explains the differences. 👍 |
I think they want to reduce traffic, since it makes no sense to check keys that are older than 14 days. |
@Tho-Mat: Thanks for your comment. I was already a little confused last night because the old hourly packages had changed. My wrong assumption was that the clean-up of keys older than 14 days happens at the package level and not at the level of the individual key. |
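A key-level clean-up, as opposed to dropping whole packages, could look like the sketch below. It only illustrates why an old package changes its content over time; the function name `prune_package`, the use of plain dates per key, and the exact cut-off boundary are assumptions and not the server's actual logic.

```python
from datetime import date, timedelta


def prune_package(key_dates, today, retention_days=14):
    """Keep only keys that are at most `retention_days` old.

    `key_dates` holds the date each key belongs to. The package itself
    survives; only individual expired keys are removed from it.
    """
    cutoff = today - timedelta(days=retention_days)
    return [d for d in key_dates if d >= cutoff]


# Made-up package containing keys for three consecutive days.
package = [date(2020, 6, 23), date(2020, 6, 24), date(2020, 6, 25)]
print(prune_package(package, today=date(2020, 7, 8)))
```

Downloading the same package on a later day therefore yields fewer keys, even though the package identifier has not changed.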
Just as an update to my previous comment, from Phoenix:
That looks closer to the estimate than the 300 from Mr. Spahn 10 days ago. |
Fortunately, the RKI is publishing these numbers on a weekly basis. Thus, I have added another diagram for the published teleTANs last night. However, it is a single PDF which gets overwritten every week. Looking at the number of issued teleTANs: |
I think this issue can be closed, now that padding multiplier is set to one on the server. @micb25 do you agree? |
As far as I understand, the plots in "Verteilung Transmission Risk Level (TRL) in Diagnoseschlüsseln" currently use the number of transmitted keys including the padded fake keys. As long as the padding factor stays the same, this shouldn't be a problem. But this factor will change from tomorrow on (the plan is to bring it down to 1 eventually). The changes in the padding multiplier will distort those graphs, because new data will receive less weight.
My suggestion would be to use the data which has been corrected for this multiplier like in the "Geteilte Diagnoseschlüssel von positiv getesteten Personen" section.
@mh- has introduced an automatic detection for the multiplier used in the data set in his parsing tool: corona-warn-app/cwa-server#620 (comment)
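The suggested correction of the TRL distribution can be sketched as follows. The function `corrected_trl_distribution` and the sample data are hypothetical; the multiplier of each package is assumed to be known, for example from the automatic detection mentioned above.

```python
from collections import Counter


def corrected_trl_distribution(packages):
    """Accumulate TRL counts, weighting each package by 1 / multiplier.

    `packages` is an iterable of (trl_values, padding_multiplier).
    Dividing per package keeps days with a lower multiplier from being
    underweighted against days with a higher one.
    """
    totals = Counter()
    for trl_values, multiplier in packages:
        for trl, count in Counter(trl_values).items():
            totals[trl] += count / multiplier
    return dict(totals)


# Made-up data: one day padded with 10, the next day padded with 5.
day1 = ([6] * 10 + [8] * 20, 10)
day2 = ([6] * 5 + [8] * 5, 5)
print(corrected_trl_distribution([day1, day2]))
```

Without the division, the first day in this invented example would contribute three times as many keys as the second one, although it contains only slightly more real keys.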