Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Posthog: Add more properties to unable to decrypt errors to allow more filtering options #2332

Closed
6 tasks done
BillCarsonFr opened this issue Mar 11, 2024 · 17 comments
Closed
6 tasks done
Labels
A-Telemetry Telemetry / analytics to understand usage T-Epic Issue is at Epic level Team: Crypto

Comments

@BillCarsonFr
Copy link
Member

BillCarsonFr commented Mar 11, 2024

Currently the captured Error events for unable to decrypt errors do not contain much information.
This lack of information is preventing us from getting interesting insights from the raw data.

Goal: We want to add some new properties that will allow us to better filter out Posthog insights.

List of needed properties:

  • eventLocalAgeAtDecryptionFailure eventLocalAgeMillis (numeric, time in millis). An heuristic based on event origin_server_ts and the current device creation time. This would be used to get the source of the event scrollback/live/initialSync. This would be used in Posthog by creating filter to group events in buckets (e.g <0 (historical), <60s, <5min, <1h, >1h)
  • userTrustsOwnIdentity (boolean). Would allow to know if the current session was trusted at the time of decryption
  • isFederated (boolean). true if userDomain != senderDomain. Would allow to filter UTDs where federation is involved.
  • timeToDecryptMillis (numeric, time in millis) : Currently there is a grace period of 4s, after that UTDs are reported. We would like to improve the existing mecanism by adding a timeToDecrypt properties when an event is decrypted late. We should track decryption errors for 1mn (TBD), if events are decrypted during this period we still report an UTD with the time it took to decrypt, after 1mn we report anyhow. Would allow to filter on posthog temporary vs definitive UTD.
  • isMatrixDotOrg (boolean). True if the current user is using Matrix.org. Combined with isFederated would allow to create a filter where we are sure that both users are running a recent synapse.
  • wasVisibleToUser (boolean). True if that unable to decrypt error was visible to the user. For example, some clients can hide part of the history and only show it after verification. In that case the UTDs won't be displayed on screen and we want to filter that.
@BillCarsonFr BillCarsonFr added A-Telemetry Telemetry / analytics to understand usage Team: Crypto labels Mar 11, 2024
@BillCarsonFr
Copy link
Member Author

These will also be improvements/addition to #2300

@richvdh
Copy link
Member

richvdh commented Mar 11, 2024

eventLocalAgeAtDecryptionFailure

Is it better to log the age relative to the session, or to log the session-creation-time and the timestamp on the event as separate fields (and then do the calculation on the posthog side)?

@BillCarsonFr
Copy link
Member Author

eventLocalAgeAtDecryptionFailure

Is it better to log the age relative to the session, or to log the session-creation-time and the timestamp on the event as separate fields (and then do the calculation on the posthog side)?

AFAIK There is no easy way to do that on posthog side (not in the query to display graphs), maybe a posthog plugin where you can pre-process and decorate events.

@richvdh
Copy link
Member

richvdh commented Mar 12, 2024

isFedarated

isFederated

@richvdh
Copy link
Member

richvdh commented Mar 12, 2024

  • eventLocalAgeAtDecryptionFailure (numeric, time in seconds)
  • timeToDecrypt (numeric, time in seconds)

It looks like other "time" fields in the analytics events measure time in milliseconds. Please let's try to be consistent.

@richvdh
Copy link
Member

richvdh commented Mar 12, 2024

  • timeToDecrypt (numeric, time in seconds) : Currently there is a grace period of 4s, after that UTDs are reported. We would like to improve the existing mecanism by adding a timeToDecrypt properties when an event is decrypted late. We should track decryption errors for 1mn (TBD), if events are decrypted during this period we still report an UTD with the time it took to decrypt, after 1mn we report anyhow. Would allow to filter on posthog temporary vs definitive UTD.

Trying to understand this a bit better. Could you explain what will happen under this plan for each of:

  • An event decrypted in more than 0s but less than 4s (it is not reported to posthog at all?)
  • An event decrypted in more than 4s but less than 60s (it is reported to posthog, but with an appropriate timeToDecrypt ? How does this relate to Posthog | Increase decryption failure grace period #2303?)
  • An event which is still not decrypted after 60s (we give up waiting, and report it to posthog with timeToDecrypt set to ... +inf ?)

@BillCarsonFr
Copy link
Member Author

BillCarsonFr commented Mar 12, 2024

  • timeToDecrypt (numeric, time in seconds) : Currently there is a grace period of 4s, after that UTDs are reported. We would like to improve the existing mecanism by adding a timeToDecrypt properties when an event is decrypted late. We should track decryption errors for 1mn (TBD), if events are decrypted during this period we still report an UTD with the time it took to decrypt, after 1mn we report anyhow. Would allow to filter on posthog temporary vs definitive UTD.

Trying to understand this a bit better. Could you explain what will happen under this plan for each of:

* An event decrypted in more than 0s but less than 4s (it is not reported to posthog at all?)

* An event decrypted in more than 4s but less than 60s (it is reported to posthog, but with an appropriate `timeToDecrypt` ? How does this relate to [Posthog | Increase decryption failure grace period #2303](https://github.com/element-hq/element-meta/issues/2303)?)

* An event which is still not decrypted after 60s (we give up waiting, and report it to posthog with `timeToDecrypt` set to ... `+inf` ?)

Yes so far I think:

  • Decrypted under 4s not reported (mostly to be a bit backward compatible)
  • Decrypted before Max Wait time (60s) Report with timeToDecrypt it took
  • If after Max wait time it's still not decrypted we report with a timeTodecrypt > MAX (we could put some +inf value also)

The max wait period has to be defined, the main risk is that this wait time is too long and the client is closed meanwhile and the error not reported at all.

So maybe it should be 30s and not 60s

@BillCarsonFr
Copy link
Member Author

  • eventLocalAgeAtDecryptionFailure (numeric, time in seconds)
  • timeToDecrypt (numeric, time in seconds)

It looks like other "time" fields in the analytics events measure time in milliseconds. Please let's try to be consistent.

Yes, updated

@richvdh
Copy link
Member

richvdh commented Mar 12, 2024

Yes so far I think:

  • Decrypted under 4s not reported (mostly to be a bit backward compatible)
  • Decrypted before Max Wait time (60s) Report with timeToDecrypt it took
  • If after Max wait time it's still not decrypted we report with a timeTodecrypt > MAX (we could put some +inf value also)

The max wait period has to be defined, the main risk is that this wait time is too long and the client is closed meanwhile and the error not reported at all.

So maybe it should be 30s and not 60s

So do we do this instead of #2303?

@BillCarsonFr
Copy link
Member Author

So do we do this instead of #2303?

I think so, let's confirm at the daily

@BillCarsonFr
Copy link
Member Author

Closed #2303 and do this task instead

@BillCarsonFr
Copy link
Member Author

For UTD that are not resolved after the max_time, we might want to report timeToDecypt=-1
That would allow to filter more easilly

@andybalaam
Copy link
Member

@BillCarsonFr will break into separate tasks.

@BillCarsonFr BillCarsonFr added the T-Epic Issue is at Epic level label Mar 26, 2024
@richvdh richvdh changed the title Posthog: Add more properties to unable to decrypt errors to allow more filtering option Posthog: Add more properties to unable to decrypt errors to allow more filtering options May 2, 2024
@uhoreg
Copy link
Member

uhoreg commented May 24, 2024

@BillCarsonFr All the tasks in this issue are marked as done. Can this issue be closed, or is there more work to do here?

@BillCarsonFr
Copy link
Member Author

yes let's close there is a roundoff for EX

@richvdh
Copy link
Member

richvdh commented Nov 21, 2024

  • eventLocalAgeAtDecryptionFailure (numeric, time in millis).

It looks like that was eventually implemented as eventLocalAgeMillis in the end. Have updated the description accordingly.

@richvdh
Copy link
Member

richvdh commented Nov 21, 2024

  • eventLocalAgeAtDecryptionFailure (numeric, time in millis).

It looks like that was eventually implemented as eventLocalAgeMillis in the end. Have updated the description accordingly.

Also, it has not been implemented for EW.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Telemetry Telemetry / analytics to understand usage T-Epic Issue is at Epic level Team: Crypto
Projects
None yet
Development

No branches or pull requests

4 participants