Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Special case and remove from ACRA: "Syncing failed, because your email address needs to be (re)confirmed." #17392

Open
2 of 4 tasks
david-allison opened this issue Nov 7, 2024 · 14 comments

Comments

@david-allison
Copy link
Member

david-allison commented Nov 7, 2024

https://ankidroid.org/acra/app/1/bug/252627/report/27d23a2f-90f0-4c10-a5fa-2e3ccf1ac093

net.ankiweb.rsdroid.exceptions.BackendSyncException: Syncing failed, because your email address needs to be (re)confirmed. Please visit ankiweb.net, and log in as **email** to proceed.
	at net.ankiweb.rsdroid.exceptions.BackendSyncException.<init>(BackendSyncException.kt:21)
	at net.ankiweb.rsdroid.BackendException$Companion.fromError(BackendException.kt:107)
	at net.ankiweb.rsdroid.BackendKt.unpackResult(Backend.kt:281)
	at net.ankiweb.rsdroid.BackendKt.access$unpackResult(Backend.kt:1)
	at net.ankiweb.rsdroid.Backend.runMethodRaw$lambda$1(Backend.kt:118)
	at net.ankiweb.rsdroid.Backend.withBackend(Backend.kt:131)
	at net.ankiweb.rsdroid.Backend.runMethodRaw(Backend.kt:117)
	at anki.backend.GeneratedBackend.syncStatusRaw(GeneratedBackend.kt:61)
	at anki.backend.GeneratedBackend.syncStatus(GeneratedBackend.kt:66)
	at com.ichi2.anki.DeckPicker$automaticSync$areThereChangesToSync$status$1.invokeSuspend(DeckPicker.kt:1208)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.java:111)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:811)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:715)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:702)
	Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [p0{Cancelling}@a3c347a, Dispatchers.Main.immediate]

Task List

  • AnkiDroid 2.19: Detect & Block the email reconfirmation BackendSyncException from being sent to ACRA
    • I suspect the best method for this is to have an exception subclass
    • I suspect we need to match on the exception message
  • Anki 24.11: Strongly type the exception so we can fix it properly
    • AnkiDroid 2.20: Improve our exception mapping so we're not matching on the exception message
  • ACRA:
    • Clear instances of the BackendSyncException
    • Block further occurrences from hitting the DB

Related:

@david-allison david-allison changed the title https://ankidroid.org/acra/app/1/bug/252627/report/27d23a2f-90f0-4c10-a5fa-2e3ccf1ac093 Special case and remove from ACRA: "Syncing failed, because your email address needs to be (re)confirmed." Nov 7, 2024
@david-allison
Copy link
Member Author

Priority high as this is a privacy issue. We should not be sending email addresses to ACRA

It would be ideal to get this into the 24.11 backend

@voczi
Copy link
Contributor

voczi commented Nov 7, 2024

For now, is it possible to add some middleware which filters that particular error? Other alternative is to modify (worst case drop) them periodically from the db..
Not nice that it's sending PII like that..

@david-allison
Copy link
Member Author

david-allison commented Nov 7, 2024

We need both:

  • Stop this being sent to ACRA
  • Blocking/scrubbing data: We must handle the case when a user doesn't upgrade AnkiDroid

As the message is translated, this makes it somewhat difficult.

We can likely filter these messages by checking whether the first line contains: [BackendSyncException , ankiweb.net, @]

We'd also want client-side code after this is patched to ensure that this server-side filter isn't scrubbing unknown/unhandled error messages

@david-allison
Copy link
Member Author

david-allison commented Nov 8, 2024

Diagnostics:

error.kind = 8 => SYNC_OTHER_ERROR

error.message = "Syncing failed, because your email address needs to be (re)confirmed. Please visit ankiweb.net, and log in as d****@gmail.com to proceed."

https://github.com/ankidroid/Anki-Android-Backend/blob/c7c2842770c4cdd4616fd6d842375a171870f891/rsdroid/src/main/java/net/ankiweb/rsdroid/BackendException.kt#L107

Anki Backend:

https://github.com/ankitects/anki/blob/a150eda287a9b82841a367e2d675aac0acd50b62/rslib/src/sync/collection/status.rs#L44-L50

logcat

anki::syn...lectio..  D  redirect to new location endpoint="https://sync10.ankiweb.net/"
                      D  meta remote=SyncMeta { modified: TimestampMillis(1728774137848), schema: TimestampMillis(1718379983437), usn: Usn(6173), current_time: TimestampSecs(1731066728), server_message: "Syncing failed, because your email address needs to be (re)confirmed. Please visit ankiweb.net, and log in as d***@gmail.com to proceed.", should_continue: false, host_number: 10, empty: false, media_usn: Usn(21646), v2_scheduler_or_later: false, v2_timezone: false }
                      D  meta local=SyncMeta { modified: TimestampMillis(0), schema: TimestampMillis(1731062594502), usn: Usn(0), current_time: TimestampSecs(1731066725), server_message: "", should_continue: true, host_number: 0, empty: true, media_usn: Usn(0), v2_scheduler_or_later: true, v2_timezone: true }
                      D  server says abort remote.server_message="Syncing failed, because your email address needs to be (re)confirmed. Please visit ankiweb.net, and log in as d***@gmail.com to proceed."
CollectionManager     W  blocked main thread for 960ms:
                         com.ichi2.anki.DeckPicker.updateDeckList(DeckPicker.kt:2059)
DeckPicker            D  updateDeckList
DeckPicke...DeckList  D  Refreshing deck list
Choreographer         I  Skipped 57 frames!  The application may be doing too much work on its main thread.
DeckPicker            D  onCreateOptionsMenu()
BadgeDrawableBuilder  D  Adding badge
DeckPicker            I  Updating deck list UI
                      D  Not rendering deck list as there are no cards
                      D  Startup - Deck List UI Completed
BadgeDrawableBuilder  D  Adding badge

@mikehardy
Copy link
Member

mikehardy commented Nov 8, 2024

Note that while in there special casing this sync exception from "other", the "device time out of sync" error should also be special cased (yes this increases scope slightly here but it is a trivial amount of extra work to do both at once) - this would allow us to provide user feedback well for that specific error - see comment on PR here for motivation / details: #17017 (review)


For Acrarium, I'm trying to conjure the correct SQL query to select only those reports that have this exception and no other but it is a little difficult since the exception is localized. Here is my progress so far:

1- log in, if you have access - ssh [email protected]
2- enter database, if you have access mysql -u acrarium -p acrarium
3- query for the 3 areas where the PII landed:

  • stacktrace table: select * from stacktrace where stacktrace like '%BackendSyncException%' and stacktrace like '%net.ankiweb.rsdroid.exceptions.BackendSyncException: %' and regexp_like(stacktrace, ' [^[:blank:]]+@[^[:blank:]^\.]+\.[[:alnum:]]+ ');
  • report table: from report where content like '%BackendSyncException%' and content like '% "STACK_TRACE": "net.ankiweb.rsdroid.exceptions.BackendSyncException: %' and regexp_like(content, ' [^[:blank:]]+@[^[:blank:]^\.]+\.[[:alnum:]]+ ');
  • bug table: select title from bug where title like '%BackendSyncException%' and regexp_like(title, ' [^[:blank:]]+@[^[:blank:]^\.]+\.[[:alnum:]]+ ');
    4- purge these entries (we're not interested in them anyway) - it turns out that if you purge the bug entry (and you can find them all by finding all the report table entries that match then following bug -> stacktrace.bug_id -> report.stacktrace_id) it will trigger deletes cascading from bug to stacktrace to report and then searching for email addresses will yield empty sets everywhere

This query took approximately 20 secs to run, so perhaps running it once an hour or so is about right?

delete from acrarium.bug
where id in
  (select bug_id from acrarium.stacktrace
   where id in
     (select stacktrace_id from acrarium.report
      where content like '%BackendSyncException%' and
      content like '%
net.ankiweb.rsdroid.exceptions.BackendSyncException: %' and
      regexp_like(content, ' [^[:blank:]]+@[^[:blank:]^\.][\.[:alpha:]+]+ ')));

Note that there are multiple messages that have the email, not just (re)confirm email also terms and conditions:

net.ankiweb.rsdroid.exceptions.BackendSyncException: Syncing failed. Please visit ankiweb.net and log in as [email protected] to confirm the updated terms and conditions.

@mikehardy
Copy link
Member

I've done the Acrarium part.

I can't really stop the reports from hitting the DB without hacking on acrarium which is problematic for a number of reasons - not least of which being that current main doesn't even build over there and is a big upgrade code-wise from what we're running now including a database migration

but!

I'm very comfortable that I can locate these entries and delete them (as documented above) and I've set up an hourly in-mysql event to purge them so they will never stay for long

mysql> select event_name,event_body,event_type,interval_value,interval_field,event_definition from information_schema.events;
+---------------------+------------+------------+----------------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EVENT_NAME          | EVENT_BODY | EVENT_TYPE | INTERVAL_VALUE | INTERVAL_FIELD | EVENT_DEFINITION                                                                                                                                                                                                                                                                                                                          |
+---------------------+------------+------------+----------------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| purge_email_reports | SQL        | RECURRING  | 1              | HOUR           | delete from bug
where id in
  (select bug_id from stacktrace
   where id in
     (select stacktrace_id from report
      where content like '%BackendSyncException%'
      and content like '% "STACK_TRACE": "net.ankiweb.rsdroid.exceptions.BackendSyncExc
eption: %'
      and regexp_like(content, ' [^[:blank:]]+@[^[:blank:]]+ '))) |
+---------------------+------------+------------+----------------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

@mikehardy
Copy link
Member

Out of distrust I checked in on my periodic acrarium purge this morning and it was of course not executing correctly.
I needed to prefix table names with schemas in the event scheduler context, and that fixed it.
Took the opportunity to clean up the matching regex a little as well.
Appears to be executing correctly from event scheduler now and verified acrarium is free of PII at the moment

@voczi
Copy link
Contributor

voczi commented Nov 9, 2024

Considering that user privacy was compromised by this incident, someone should do the same work (or at the very least thoroughly review everything you've set up so far) and make sure that they arrive at the same outcome as you did (i.e. acrarium is PII free at this moment, and in the future). It's not generally good to do the same work over again, but this should be a one-in-a-lifetime thing, and if the result is that "ok, we've now really made sure PII isn't stored" then I don't see how it's a waste of time.

@mikehardy
Copy link
Member

Good point @voczi - I've done my best but still made an error first time around. Happy to help anyone that wants to get access to the mysql instance on the ankidroid.org server, especially since it really shouldn't have PII on there. Or if you want to do it I'd welcome the extra look and I know you've got access.

@voczi
Copy link
Contributor

voczi commented Nov 10, 2024

Taking a look now.

@voczi
Copy link
Contributor

voczi commented Nov 10, 2024

Can confirm we still have issues.
Image
(report from 2024-11-05, task last executed 1 hr ago)
Gonna brainstorm with Mike.

@voczi
Copy link
Contributor

voczi commented Nov 10, 2024

I've wrote the following constraints (and also purged rows matching these beforehand):
(not((`content` like _utf8mb4\'%and log in as%\'))) (report table)
(not((`title` like _utf8mb4\'%and log in as%\'))) (bug table)
(not((`stacktrace` like _utf8mb4\'%and log in as%\'))) (stacktrace table)
Now, all of these messages should be gone. Tested with two reports on the latest build, one crash report with a test exception from the dev menu and the other with this message where it says my email has to be verified. And, also, I don't think we need the event anymore?

@mikehardy
Copy link
Member

I hadn't thought of adding a constraint, that's way better, obvious in hindsight, big improvement

The only issue with a constraint is that it means the report will get stuck on the client and be retried until deleted I think
All reports are cleared on version upgrades, so that is self-healing I think - and a reasonable tradeoff vs having PII flying around

I double-checked and I don't see any more PII anywhere, so with the constraints in place yes I can disable the event - I have done so

@mikehardy
Copy link
Member

moving this to 2.20 milestone as the constraints are blocking things now, and we have an exception subclass but it's for next version of anki upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants