Sync client eventloop should not crash the process #6958

nirinchev · 2023-09-07T19:19:30Z

Currently, if an exception occurs in the sync client event loop, the process is terminated with very little - if any - information about what went wrong. Instead, the event loop should be structured in a way where operations being executed are tied to a Realm/session and any exceptions are logged, reported as fatal errors through the session error handler, after which sync stops for this realm/session, but remains active for any other sessions.

A very common example of the current broken behavior is when multiple processes try to open the same Realm file and get MultipleSyncAgents error. When this happens on Windows, the process crashes with no information about the cause of the problem or how to fix it.

This is the second issue that manifested in this bug report: realm/realm-dotnet#3437.

The text was updated successfully, but these errors were encountered:

tgoyne · 2023-09-07T19:26:45Z

Errors should be reported to SyncClientConfig::default_socket_provider_thread_observer.

tgoyne · 2023-09-07T19:32:20Z

Specifically catching MultipleSyncAgents and reporting it as a session error should be pretty easy to do and makes sense. The sync client mostly doesn't use exceptions to report recoverable errors, but that's an exception.

sync-by-unito · 2023-11-20T18:06:55Z

➤ Jonathan Reams commented:

Seems like this ticket is really about making it easier to consume MultipleSyncAgents exceptions. We could make the sync client pass an error to the sync error handler with a MultipleSyncAgents Status instead of throwing a MultipleSyncAgents exception if that would make things easier. Otherwise I don't think we can ever guarantee that the event loop will never crash the process. The current behavior is not broken, it's just not what you would prefer.

nirinchev · 2023-11-20T18:18:37Z

I don't imagine we can ever guarantee 100% crash-free operation, though I'd argue that in the event of a crash being required, we should provide meaningful information to users. On a lot of the platforms we support, a sync client crash results in a message like (process 26752) exited with code -1073740791. This does not help users understand the cause of the problem or us with supporting them. While I would definitely prefer that the client never crashes the application, if that's not possible, is there at least a way for us to log/report the reason for the crash, along with as much extra information as we can collect?

sync-by-unito · 2023-11-20T18:21:05Z

➤ Jonathan Reams commented:

Beyond the current logging and setting the thread observer so you get access to uncaught exceptions?

nirinchev · 2023-11-20T18:57:17Z

I'm not sure if additional logging has been added since the user reported the issue in September, but back then, Core/Sync weren't logging the crash reason by default (per user's screenshot). Is there something the SDK should set to enable logging of crash reasons?

Regarding the thread observer, I'm assuming you're referring to

realm-core/src/realm/sync/binding_callback_thread_observer.hpp

Line 30 in 0620493

struct BindingCallbackThreadObserver {

? I was not familiar with its existence, but reading through the code comments in the file, it seems like it might be just what we need. Is the intention that each SDK provides a custom implementation where at least handle_error is implemented in a way that this error is propagated to consumers before we crash the process? If that is indeed the case, then I think that should be sufficient and I can go ahead and create tasks for the SDKs to go ahead and do this. Is there a mechanism we can hook into to test that as it seems quite important that it's implemented correctly. I tried grepping for it, but couldn't find tests for the class I could use as inspiration, but it's possible I'm just not looking for the right thing.

sync-by-unito · 2023-11-20T20:56:48Z

➤ Jonathan Reams commented:

Currently the only implementation outside the Java SDK is in the C API - you can look at the tests for that implementation here. Just an FYI, when you turn this on you'll lose the stacktrace of any thrown exceptions that would have been printed to stderr before and you'll just get the exception error code/message.

Also, if/when you implement platform networking you won't have to do this because you can control the event loop and catch any exceptions you want yourself.

Finally, do you also want the MultipleSyncAgents exception to be passed to the sync session error handler rather than being thrown? It's probably the exception that you have to actually catch as an exception from the sync client right now.

nirinchev · 2023-11-20T21:09:27Z

I think for MultipleSyncAgents, it makes sense for Core to handle it and report it through the sync client error handler, although my understanding is that this will go away once multiprocess sync is supported, so not sure if it's worth the effort.

nirinchev added T-Bug-Crash O-Customer labels Sep 7, 2023

sync-by-unito bot assigned jbreams Nov 20, 2023

sync-by-unito bot added the Waiting-For-Reporter Waiting for more information from the reporter before we can proceed label Nov 20, 2023

github-actions bot added Needs-Attention Reporter has responded. Review comment. and removed Waiting-For-Reporter Waiting for more information from the reporter before we can proceed labels Nov 20, 2023

sync-by-unito bot removed the Needs-Attention Reporter has responded. Review comment. label Nov 25, 2023

nielsenko mentioned this issue Mar 4, 2024

Issue while a app is on running on background via workmanager realm/realm-dart#1511

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync client eventloop should not crash the process #6958

Sync client eventloop should not crash the process #6958

nirinchev commented Sep 7, 2023

tgoyne commented Sep 7, 2023

tgoyne commented Sep 7, 2023

sync-by-unito bot commented Nov 20, 2023

nirinchev commented Nov 20, 2023

sync-by-unito bot commented Nov 20, 2023

nirinchev commented Nov 20, 2023

sync-by-unito bot commented Nov 20, 2023

nirinchev commented Nov 20, 2023

Sync client eventloop should not crash the process #6958

Sync client eventloop should not crash the process #6958

Comments

nirinchev commented Sep 7, 2023

tgoyne commented Sep 7, 2023

tgoyne commented Sep 7, 2023

sync-by-unito bot commented Nov 20, 2023

nirinchev commented Nov 20, 2023

sync-by-unito bot commented Nov 20, 2023

nirinchev commented Nov 20, 2023

sync-by-unito bot commented Nov 20, 2023

nirinchev commented Nov 20, 2023