Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync client eventloop should not crash the process #6958

Open
nirinchev opened this issue Sep 7, 2023 · 8 comments
Open

Sync client eventloop should not crash the process #6958

nirinchev opened this issue Sep 7, 2023 · 8 comments

Comments

@nirinchev
Copy link
Member

Currently, if an exception occurs in the sync client event loop, the process is terminated with very little - if any - information about what went wrong. Instead, the event loop should be structured in a way where operations being executed are tied to a Realm/session and any exceptions are logged, reported as fatal errors through the session error handler, after which sync stops for this realm/session, but remains active for any other sessions.

A very common example of the current broken behavior is when multiple processes try to open the same Realm file and get MultipleSyncAgents error. When this happens on Windows, the process crashes with no information about the cause of the problem or how to fix it.

This is the second issue that manifested in this bug report: realm/realm-dotnet#3437.

@tgoyne
Copy link
Member

tgoyne commented Sep 7, 2023

Errors should be reported to SyncClientConfig::default_socket_provider_thread_observer.

@tgoyne
Copy link
Member

tgoyne commented Sep 7, 2023

Specifically catching MultipleSyncAgents and reporting it as a session error should be pretty easy to do and makes sense. The sync client mostly doesn't use exceptions to report recoverable errors, but that's an exception.

Copy link

sync-by-unito bot commented Nov 20, 2023

➤ Jonathan Reams commented:

Seems like this ticket is really about making it easier to consume MultipleSyncAgents exceptions. We could make the sync client pass an error to the sync error handler with a MultipleSyncAgents Status instead of throwing a MultipleSyncAgents exception if that would make things easier. Otherwise I don't think we can ever guarantee that the event loop will never crash the process. The current behavior is not broken, it's just not what you would prefer.

@sync-by-unito sync-by-unito bot added the Waiting-For-Reporter Waiting for more information from the reporter before we can proceed label Nov 20, 2023
@nirinchev
Copy link
Member Author

I don't imagine we can ever guarantee 100% crash-free operation, though I'd argue that in the event of a crash being required, we should provide meaningful information to users. On a lot of the platforms we support, a sync client crash results in a message like (process 26752) exited with code -1073740791. This does not help users understand the cause of the problem or us with supporting them. While I would definitely prefer that the client never crashes the application, if that's not possible, is there at least a way for us to log/report the reason for the crash, along with as much extra information as we can collect?

@github-actions github-actions bot added Needs-Attention Reporter has responded. Review comment. and removed Waiting-For-Reporter Waiting for more information from the reporter before we can proceed labels Nov 20, 2023
Copy link

sync-by-unito bot commented Nov 20, 2023

➤ Jonathan Reams commented:

Beyond the current logging and setting the thread observer so you get access to uncaught exceptions?

@nirinchev
Copy link
Member Author

I'm not sure if additional logging has been added since the user reported the issue in September, but back then, Core/Sync weren't logging the crash reason by default (per user's screenshot). Is there something the SDK should set to enable logging of crash reasons?

Regarding the thread observer, I'm assuming you're referring to

struct BindingCallbackThreadObserver {
? I was not familiar with its existence, but reading through the code comments in the file, it seems like it might be just what we need. Is the intention that each SDK provides a custom implementation where at least handle_error is implemented in a way that this error is propagated to consumers before we crash the process? If that is indeed the case, then I think that should be sufficient and I can go ahead and create tasks for the SDKs to go ahead and do this. Is there a mechanism we can hook into to test that as it seems quite important that it's implemented correctly. I tried grepping for it, but couldn't find tests for the class I could use as inspiration, but it's possible I'm just not looking for the right thing.

Copy link

sync-by-unito bot commented Nov 20, 2023

➤ Jonathan Reams commented:

Currently the only implementation outside the Java SDK is in the C API - you can look at the tests for that implementation here. Just an FYI, when you turn this on you'll lose the stacktrace of any thrown exceptions that would have been printed to stderr before and you'll just get the exception error code/message.

Also, if/when you implement platform networking you won't have to do this because you can control the event loop and catch any exceptions you want yourself.

Finally, do you also want the MultipleSyncAgents exception to be passed to the sync session error handler rather than being thrown? It's probably the exception that you have to actually catch as an exception from the sync client right now.

@nirinchev
Copy link
Member Author

I think for MultipleSyncAgents, it makes sense for Core to handle it and report it through the sync client error handler, although my understanding is that this will go away once multiprocess sync is supported, so not sure if it's worth the effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants