Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fatal Exception: realm::KeyNotFound #3431

Closed
avolkangursel opened this issue Aug 30, 2023 · 10 comments
Closed

Fatal Exception: realm::KeyNotFound #3431

avolkangursel opened this issue Aug 30, 2023 · 10 comments
Assignees
Labels
O-Community T-Bug Waiting-For-Reporter Waiting for more information from the reporter before we can proceed

Comments

@avolkangursel
Copy link

avolkangursel commented Aug 30, 2023

What happened?

Hello, occasionally we receive crash reports like the one below from our live game. We obtain these reports through Firebase Crashlytics.

Fatal Exception: realm::KeyNotFound
No object with key '-1' in 'flx_subscription_sets'

It seems that the occurrence of this error is causing a crash in another thread. Details of the crashed thread are as follows:

Crashed: com.google.firebase.crashlytics.ios.exception
SIGABRT ABORT 0x00000001e15e9198
0 FirebaseCrashlytics 0x1eea8 FIRCLSProcessRecordAllThreads + 184
1 FirebaseCrashlytics 0x1f288 FIRCLSProcessRecordAllThreads + 1176
2 FirebaseCrashlytics 0x164dc FIRCLSHandler + 48
3 FirebaseCrashlytics 0x111e4 __FIRCLSExceptionRecord_block_invoke + 92
4 libdispatch.dylib 0x64780 _dispatch_client_callout + 16
5 libdispatch.dylib 0x46a50 _dispatch_lane_barrier_sync_invoke_and_complete + 52
6 FirebaseCrashlytics 0x101f0 FIRCLSExceptionRecord + 204
7 FirebaseCrashlytics 0xff00 FIRCLSTerminateHandler() + 588
8 libc++abi.dylib 0x13424 std::__terminate(void ()()) + 12
9 libc++abi.dylib 0x15da0 __cxa_get_exception_ptr + 30
10 libc++abi.dylib 0x15d60 __cxxabiv1::exception_cleanup_func(_Unwind_Reason_Code, _Unwind_Exception
) + 130
11 realm-wrappers 0x29d0f8 realm::ClusterNode::get(realm::ObjKey, realm::ClusterNode::State&) const + 300
12 realm-wrappers 0x3a734c realm::Table::get_object_with_primary_key(realm::Mixed) const + 180
13 realm-wrappers 0x260244 realm::sync::SubscriptionStore::get_mutable_by_version(long long) + 200
14 realm-wrappers 0x1e2ae4 realm::sync::SessionWrapper::on_download_completion() + 364
15 realm-wrappers 0x22fed4 realm::sync::ClientImpl::Session::receive_mark_message(unsigned long long) + 360
16 realm-wrappers 0x22c4e0 void realm::_impl::ClientProtocol::parse_message_receivedrealm::sync::ClientImpl::Connection(realm::sync::ClientImpl::Connection&, std::__1::basic_string_view<char, std::__1::char_traits>) + 248
17 realm-wrappers 0x236350 realm::sync::ClientImpl::Connection::WebSocketObserverShim::websocket_binary_message_received(realm::util::Span<char const, 18446744073709551615ul>) + 148
18 realm-wrappers 0x21ab64 (anonymous namespace)::WebSocket::frame_reader_loop() + 136
19 realm-wrappers 0x20dba0 realm::sync::network::Service::BasicStreamOpsrealm::sync::network::ssl::Stream::BufferedReadOper<realm::util::UniqueFunction<void (std::__1::error_code, unsigned long)>>::recycle_and_execute() + 220
20 realm-wrappers 0x2144d4 realm::sync::network::Service::Impl::run_impl(bool) + 388
21 realm-wrappers 0x2059f8 realm::sync::websocket::DefaultSocketProvider::event_loop() + 220
22 realm-wrappers 0x2076c4 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_deletestd::__1::__thread_struct>, void (realm::sync::websocket::DefaultSocketProvider::)(), realm::sync::websocket::DefaultSocketProvider>>(void*) + 64
23 libsystem_pthread.dylib 0x30ec _pthread_start + 116
24 libsystem_pthread.dylib 0x172c thread_start + 8

From the older version (realm-dotnet v11.0.0), these reports were coming from about 1.5-2% of the players. Currently, it is much less frequent (around 0.2-0.3%). However, it persists. How can we prevent this?

Unity version: 2021.3.26f1

Repro steps

Unfortunately, we are unable to capture it in our own tests. This error report doesn't occur very often, but it's affecting the smooth operation of our game.

Version

11.3.0

What Atlas Services are you using?

Both Atlas Device Sync and Atlas App Services

What type of application is this?

Unity

Client OS and version

iOS16.6.0

Code snippets

No response

Stacktrace of the exception/crash you're getting

No response

Relevant log output

No response

@nirinchev
Copy link
Member

@danieltabacaru from a quick scan of Core, this seems to be impossible since we explicitly check for -1 before calling get_mutable_by_version: https://github.com/realm/realm-core/blob/8b4b08d692625a03a8a8c24762dfde282665009b/src/realm/sync/client.cpp#L1740-L1743.

So this is either a race condition (where someone sets m_flx_pending_mark_version to -1) or another manifestation of realm/realm-core#6813 - what do you think? Is it possible that multiple threads call SessionWrapper::on_download_completion simultaneously?

@danieltabacaru
Copy link

@nirinchev I don't think it's possible that multiple threads call on_download_completion, let alone simultaneously. We've seen something similar in the past, but on a different code path (and it is fixed now). I will try to dig a bit deeper, but as always, some client logs would go a long way.

@avolkangursel
Copy link
Author

We have extracted the details of the error report from Firebase Crashlytics. I am sharing the error report details.
Crash report:
realm_KeyNotFound_crash_report.csv

Additionally, based on the creation date of the report, when we looked at the error logs in App Services, we identified a session that we believe could be related. We're not entirely sure if it's directly related, but I'm also sharing that session report.

Session details:

{"logs":[{"_id":"64ef14030f40c41bcb23a945","co_id":"64ef14000f40c41bcb238b86","type":"SYNC_CONNECTION_END","user_id":"64ef13ffcb1cc6eb86834833","domain_id":"648b15954d2ea9036199b5bb","app_id":"648b15954d2ea9036199b5ba","group_id":"64187f8ff7eff8439d0f865a","request_url":"/api/client/v2.0/app/lm-application-pqiww/realm-sync","request_method":"GET","remote_ip_address":"85.96.214.21","started":"2023-08-30T10:03:47.943Z","completed":"2023-08-30T10:03:47.946Z","function_call_location":"DE-FF","function_call_provider_region":"aws-eu-central-1","status":101,"messages":["Connection was active for: 3s"],"platform":"iOS","platform_version":"22.6.0.0","sdk_name":"Dotnet","sdk_version":"11.3.0","framework_name":"Unity","framework_version":"4.0.30319.17020"},{"_id":"64ef14030f40c41bcb23a944","co_id":"64ef14000f40c41bcb238b86","type":"SYNC_SESSION_END","user_id":"64ef13ffcb1cc6eb86834833","domain_id":"648b15954d2ea9036199b5bb","app_id":"648b15954d2ea9036199b5ba","group_id":"64187f8ff7eff8439d0f865a","request_url":"/api/client/v2.0/app/lm-application-pqiww/realm-sync","request_method":"GET","remote_ip_address":"85.96.214.21","started":"2023-08-30T10:03:47.943Z","completed":"2023-08-30T10:03:47.943Z","function_call_location":"DE-FF","function_call_provider_region":"aws-eu-central-1","messages":["Session was closed by Sync Server","Session was active for: 1s"],"platform":"iOS","platform_version":"22.6.0.0","sdk_name":"Dotnet","sdk_version":"11.3.0","framework_name":"Unity","framework_version":"4.0.30319.17020","sync_query":{"RealmPref":"(owner_id == \"d723a6f6-266b-4b88-89be-43b0786fcb8f\")"},"sync_session_metrics":{"uploads":1,"changesets":1}},{"_id":"64ef14030f40c41bcb23a13f","co_id":"64ef14000f40c41bcb238b86","type":"SYNC_CLIENT_WRITE","user_id":"64ef13ffcb1cc6eb86834833","domain_id":"648b15954d2ea9036199b5bb","app_id":"648b15954d2ea9036199b5ba","group_id":"64187f8ff7eff8439d0f865a","request_url":"/api/client/v2.0/app/lm-application-pqiww/realm-sync","request_method":"GET","remote_ip_address":"85.96.214.21","started":"2023-08-30T10:03:46.916Z","completed":"2023-08-30T10:03:47.057Z","function_call_location":"DE-FF","function_call_provider_region":"aws-eu-central-1","messages":["Upload message contains 1 changesets (total size 637.3 kB) to be integrated","Integrated 1 of 1 remaining changesets (total size 637.3 kB) (0 required conflict resolution) in 1 attempts to bring server version to 25183092"],"platform":"iOS","platform_version":"22.6.0.0","sdk_name":"Dotnet","sdk_version":"11.3.0","framework_name":"Unity","framework_version":"4.0.30319.17020","sync_query":{"RealmPref":"(owner_id == \"d723a6f6-266b-4b88-89be-43b0786fcb8f\")"},"sync_write_summary":{"RealmPref":{"inserted":["d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.SaveData.SuperPowerSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_TutorialSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_FitTheBlocks.Store.StoreSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.Team.TeamSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.SaveDatas.RushEventSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.EndlessLeague.EndlessLeagueSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.DailyReward.DailyRewardSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.PlayerHealth.PlayerHealthSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.Leaderboard.LeaderboardSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.EventSystem.ProgressiveEvent.PlayerEventSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_PlayerValueSystemSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.Common.HintLoadingSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_AdSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.MetaScene.MetaSceneSaveDataSynced_prod","d723a6f6-266b-4b88-89be-43b0786fcb8f_Game.SaveData.GlobalSaveDataSynced_prod"]}}},{"_id":"64ef14020f40c41bcb239de5","co_id":"64ef14000f40c41bcb238b86","type":"SYNC_SESSION_START","user_id":"64ef13ffcb1cc6eb86834833","domain_id":"648b15954d2ea9036199b5bb","app_id":"648b15954d2ea9036199b5ba","group_id":"64187f8ff7eff8439d0f865a","request_url":"/api/client/v2.0/app/lm-application-pqiww/realm-sync","request_method":"GET","remote_ip_address":"85.96.214.21","started":"2023-08-30T10:03:46.623Z","completed":"2023-08-30T10:03:46.641Z","function_call_location":"DE-FF","function_call_provider_region":"aws-eu-central-1","messages":["Roles selected for sync:","   user: [ RealmPref ]"],"platform":"iOS","platform_version":"22.6.0.0","sdk_name":"Dotnet","sdk_version":"11.3.0","framework_name":"Unity","framework_version":"4.0.30319.17020","sync_query":{"RealmPref":"(owner_id == \"d723a6f6-266b-4b88-89be-43b0786fcb8f\")"}},{"_id":"64ef14020f40c41bcb239a8c","co_id":"64ef14000f40c41bcb238b86","type":"SYNC_SESSION_END","user_id":"64ef13ffcb1cc6eb86834833","domain_id":"648b15954d2ea9036199b5bb","app_id":"648b15954d2ea9036199b5ba","group_id":"64187f8ff7eff8439d0f865a","request_url":"/api/client/v2.0/app/lm-application-pqiww/realm-sync","request_method":"GET","remote_ip_address":"85.96.214.21","started":"2023-08-30T10:03:46.288Z","completed":"2023-08-30T10:03:46.288Z","function_call_location":"DE-FF","function_call_provider_region":"aws-eu-central-1","messages":["Session closed after receiving UNBIND event","Session was active for: 0s"],"platform":"iOS","platform_version":"22.6.0.0","sdk_name":"Dotnet","sdk_version":"11.3.0","framework_name":"Unity","framework_version":"4.0.30319.17020","sync_query":{"RealmPref":"(owner_id == \"d723a6f6-266b-4b88-89be-43b0786fcb8f\")"},"sync_session_metrics":{"downloads":2,"downloaded_changesets":2,"downloaded_changesets_size":66}},{"_id":"64ef14020f40c41bcb23990c","co_id":"64ef14000f40c41bcb238b86","type":"SYNC_OTHER","user_id":"64ef13ffcb1cc6eb86834833","domain_id":"648b15954d2ea9036199b5bb","app_id":"648b15954d2ea9036199b5ba","group_id":"64187f8ff7eff8439d0f865a","request_url":"/api/client/v2.0/app/lm-application-pqiww/realm-sync","request_method":"GET","remote_ip_address":"85.96.214.21","started":"2023-08-30T10:03:46.085Z","completed":"2023-08-30T10:03:46.111Z","function_call_location":"DE-FF","function_call_provider_region":"aws-eu-central-1","messages":["changing query from existing query to new query","Roles selected for sync:","   user: [ RealmPref ]"],"platform":"iOS","platform_version":"22.6.0.0","sdk_name":"Dotnet","sdk_version":"11.3.0","framework_name":"Unity","framework_version":"4.0.30319.17020","sync_query":{"RealmPref":"(owner_id == \"d723a6f6-266b-4b88-89be-43b0786fcb8f\")"}},{"_id":"64ef14010f40c41bcb23961a","co_id":"64ef14000f40c41bcb238b86","type":"SYNC_SESSION_START","user_id":"64ef13ffcb1cc6eb86834833","domain_id":"648b15954d2ea9036199b5bb","app_id":"648b15954d2ea9036199b5ba","group_id":"64187f8ff7eff8439d0f865a","request_url":"/api/client/v2.0/app/lm-application-pqiww/realm-sync","request_method":"GET","remote_ip_address":"85.96.214.21","started":"2023-08-30T10:03:45.723Z","completed":"2023-08-30T10:03:45.758Z","function_call_location":"DE-FF","function_call_provider_region":"aws-eu-central-1","platform":"iOS","platform_version":"22.6.0.0","sdk_name":"Dotnet","sdk_version":"11.3.0","framework_name":"Unity","framework_version":"4.0.30319.17020","sync_query":{}},{"_id":"64ef14000f40c41bcb238dc7","co_id":"64ef14000f40c41bcb238b86","type":"SYNC_SESSION_START","user_id":"64ef13ffcb1cc6eb86834833","domain_id":"648b15954d2ea9036199b5bb","app_id":"648b15954d2ea9036199b5ba","group_id":"64187f8ff7eff8439d0f865a","request_url":"/api/client/v2.0/app/lm-application-pqiww/realm-sync","request_method":"GET","remote_ip_address":"85.96.214.21","started":"2023-08-30T10:03:44.784Z","completed":"2023-08-30T10:03:44.786Z","function_call_location":"DE-FF","function_call_provider_region":"aws-eu-central-1","error":"client sent incorrect file ident salt (1566239614975697207 != 5929118636970194382). This is likely due to using a synchronized realm after terminating and re-enabling sync. Please wipe the file on the client to resume synchronization. { sessionIdent: 1, clientFileIdent: 14556 } (ProtocolErrorCode=211)","error_code":"DivergingHistories","messages":["Roles selected for sync:","   user: [ RealmPref ]"],"platform":"iOS","platform_version":"22.6.0.0","sdk_name":"Dotnet","sdk_version":"11.3.0","framework_name":"Unity","framework_version":"4.0.30319.17020","sync_query":{"RealmPref":"(owner_id == \"d723a6f6-266b-4b88-89be-43b0786fcb8f\")"}},{"_id":"64ef14000f40c41bcb238b91","co_id":"64ef14000f40c41bcb238b86","type":"SYNC_CONNECTION_START","user_id":"64ef13ffcb1cc6eb86834833","domain_id":"648b15954d2ea9036199b5bb","app_id":"648b15954d2ea9036199b5ba","group_id":"64187f8ff7eff8439d0f865a","request_url":"/api/client/v2.0/app/lm-application-pqiww/realm-sync","request_method":"GET","remote_ip_address":"85.96.214.21","started":"2023-08-30T10:03:44.518Z","completed":"2023-08-30T10:03:44.518Z","function_call_location":"DE-FF","function_call_provider_region":"aws-eu-central-1","platform":"iOS","platform_version":"22.6.0.0","sdk_name":"Dotnet","sdk_version":"11.3.0","framework_name":"Unity","framework_version":"4.0.30319.17020"}]}

@danieltabacaru
Copy link

danieltabacaru commented Sep 7, 2023

We see a DivergingHistories error in both this and the crash ticket, so they may be related. As a result of DivergingHistories error the server requests a client reset. @avolkangursel Do you know if a reconnect happens when these errors are encountered?

@danieltabacaru
Copy link

We think the root cause of these errors may be realm/realm-core#6961.

@avolkangursel
Copy link
Author

We are using the ClientResetHandler, but only for logging purposes. I'm also sharing the FlexibleSyncConfiguration that we use for Realm files.

private static FlexibleSyncConfiguration CreateFlexibleSyncConfiguration(string path)
{
    return new FlexibleSyncConfiguration(User, path)
    {
        PopulateInitialSubscriptions = (realm) =>
        {
            var queryableRealmPrefs = realm.All<RealmPref>();
            realm.Subscriptions.Add(queryableRealmPrefs);
        },
        EncryptionKey = EncryptionKey,
        OnSessionError = (session, sessionException) =>
        {
            SDKLog.Error($"Error code : {sessionException.ErrorCode} - Message : {sessionException.Message}");
        },
        ClientResetHandler = new RecoverOrDiscardUnsyncedChangesHandler
        {
            OnBeforeReset = frozen =>
            {
                Debug.LogWarning("ClientReset -> OnBeforeReset has called.");
            },
            OnAfterRecovery = (frozen, after) =>
            {
                Debug.LogWarning("ClientReset -> OnAfterRecovery has called.");
            },
            OnAfterDiscard = (frozen, after) =>
            {
                Debug.LogWarning("ClientReset -> OnAfterDiscard has called.");
            },
            ManualResetFallback = exception =>
            {
                Debug.LogWarning("ClientReset -> ManualResetFallback has called.");
                SDKLog.Error(exception);
            }
        },
        
    };
}

In addition, we do not attempt to log in again or create a new instance of the Realm files as long as the JWT token has not expired. However, in some cases, we call SyncSession.Stop() and SyncSession.Start(), and this happens asynchronously. Unfortunately, I don't have any more information about the error.

What do we need to do to ensure that the situation you are thinking of never occurs?

@danieltabacaru
Copy link

The ticket I linked it's a bug on our side and we'll fix it. It's still not fully clear if that's causing these errors. It was discovered while investigating this (we are still looking into it).

@danieltabacaru
Copy link

danieltabacaru commented Sep 25, 2023

@avolkangursel As mentioned, we'll fix realm/realm-core#6961 which may cause this (and #3430). If that's the real root cause then there is nothing needed on your side. Some logs would really help here.

@sync-by-unito sync-by-unito bot assigned jbreams and unassigned danieltabacaru Nov 20, 2023
Copy link

sync-by-unito bot commented Jan 24, 2024

➤ Jonathan Reams commented:

I think this was fixed in realm/realm-core#7088 (core version 13.23.3). I think what's happening here is that you're getting a client reset which eventually resets your latest subscription set version but doesn't update the variable we use to track whether that subscription set needs to be marked as complete, then when we get a MARK message and try to update the non-existent subscription set as complete we get an error from the DB because the subscription set version no longer maps to any objects on disk.

If you've already upgraded to 13.23.3, does this still reproduce for you?

@sync-by-unito sync-by-unito bot added the Waiting-For-Reporter Waiting for more information from the reporter before we can proceed label Jan 24, 2024
@nirinchev
Copy link
Member

For context, the first version of the .NET SDK that pulls in Core 13.23.3+ is 11.6.1. @avolkangursel please try to update to that version and let us know if you still experience the problem.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
O-Community T-Bug Waiting-For-Reporter Waiting for more information from the reporter before we can proceed
Projects
None yet
Development

No branches or pull requests

4 participants