Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(DataStore): should not crash on missing version metadata #2849

Merged
merged 4 commits into from
Jun 17, 2024

Conversation

lawmicha
Copy link
Contributor

@lawmicha lawmicha commented Jun 12, 2024

  • PR title and description conform to Pull Request guidelines.

Issue #, if available:
#2151 (comment)

Description of changes:

The app should not crash when DataStore cannot retrieve the version for update and delete mutations. DataStore should attempt to send the mutation without a version and handle the response.

This PR is looking to address the following crash that customers are experiencing

DataStoreException{message=Wanted 1 metadata for item with id = e7eb4237-c0d4-4ce6-8800-3b9d5fb603b8, but had 0., cause=null, recoverySuggestion=This is likely a bug. please report to AWS.}
     at com.amplifyframework.datastore.syncengine.VersionRepository.extractVersion(VersionRepository.java:91)
    at com.amplifyframework.datastore.syncengine.VersionRepository.lambda$findModelVersion$0$com-amplifyframework-datastore-syncengine-VersionRepository(VersionRepository.java:64)
     at com.amplifyframework.datastore.syncengine.VersionRepository$$ExternalSyntheticLambda0.accept(Unknown Source:8)
     at com.amplifyframework.datastore.storage.sqlite.SQLiteStorageAdapter.lambda$query$4$com-amplifyframework-datastore-storage-sqlite-SQLiteStorageAdapter(SQLiteStorageAdapter.java:407)
     at com.amplifyframework.datastore.storage.sqlite.SQLiteStorageAdapter$$ExternalSyntheticLambda6.run(Unknown Source:10)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:463)
     at java.util.concurrent.FutureTask.run(FutureTask.java:264)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:637)
     at java.lang.Thread.run(Thread.java:1012)

The message

Wanted 1 metadata for item with id = e7eb4237-c0d4-4ce6-8800-3b9d5fb603b8, but had 0.

comes from https://github.com/aws-amplify/amplify-android/blob/main/aws-datastore/src/main/java/com/amplifyframework/datastore/syncengine/VersionRepository.kt#L148

Code paths

MutationProcessor.publishToNetwork → 
    MutationProcessor.update → 
        VersionRepository.findModelVersion → 
            VersionRepository.extractVersion

MutationProcessor.publishToNetwork → 
   MutationProcessor.delete → 
        VersionRepository.findModelVersion → 
            VersionRepository.extractVersion

The unexpected behavior observed by the customer is that DataStore is deciding to crash when publishing mutations requires a version for updates and deletes, but it could not find it.

According to this customer #2151 (comment) This happens in bad network scenarios during saving and updating a model.

The cause of this issue:

  1. DataStore.save(model) is called for a new model. This inserts into the local database and queues up a pending create mutation.
  2. DataStore.save(model) is called for the existing model, performing some updates to the fields. This updates the existing record in the local database, and queues up a pending update mutation. We assume that the MutationProcessor performs (1) process the create mutation, such that it is marked in flight, and the update mutation is not merged into the create mutation when it is inserted into the mutation database, and has to be sent on its own by the MutationProcessor.

MutationProcessor (running in parallel):

  1. Process create mutation by publishing it to the network. The network is down. The request times out and fails. MutationProcessor can handle this in two ways: retry indefinitely or drop the mutation event.
    1. Retrying indefinitely will only occur when it accurately classifies the response as a transient network failure.
    2. If the response is classified as a non-retryable error, the error is returned to the customer on the errorHandler and the mutation will be dropped. Subsequently there will not be any reconciliation, and no local version will be persisted.
  2. Process update mutation. Query the local metadata table for the model’s version, cannot find it, throw exception which crashes the app.

The cause of this issue is the create mutation was dropped, which could happen for a variety of reasons decided by the MutationProcessor’s logic in handling the response of the request, and the subsequent update mutation did not have a version to send with the mutation.

MutationProcessor error classification

https://github.com/aws-amplify/amplify-android/blob/main/aws-datastore/src/main/java/com/amplifyframework/datastore/syncengine/MutationProcessor.java#L353-L355

nonRetryableExceptions.add(DataStoreException.GraphQLResponseException.class);
nonRetryableExceptions.add(ApiException.NonRetryableException.class);

DataStoreException.GraphQLResponseException

https://github.com/aws-amplify/amplify-android/blob/main/aws-datastore/src/main/java/com/amplifyframework/datastore/syncengine/MutationProcessor.java#L387

This exception is thrown when the response is successful with a GraphQL response payload containing errors. If any of the GraphQLError’s do not contain the AppSync “conflictUnhandled” type then throw this.

This means the conditions for classification of non-retryable is as follows:

  1. A successful GraphQL response containing errors
  2. The error is not ConflictUnhandled, such as
    • ConditionalCheckFailedException
    • OperationDisabled
    • Unauthorized

Since we are discussing the scenario with bad network conditions, it’s unlikely that it was classified as non-retyable due to this condition since (1) A successful GraphQL response would not have occurred.

ApiException.NonRetryableException
This exception is thrown when the response is 400 to 499.

private static final int START_OF_CLIENT_ERROR_CODE = 400;
private static final int END_OF_CLIENT_ERROR_CODE = 499;

/// ...
if (response.code() >= START_OF_CLIENT_ERROR_CODE && response.code() <= END_OF_CLIENT_ERROR_CODE) {
    onFailure.accept(new ApiException
            .NonRetryableException("OkHttp client request failed.", "Irrecoverable error")
    );
    return;
}

This includes 408 Request Timeout. To be verified by manual testing, but this could be the response being handled as non-retryable even though it was a transient timeout error (bad network conditions).

In summary, there is an existing scenario to be verified: During bad network conditions, DataStore’s MutationProcessor should handle failures as retryable.

Consider the scenario in which Unauthorized is treated as non-retryable (as seen above in DataStoreException.GraphQLResponseException). Unauthorized is very common when the user is signed out, or their token or session is expired. The mutation is then dropped, and update mutation will attempt to get the version and crash the app.

This PR looks to address this scenario while another PR independent from this will look into verifying MutationProcessor's classification of transient network errors as retryable. The app should not crash when DataStore cannot retrieve the version for update and delete mutations. DataStore should attempt to send the mutation without a version and handle the response. The response will be handled by DataStore by sending it back to the customer through the errorHandler and then dropping it, moving on to the next mutation in the queue.

What kind of response are we expecting?

The conflict resolution enabled backend does not enforce version to be a required field. If we cannot get a version locally, we can send the update/delete mutation without a version, AppSync will respond successfully with a GraphQL response. This is inline with Swift DataStore's implementation, as it will also send update/delete mutations without a version if it cannot retrieve it at the time it's processing the mutation event.

Example 1. Update Mutation on existing model

mutation UpdateWithNoVersionHelloUpdated {
  updateBlog(input: {id: "f92dfb85-c580-4c3e-9eb4-93e3b76c33b5", name: "hello updated"}) {
    _deleted
    _lastChangedAt
    _version
    createdAt
    id
    name
    owner
    updatedAt
  }
}

// response
{
  "data": {
    "updateBlog": {
      "_deleted": null,
      "_lastChangedAt": 1718203981917,
      "_version": 6,
      "createdAt": "2024-06-12T14:49:26.329Z",
      "id": "f92dfb85-c580-4c3e-9eb4-93e3b76c33b5",
      "name": "hello",
      "owner": "michael",
      "updatedAt": "2024-06-12T14:49:26.329Z"
    }
  }
}

In the above example, version is updated, and the name is not. The conflict resolution strategy is enabled is auto-merge. DataStore will then reconcile this response data into the local database as the latest version of this model.

Example 2: Update mutation on model not yet created in AppSync

mutation UpdateInvalidIdWithNoVersion {
  updateBlog(input: {id: "notYetCreated", name: "hello updated"}) {
    _deleted
    _lastChangedAt
    _version
    createdAt
    id
    name
    owner
    updatedAt
  }
}
// response
{
  "data": {
    "updateBlog": null
  },
  "errors": [
    {
      "path": [
        "updateBlog"
      ],
      "data": null,
      "errorType": "Unauthorized",
      "errorInfo": null,
      "locations": [
        {
          "line": 27,
          "column": 3,
          "sourceName": null
        }
      ],
      "message": "Not Authorized to access updateBlog on type Mutation"
    }
  ]
}

The example above closely resembles the scenario where we continue to send the update mutation without a version. When the create mutation fails, the blog id does not exist in the AppSync. When the id is used for update mutations w/ Cognito User Pool auth, and without version, the response will be unauthorized.

According to the MutationProcessor logic, this will be categorized as a non retryable error, and the error will be returned to the customer through the errorHandler, before the update mutation is dropped. Based on their use cases, they can decide to reconcile this themselves or not, however, new devices or cleared devices will not sync this data back down to the app since it does not exist in the remote store.

How did you test these changes?
(Please add a line here how the changes were tested)

Documentation update required?

  • No
  • Yes (Please include a PR link for the documentation update)

General Checklist

  • Added Unit Tests
  • Added Integration Tests
  • Security oriented best practices and standards are followed (e.g. using input sanitization, principle of least privilege, etc)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@lawmicha lawmicha requested a review from a team as a code owner June 12, 2024 22:11
Copy link
Member

@tylerjroach tylerjroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe the behavior of what happens if:

AppSync has a model that is on version 5.
If we attempt to update that model and don't pass a version, would it actually update?

Or is a null version passed to appsync side only going to insert as a v1 version if the model doesn't already exist.

@lawmicha lawmicha enabled auto-merge (squash) June 17, 2024 13:52
@lawmicha lawmicha merged commit 04178b1 into main Jun 17, 2024
3 checks passed
@lawmicha lawmicha deleted the fix.datastore-version-crash branch June 17, 2024 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants