AWS Amplify Auth v1 to v2 migration fails 5-10% of the time, logs user out #2929
Comments
Sorry to hear you're having issues @camhart. Can you please confirm that you updated directly to Is reinstalling the app the only solution? What about calling Are there any obvious similarities between the affected users? |
Yes, we went direct from v1 to v2.21.1.
I haven't tried this, but didn't think it would be needed. The SDK is supposed to detect when credentials are expired and handle refreshing them automatically isn't it? |
That's correct, it should - I only suggested trying to force refresh the tokens as a way to gather more information about what is going wrong. Another thought is to try catching the exception and invoking We will need to investigate this issue to see what's going on - unfortunately it sounds like it will be difficult to reproduce. Any additional details about the affected users would be beneficial. |
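To act on the suggestion of catching the exception, an app first needs to recognize this specific failure. The following is a minimal sketch, assuming the "Invalid Refresh Token" text seen in the reported logs is available through `getMessage()` somewhere in the cause chain. `RefreshFailureDetector` is a hypothetical helper, not part of Amplify, and it deliberately uses plain `Throwable` so it does not depend on any library types.

```java
// Hypothetical helper, not part of Amplify. It relies only on the message
// text that appears in the logs reported in this issue (an assumption).
final class RefreshFailureDetector {
    private static final String INVALID_REFRESH_TOKEN = "Invalid Refresh Token";
    private static final int MAX_DEPTH = 16;

    /** Returns true if the error or any of its causes mentions an invalid refresh token. */
    static boolean isInvalidRefreshToken(Throwable error) {
        int depth = 0;
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (Throwable t = error; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message != null && message.contains(INVALID_REFRESH_TOKEN)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-ins for the SessionExpiredException / NotAuthorizedException pair in the logs.
        Throwable cause = new IllegalStateException("Invalid Refresh Token.");
        Throwable expired = new RuntimeException("Your session has expired.", cause);
        System.out.println(isInvalidRefreshToken(expired));
        System.out.println(isInvalidRefreshToken(new RuntimeException("timeout")));
    }
}
```

A check like this could be called from the error callback of the session fetch so that affected devices can be counted or reported separately from ordinary network failures.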
Ideally you can add more tools to the library so I can better troubleshoot the issue to provide more info. I'm confident if I release the app to another 1% of my customers, I'll get a few emails about it. But I don't want to do that until there's some ability to troubleshoot. We need some sort of migration record to indicate what happened to the migration and to understand why it failed. I'm not asking for you to solve it immediately. But adding some support for better troubleshooting migration issues seems like a low hanging fruit that moves the needle forward. |
@camhart Can you please share the code snippets so that we can try to reproduce the issue in a local environment? Snippets of how the Auth category is being used in both V1 and V2 would be really helpful for isolating the issue. |
@harsh62 I don't have code snippets to share that can reproduce the issue. I've tried multiple times with my entire app to replicate the problem and can't replicate it locally, but it is happening. This is why I'm arguing for better tools to investigate/troubleshoot problems relating to the migration. Here are all the Amplify method calls I use:
V1 used the same method calls but adjusted for the api changes between the two. I don't use Amplify for anything else--only Auth.
My app is a long running background app that stays running 24/7 in the background on the device (it's a parental control app). It automatically launches itself after an app update has occurred. |
Are you able to isolate if the issue is happening with customers using If you could answer this, it would greatly narrow down our reproduction codepath. |
Not easily. If the problem is happening to customers logged in via one of those calls, it's not happening 100% of the time. I can release the app to another 1% of customers and wait for the support tickets to come in, but I'm really hoping to avoid doing that without having better tools in place to troubleshoot the migration.
They can use one or the other, but not both. Once logged in one way, we don't give them the option to login again without signing out first.
We don't give customers the ability to log out once the device is set up (there are additional steps they have to take after logging in to set the device up with my app). There's only a very brief window where they can log out: after the customer has logged in but before the device is set up. Once the device is set up, if they want to log out they need to uninstall/reinstall the app. The customers who've reported the issue to me have all had their device set up fully, so there is no longer an option for them to log out at that point. So, long story short, it's not possible for them to use Amplify.Auth.signIn and then use Amplify.Auth.signInWithWebUI (or vice versa). Does that make sense? |
@camhart This is good information. Another question I have is that has your From the issues reported, are you able to see if anything common in the affected users, device types, OS versions, manufacturer type, or anything else? |
No it hasn't changed.
I haven't kept track of this. However, I do recall Samsung being one of the devices and it was on OS version 13. I have multiple Samsung test devices though and I haven't been able to replicate the issue on any of them. When I release the app update to more customers, we get reports of customers having issues, but I can guarantee many have the issue but never report it. They'll just cancel their subscription with us or try to resolve it on their own. |
Thanks for providing all the information. One of our engineers will try to reproduce this issue locally by trying out different codepaths. We will get back to you when we have more updates. |
@camhart One more question that would help in our research. Can you post all of the AWS dependencies you are using in Gradle? Ex Amplify as well as any other AWS SDKs. |
Those are the only dependencies being used. Let me know if you need anything else! |
What is the purpose of the Do you have device tracking enabled on Cognito side? I'm just trying to think of any additional areas we would need to look into. With this seemingly being an edge case type of scenario you are running into, please keep us updated if you notice any patterns (ex: sign ins over a year, etc). |
No it's not. It's left over from the refactor I made previously and can be removed without impacting anything.
It's likely set to the default. How do I check? I don't believe I intentionally use it in any way (outside what the SDK would do by default).
So far today the trend has continued to hold true--all impacted devices have been for long term customers of ours created at least a year or more ago. I'm assuming these are all long term logins as a result of that. I don't keep track of the logins though, so I can't tell you for certain. I'll be sure to let you know if I find anything contrary to this trend, but based on the number of customers I've already worked with, I'm 95% confident the trend is going to hold. We have lots of new customers, so it'd be really strange at this point if it doesn't hold. |
@camhart What version of Amplify v1 were you using before bumping to v2. I'm slightly concerned about the API Gateway version of 2.16.1. v2.16.1 was released on October 2019. The last Amplify v1 release was using v2.73.0 of the AWS SDK, released in 2023. This means Earlier in this thread you stated
With this type of downgrade, this could have also been a major problem, as it looks like there were some keystore changes between 2019 and 2023. A CognitoCredentialsProvider from 2019 may fail to read (and possibly corrupt) a keystore from 2023. With all this said, Amplify v2 attempts to open the old Amplify v1 / AWS SDK credentials and migrate them to the v2 format, without any dependency on the AWS Android SDK to do so. Since you are no longer using CognitoCredentialsProvider, I don't believe AWS Core and AWS Gateway have any codepaths that would attempt to write credentials in the old format (which would interfere with Amplify v2). Are you sure that these log out reports are from users that were actively using a version with Amplify v1 and, upon recently being added to the rollout, began having refresh token issues with Amplify v2? Is it possible that any of these reports are delayed? We know that the old implementation with CognitoCredentialsProvider (and MobileClient if it was present) would have corrupted the credential storage. Is it possible some of these customers are just now noticing? I know you said this issue was sporadic originally, but given what we know, I would have expected the initial implementation to fail 100% of the time. |
I do want to add--thank you for your help. I'm not trying to be a complainer here, but I do want to pass the pain that I'm feeling along so you have an appropriate understanding of the impact this troubleshooting experience has had. |
Hi @camhart, I understand your frustrations. Thank you for quickly answering all of the questions sent your way. I know it has been a lot, but these types of edge cases are always difficult to figure out without logs that highlight the problem. It's especially hard considering our team members, and you yourself, have been unable to replicate the failure. There could be something unique about these 5-10% of users that we haven't yet tracked down (ex: sign in method, device type, device OS, etc). We are continuing to look at any failure paths on our end.
I don't believe this would directly fix the problem, but it is always best to try and keep up to date with our latest versions. You are using a version of API Gateway that is 5 years old, which means it is missing 5 years of any bug fixes that may have been added along the way. Given that you are confident the issues are happening with each rollout, and CognitoCachingCredentials provider is no longer being used, I do not expect this to cause the invalid refresh token error you are seeing. |
Sounds good, I'll wait to hear further instruction from you then before trying anything. Getting this fixed is top priority on my end, so I'll respond quickly and as clearly as possible. |
@camhart If you wouldn't mind, please join our Discord channel at https://discord.com/invite/amplify, where you can reach out to me directly. |
Just sent you a DM. |
Thanks, we can continue discussion there! |
I have identified an issue with migrating logins that have Device Tracking enabled. I am recognizing this ticket as a bug and we are actively working on a fix. This is not an issue with your Hosted UI (web) sign ins, as they do not use device tracking. This will be an issue with any SRP sign ins that use device tracking. |
@camhart I have discovered the root cause and am working on a fix here: #2963
I believe we should be able to migrate the missing device metadata to our new credential store, which would result in token refreshes immediately working without requiring another sign in.
The cause is aliased userIds. When email is used for signIn, the user's actual userId is a UUID. During the migration process, Amplify v2 attempts to migrate based on the email address, when it should be looking at the UUID userId instead.
In my testing, I also identified a workaround. If you are not actually using Device Tracking (primarily used to prevent repeated MFA validations on sign in), I believe the issue can immediately be resolved by changing the "Remember User Devices" setting to "Don't Remember" in the Cognito console. This turns off the device tracking verification on token refreshes. The refresh calls that were failing would now succeed, because Cognito no longer checks the device metadata upon refresh. If you were to re-enable this setting, the refreshes would begin failing again until our official fix is released.
TLDR: We are working on a fix, but if you don't actually need Device Tracking enabled for your use case, token refreshes will begin working again if you toggle "Remember User Devices" to "Don't Remember". |
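The aliased userId problem described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class `DeviceMetadataLookupSketch`, its two in-memory maps, and the example email, UUID, and device key are all hypothetical and do not reflect Amplify's actual credential store layout or the code in the fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative model only. The store layout, key format, and names are
// assumptions, not Amplify's real credential store.
final class DeviceMetadataLookupSketch {
    /** Legacy device metadata, keyed by the user's real (UUID) userId. */
    private final Map<String, String> deviceKeysByUserId = new HashMap<>();
    /** Maps a sign-in alias such as an email address to the UUID userId. */
    private final Map<String, String> userIdByAlias = new HashMap<>();

    void saveLegacyLogin(String alias, String userId, String deviceKey) {
        userIdByAlias.put(alias, userId);
        deviceKeysByUserId.put(userId, deviceKey);
    }

    /** Mirrors the described bug: looks up metadata by the name used at sign in. */
    Optional<String> findBySignInName(String signInName) {
        return Optional.ofNullable(deviceKeysByUserId.get(signInName));
    }

    /** Mirrors the described fix: resolve the alias to the UUID userId first. */
    Optional<String> findByResolvedUserId(String signInName) {
        String userId = userIdByAlias.getOrDefault(signInName, signInName);
        return Optional.ofNullable(deviceKeysByUserId.get(userId));
    }

    public static void main(String[] args) {
        DeviceMetadataLookupSketch store = new DeviceMetadataLookupSketch();
        store.saveLegacyLogin("parent@example.com",
                "3f1c9a2e-0000-4000-8000-000000000001", "device-key-1");
        // Lookup by email finds nothing; lookup by resolved UUID finds the device key.
        System.out.println(store.findBySignInName("parent@example.com").isPresent());
        System.out.println(store.findByResolvedUserId("parent@example.com").isPresent());
    }
}
```

In this model a user who signed in with an email alias loses their device metadata during migration, while a lookup by the resolved UUID would keep it, which is consistent with refreshes failing only when Cognito verifies device metadata.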
Thank you for the update. Great news if we can migrate without causing people to have to sign in again. Is there any risk that changing the "Remember User Devices" setting could have an adverse effect that couldn't easily be undone by changing it back? The plan was to eventually offer MFA support. That's still the plan.
How long does a fix like this typically take to get released? A week? Three months? Thanks again! Really happy to finally get this figured out. |
I do not believe there is any risk in the change.
I don't see any adverse side effects in your case. MFA could still be enabled. Device Tracking is used as a way to bypass subsequent MFA requirements on future sign ins. Considering your app doesn't have Once a fix is merged and ready, it will typically go in the next release. We try and release weekly if there are commits ready to go live. |
I can confirm that disabling device tracking fixed the issue for one of our customers (hopefully all of them--time will tell). I realized I had it disabled already in my dev environment--that's why my testing didn't catch the issue when I did my own 24 hour tests. Thank you for all the help! Very much appreciated. |
Before opening, please confirm:
Language and Async Model
Java
Amplify Categories
Authentication
Gradle script dependencies
Environment information
Please include any relevant guides or documentation you're referencing
No response
Describe the bug
I've updated my Android app to use AWS Amplify V2. I deployed it to beta users, and ~5-10% of them had issues with the data migration. Essentially they ended up logged out of the app after their app updated and migrated from v1 to v2. This shouldn't happen. If I have those customers uninstall/reinstall the Android app and log in, everything works moving forward; however, this isn't an acceptable solution.
I created a ticket with AWS support and they told me to create a github issue. See case 172444220700816.
Here's an example log output when the app attempts to make API calls but is unable to due to being logged out.
D/ 09-23 15:31:15.551 BackendCallTask( 5715): AUTH fetchAuthSessionRequest
D/ 09-23 15:31:16.729 BackendCallTask( 5715): AUTH fetchAuthSessionRequest result, isSignedIn=true
D/ 09-23 15:31:16.729 BackendCallTask( 5715): AUTH exception: SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:16.732 System.err( 5715): SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:16.732 System.err( 5715): at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1.execute(SourceFile:48)
W/ 09-23 15:31:16.732 System.err( 5715): at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1$1.invokeSuspend(Unknown Source:12)
W/ 09-23 15:31:16.733 System.err( 5715): Caused by: NotAuthorizedException(message=Invalid Refresh Token.)
W/ 09-23 15:31:16.733 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.model.NotAuthorizedException$Builder.a(SourceFile:4)
W/ 09-23 15:31:16.733 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.serde.NotAuthorizedExceptionDeserializer.c(SourceFile:27)
W/ 09-23 15:31:16.733 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.d(SourceFile:344)
W/ 09-23 15:31:16.733 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.b(SourceFile:1)
W/ 09-23 15:31:16.733 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.c(SourceFile:43)
W/ 09-23 15:31:16.733 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.b(SourceFile:1)
D/ 09-23 15:31:28.709 BackendCallTask( 5715): AUTH fetchAuthSessionRequest
D/ 09-23 15:31:28.963 BackendCallTask( 5715): AUTH fetchAuthSessionRequest result, isSignedIn=true
D/ 09-23 15:31:28.963 BackendCallTask( 5715): AUTH exception: SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:28.963 System.err( 5715): SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:28.963 System.err( 5715): at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1.execute(SourceFile:48)
W/ 09-23 15:31:28.963 System.err( 5715): at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1$1.invokeSuspend(Unknown Source:12)
W/ 09-23 15:31:28.963 System.err( 5715): Caused by: NotAuthorizedException(message=Invalid Refresh Token.)
W/ 09-23 15:31:28.963 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.model.NotAuthorizedException$Builder.a(SourceFile:4)
W/ 09-23 15:31:28.963 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.serde.NotAuthorizedExceptionDeserializer.c(SourceFile:27)
W/ 09-23 15:31:28.963 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.d(SourceFile:344)
W/ 09-23 15:31:28.963 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.b(SourceFile:1)
W/ 09-23 15:31:28.963 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.c(SourceFile:43)
W/ 09-23 15:31:28.963 System.err( 5715): at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.b(SourceFile:1)
D/ 09-23 15:31:3
I'd like to request a feature addition to this library, where the migration creates persistent migration logs that the app developer can request to help troubleshoot issues like this. Also, it'd be good to be able to retry the migration. Right now it seems to destroy all the old v1 data and just assumes everything worked when it doesn't. The migration fails sporadically and I have no clue why, with no recourse for troubleshooting. I have to wait for a customer support ticket complaining about the problem in order to get logs, but they aren't really too helpful as they just show the user was signed out for some reason. I've been using AWS Amplify Auth v1 for several years without any issue keeping users logged in.
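As a rough illustration of the kind of migration record being requested here, the sketch below records the outcome of each migration step and only reports legacy data as safe to delete when every step succeeded. Everything in it is hypothetical: `MigrationRecord`, its step names, and its methods do not exist in Amplify, and persisting the record to disk is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical design sketch of the requested feature; nothing here exists in Amplify.
final class MigrationRecord {
    enum Outcome { SUCCEEDED, FAILED }

    private final List<String> steps = new ArrayList<>();
    private Outcome outcome = Outcome.SUCCEEDED;

    /** Runs one migration step and records what happened instead of swallowing errors. */
    void runStep(String name, Runnable step) {
        try {
            step.run();
            steps.add(name + ": ok");
        } catch (RuntimeException e) {
            outcome = Outcome.FAILED;
            steps.add(name + ": failed (" + e.getMessage() + ")");
        }
    }

    /** Legacy data is only safe to delete once every step succeeded, so a retry stays possible. */
    boolean safeToDeleteLegacyData() {
        return outcome == Outcome.SUCCEEDED;
    }

    /** Read-only view of the recorded steps, suitable for attaching to a support ticket. */
    List<String> steps() {
        return Collections.unmodifiableList(steps);
    }

    public static void main(String[] args) {
        MigrationRecord record = new MigrationRecord();
        record.runStep("tokens", () -> { });
        record.runStep("deviceMetadata", () -> {
            throw new IllegalStateException("no entry for user");
        });
        record.steps().forEach(System.out::println);
        System.out.println("safe to delete legacy data: " + record.safeToDeleteLegacyData());
    }
}
```

The design choice worth noting is that deletion of the old v1 data is gated on the recorded outcome, which is what would make a later retry possible.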
Reproduction steps (if applicable)
I've been unable to reproduce the issue myself.
Code Snippet
// Put your code below this line.
Log output
amplifyconfiguration.json
{
  "auth": {
    "plugins": {
      "awsCognitoAuthPlugin": {
        "IdentityManager": {
          "Default": {}
        },
        "CredentialsProvider": {
          "CognitoIdentity": {
            "Default": {
              "PoolId": "us-west-2:xxxxxxxxxxxx",
              "Region": "us-west-2"
            }
          }
        },
        "CognitoUserPool": {
          "Default": {
            "PoolId": "us-west-2_xxxxxxxxx",
            "AppClientId": "xxxxxxxxx",
            "AppClientSecret": "xxxxxxxxx",
            "Region": "us-west-2"
          }
        },
        "Auth": {
          "Default": {
            "OAuth": {
              "WebDomain": "cognitoauth.xxxxxxxxx.io",
              "AppClientId": "xxxxxxxx",
              "AppClientSecret": "xxxxxxxxx",
              "SignInRedirectURI": "xxxxxxxx://callback/",
              "SignOutRedirectURI": "xxxxxxxx://signout/",
              "Scopes": [
                "email",
                "openid",
                "profile",
                "aws.cognito.signin.user.admin"
              ]
            },
            "authenticationFlowType": "USER_SRP_AUTH"
          }
        }
      }
    }
  }
}
GraphQL Schema
Additional information and screenshots
One more detail. V1 of the amplify auth library has code that Google Play throws big warnings about and claims it'll stop accepting app updates that use it. Fixing this issue with the v1 -> v2 migration should be a top priority, as continuing to use v1 in the interim isn't an option. I essentially can't update my app unless it's using amplify v2.