Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OAuth2 Token refresh implemented #328

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mesemus
Copy link
Contributor

@mesemus mesemus commented May 14, 2024

Description

This pull request fixes bug #68 - handling OAUth2 refresh

Added fields on RemoteToken model:

  • refresh_token
  • expires_at

Added properties on RemoteToken:

  • is_expired

New methods on RemoteToken:

  • refresh_access_token

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

Frontend

Reminder

By using GitHub, you have already agreed to the GitHub’s Terms of Service including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.

@mesemus mesemus force-pushed the store-refresh-token branch 3 times, most recently from 6b61916 to 00a3663 Compare May 14, 2024 12:41
@mesemus mesemus force-pushed the store-refresh-token branch from 00a3663 to e8be1a8 Compare May 14, 2024 12:44
"""
if expires_in is None:
return None
return datetime.datetime.now() + datetime.timedelta(seconds=expires_in)
Copy link
Member

@Samk13 Samk13 May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why not use datetime.now(timezone.utc) instead of datetime.now() to ensure all datetime objects are timezone-aware for better consistency and clarity?
(utcnow() will be deprecated in 3.12)

Copy link
Contributor Author

@mesemus mesemus May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it seems that sqlalchemy removes the timezone even with db.DateTime(timezone=True), so that after db commit the expires_at is not timezone aware. Would you have a better solution than :

expiration_with_leeway = self.expires_at - timedelta(seconds=leeway)  <-- not timezone aware after db.commit
expiration_with_leeway = expiration_with_leeway.replace(tzinfo=timezone.utc)  <-- does not look good
return expiration_with_leeway < datetime.now(timezone.utc)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if you changed sa.Column("expires_at", sa.DateTime(timezone=True), nullable=True) and SQLAlchemy still removes the timezone information despite the change, the problem could be relying somewhere else right?
WDYT @slint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the "aware" version just to see the test results for different databases.

It seems that both sqlite and mysql (https://dev.mysql.com/doc/refman/8.0/en/datetime.html) convert timestamp to utc before storing to db and then to local timezone upon retrieval - but "awareness" is lost in sqlalchemy.
Only postgres stores the timezone with the datetime ("timestamp with time zone" type) and subsequently sqlalchemy returns timezone aware datetime.

As invenio sources use utcnow() in many places (grep gave me more that 100) and even the maintrunk of sqlalchemy_utils uses utcnow(), I'd vote for keeping the datetime naive but in utc, using utcnow in the code. Later on when the invenio codebase will be upgraded to python 3.12++, then if not already solved by sqlalchemy itself we might introduce our own timestamp decorator in invenio-db (similar to https://mike.depalatis.net/blog/sqlalchemy-timestamps.html)

@Samk13 @slint What do you think?

Copy link
Member

@Samk13 Samk13 May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sa.DateTime(timezone=True) strips timezone info, the issue is deeper.
I would vote to stick with utcnow() for now and address this in Python 3.12 upgrades.
@slint, I recommend this. Your call. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to utcnow in code


access_token, refresh_token, secret, expires_at = refresh_access_token(self)
self.update_token(access_token, refresh_token, secret, expires_at)
db.session.commit()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:
Should we use unit of work here? See: docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make invenio-oauthclient dependent on invenio-records-resources (where the UnitOfWork is defined). I would personally stick with simple commit as the rest of the library uses that, but would document it in the pydoc to make sure that the caller knows that commit will occur - does it make sense?

Copy link
Member

@Samk13 Samk13 May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point, but I could consider using db.session.begin_nested() to allow for rollbacks on errors and maintain consistency with the existing codebase.
Do you think this is a better approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely better, but I still have a small issue with that :) The problem is that when the oauth token endpoint is called, it invalidates the previous access & refresh token and returns a new pair (that is at least the case for our perun aai implementation). Then, if the new access & refresh is not stored to database (for example, the refresh method is invoked from an external begin_nested which rolls back afterwards for some reason), the original values stored in remote_token are completely unusable and any subsequent call will fail. That's why I would rather commit the token as soon as possible, or if using the begin_nested I would at least document that the caller should commit it as soon as possible - what would you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, it will just introduce complexity regarding token management and state consistency.
Let's commit the token immediately after the update to ensure it's saved, avoiding potential issues with nested transactions and token invalidation. This keeps token management straightforward.
would you agree on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants