feat(github): GitHub App installation support
There is a special kind of GitHub access token that belongs to a
GitHub App installation (a GitHub App installed in your GitHub
organization, acting on its own behalf, like various bots).
Compared to the more usual Personal Access Tokens, these tokens have
an entirely different way of managing their permissions and of obtaining
them from the GitHub API.

This set of changes introduces support for those tokens, which in turn
led to a couple of new design decisions and some refactoring, splitting
the original GithubIdentity into two subclasses, each dealing with their
own authentication/authorization scheme, while still leveraging the
common caching capabilities, now factored out to the parent class.

Each of the subclasses has an 'authenticate' class method, which creates
an instance of the particular class, when properly authenticated against
the GitHub API. This identity is then paired with its access token in a
cache for later reuse.

In turn, the instance has a private '_authorize' method which provides
means for obtaining the targeted org/repo permissions. These permissions
are supposed to be stored in a per-identity cache via the call to
'_set_permissions'. Due to the nature of the permission discovery for
the new GitHub App installations, multiple repo permissions (of a single
org) can be cached in a single repo. Also, when this identity has access
to all organization repos, a special org/None permission is used to
cover them all as a single entry.
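
The org/None fallback described above can be illustrated with a minimal sketch. This is not the code from this commit: the class name `PermissionCache` and its `set`/`get` methods are hypothetical, and expiry (TTL) handling is left out entirely.

```python
from __future__ import annotations


class PermissionCache:
    """Hypothetical per-identity cache keyed by (org, repo)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str | None], frozenset[str]] = {}

    def set(self, org: str, repo: str | None, permissions: set[str]) -> None:
        # repo=None stands for "every repository of this org".
        self._entries[(org, repo)] = frozenset(permissions)

    def get(self, org: str, repo: str) -> frozenset[str] | None:
        # Prefer an exact org/repo entry, then fall back to the org-wide one.
        exact = self._entries.get((org, repo))
        if exact is not None:
            return exact
        return self._entries.get((org, None))
```

With a single `set("my-org", None, {"read"})` call, a lookup for any repository of `my-org` is answered from that one entry, while a more specific org/repo entry still takes precedence.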

There's also a slightly unrelated addition, which is a configurable
request timeout for the GitHub API calls, starting with some rather
benevolent defaults. This should bring more stability to the module, as
potentially hanging requests to the API won't hog the threads forever.

Here are some more important milestones in this squash:
* feat(github): extract username from the Basic authorization header

To be eventually used to identify an application id with an "app installation" github token.

* chore(github): move request session management under CallContext

This is needed for further design improvements where GithubIdentity objects will be able to call their specific GitHub API calls.

* chore(github): move CallContext to the module scope to hold a requests session

* feat(github): support GitHub App installation tokens

* chore(github): provide unit tests for GithubAppIdentity

* feat(github): use a timeout for GitHub API calls

* docs(github): GitHub App installation tokens

* feat(github): casually cache GitHub App repo permissions

Also: Somehow reflect on user caching preferences for the auth proxy cache, providing a hardcoded minimum for proxied entries.

GithubIdentity._set_permissions got a new flag 'casual' for writing to the main auth cache without any guarantee this will be retrieved later. This is required so casually cached permissions for GitHub Apps won't get stuck in the proxy cache.
vit-zikmund committed Nov 19, 2024
1 parent a984aa0 commit f522793
Showing 3 changed files with 1,122 additions and 215 deletions.
28 changes: 22 additions & 6 deletions docs/source/auth-providers.md
@@ -196,26 +196,42 @@ servers.
This authenticator lets you provide a frictionless LFS backend for existing GitHub repositories. It plays nicely with `git` credential helpers and allows you to use GitHub as the single authentication & authorization provider.

### Details
The authenticator uses [GitHub Personal Access Tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens), the same ones used for cloning a GitHub repo over HTTPS. The provided token is used in a couple GitHub API calls that identify the token's identity and [its permissions](https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user) for the GitHub organization & repository. The token is supposed to be passed in the password part of the `Basic` HTTP auth (username is ignored). `Bearer` token HTTP auth is also supported, although no git client will likely use it.
The authenticator uses GitHub [Personal Access Tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) and [App Installation tokens](https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/authenticating-as-a-github-app-installation), the same ones used for cloning a GitHub repo over HTTPS. The provided token is used in a couple GitHub API calls that identify the token's identity and [its permissions](https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user) for the GitHub organization & repository.

For the authenticator to work properly the token must have the `read:org` for "Classic" or `metadata:read` permission for the fine-grained kind.

Note: Authentication via SSH that could be used to verify the user is [not possible with GitHub at the time of writing](https://github.com/datopian/giftless/issues/128#issuecomment-2037190728).
Note: Authentication via SSH that could be used to verify the user is [not possible with GitHub at the time of writing](https://github.com/datopian/giftless/issues/128#issuecomment-2037190728).

The GitHub repository permissions are mapped to [Giftless permissions](#permissions) in the straightforward sense that those able to write will be able to write, same with read; invalid tokens or identities with no repository access will get rejected.

To minimize the traffic to GitHub for each LFS action, most of the auth data is being temporarily cached in memory, which improves performance, but naturally also ignores immediate changes for identities with changed permissions.

### GitHub Auth Flow
Here's a description of the authentication & authorization flow. If any of these steps fails, the request gets rejected.
Here's a description of the authentication & authorization flow. If any of these steps fails, the request gets rejected. As the supported token flavors have a very different way of authentication, they're described separately:

#### Personal Access Tokens (`ghp_`, `github_pat_` and likely other [token flavors](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/about-authentication-to-github#githubs-token-formats) `gho_`, `ghu_`)
These tokens eventually represent a real user. For the authenticator to work properly, the token must have these permissions:
- `read:org` for "Classic" or
- `metadata:read` for the fine-grained kind.
- The user has to be a collaborator of the target repository with an adequate role for reading or writing.

1. The URI of the primary git LFS (HTTP) [`batch` request](https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md) is used (as usual) to determine what GitHub organization and repository is being targeted (e.g. `https://<server>/<org>/<repo>.git/info/lfs/...`). The request's `Authorization` header is also searched for the required GitHub personal access token.
1. The URI of the primary git LFS (HTTP) [`batch` request](https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md) is used to determine what GitHub organization and repository is being targeted (e.g. `https://<server>/<org>/<repo>.git/info/lfs/...`). The request's `Authorization` header is searched for the required token in the `password` part of the `Basic` HTTP auth.
2. The token is then used in a [`/user`](https://docs.github.com/en/rest/users/users?apiVersion=2022-11-28#get-the-authenticated-user) GitHub API call to get its identity data.
3. Further on the GitHub API is asked for the [user's permissions](https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user) to the org/repo in question.
4. Based on the information above the user will be granted or rejected access.
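
As an illustration of step 1, here is a minimal sketch of how the target org/repo and the token could be pulled out of an incoming request. The helper names `parse_basic_auth` and `parse_lfs_target` are hypothetical and not part of Giftless; the sketch also assumes the URL path starts directly with `<org>/<repo>.git`, as in the example URI above.

```python
import base64
from urllib.parse import urlsplit


def parse_basic_auth(header_value: str) -> tuple[str, str]:
    """Split a 'Basic <base64>' header value into (user, password)."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        raise ValueError("expected a Basic authorization header")
    decoded = base64.b64decode(encoded.strip()).decode("utf-8")
    user, sep, password = decoded.partition(":")
    if not sep:
        raise ValueError("missing ':' separator in Basic credentials")
    return user, password


def parse_lfs_target(url: str) -> tuple[str, str]:
    """Return (org, repo) from an LFS URL like https://<server>/<org>/<repo>.git/info/lfs/..."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError("URL does not contain <org>/<repo>")
    org, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return org, repo
```

The token returned as the password would then be sent to the `/user` endpoint and to the collaborator permission endpoint linked in steps 2 and 3.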

#### App Installation Tokens (`ghs_`)
This token represents a special identity of an "application installation", acting on behalf of an installed GitHub App (likely part of an automation integration). This installation is bound to a user or organization (owner) and gets a set of fine-grained permissions applying to `all` or `selected` repositories of the targeted owner. For the authenticator to work properly, the GitHub App must have these permissions:
- `metadata:read` (default)
- `contents:read|write` (the permission to the repository content)
- `organization_administration:read` (required to [list owner's app installations](https://docs.github.com/en/rest/orgs/orgs?apiVersion=2022-11-28#list-app-installations-for-an-organization))
- The installed App also has to have access to the target repository.

1. The URI of the primary git LFS (HTTP) [`batch` request](https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md) is used to determine what GitHub organization and repository is being targeted (e.g. `https://<server>/<org>/<repo>.git/info/lfs/...`). The request's `Authorization` header is searched for the required token in the `password` part of the `Basic` HTTP auth. **The `user` part must contain some identification of the app** (installation `id`, `app_id`, `client_id` or `app_slug` (its dashed name)).
2. The token is then used in the [`/orgs/<org>/installations`](https://docs.github.com/en/rest/orgs/orgs?apiVersion=2022-11-28#list-app-installations-for-an-organization) GitHub API call to get the list of app installations in the target `org`. This list is then searched for the app identification from the `user` part above. The identified entry contains info about the app permissions and whether the installation targets `all` repositories or just `selected`. At this moment the LFS permissions are inferred from the provided `contents` permission. If the repository access is `all`, this is everything the logic needs.
3. If the repository access is just for `selected` ones, the GitHub API is asked for the [`/installation/repositories`](https://docs.github.com/en/rest/apps/installations?apiVersion=2022-11-28#list-repositories-accessible-to-the-app-installation) list, in which the target repository must be found.
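
Steps 2 and 3 can be sketched as follows. This is not the authenticator's actual code: the function names are hypothetical, and the sketch assumes each installation entry is a dict carrying the `id`, `app_id`, `client_id`, `app_slug`, `repository_selection` and `permissions` fields mentioned above.

```python
from __future__ import annotations


def find_installation(installations: list[dict], ident: str) -> dict | None:
    """Find the installation whose id, app_id, client_id or app_slug equals ident."""
    for inst in installations:
        candidates = {
            str(inst[key])
            for key in ("id", "app_id", "client_id", "app_slug")
            if inst.get(key) is not None
        }
        if ident in candidates:
            return inst
    return None


def lfs_permissions(installation: dict) -> set[str]:
    """Infer LFS permissions from the installation's 'contents' permission."""
    level = installation.get("permissions", {}).get("contents")
    if level == "write":
        return {"read", "write"}
    if level == "read":
        return {"read"}
    return set()


def needs_repo_listing(installation: dict) -> bool:
    """True when access is not 'all', so the repository list must be checked."""
    return installation.get("repository_selection") != "all"
```

Only when `needs_repo_listing` returns `True` would the extra `/installation/repositories` call be necessary.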

### `giftless.auth.github` Configuration Options
* `api_url` (`str` = `"https://api.github.com"`): Base URL for the GitHub API (enterprise servers have API at `"https://<custom-hostname>/api/v3/"`).
* `api_timeout` (`float | tuple[float, float]` = `(10.0, 20.0)`): Timeout for the GitHub API calls ([details](https://requests.readthedocs.io/en/stable/user/advanced/#timeouts)).
* `api_version` (`str | None` = `"2022-11-28"`): Target GitHub API version; set to `None` to use GitHub's latest (rather experimental).
* `cache` (`dict`): Cache configuration section
* `token_max_size` (`int` = `32`): Max number of entries in the token -> user LRU cache. This cache holds the authentication data for a token. Evicted tokens will need to be re-authenticated.