Add metrics for Codespaces usage #42

Open
lucyb opened this issue Apr 12, 2024 · 5 comments · Fixed by ebmdatalab/metrics#173
Comments

@lucyb

lucyb commented Apr 12, 2024

Based on the discovery in #8.

We want to know how many Codespaces there are in the OpenSAFELY GitHub organisation.

This ticket is to:

  • Add code to the Metrics repo to poll the GitHub API every hour.
  • Add a very basic graph in Grafana to display this data (i.e. the number of Codespaces over time).

If we can easily record additional information about the Codespaces, like Owner/Repo/State, that might be worth considering too.

The API call needed is something like:

gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /orgs/opensafely/codespaces
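
For reference, the same request could be made from Python roughly as follows. This is only a sketch using the standard library, not the Metrics implementation: the function names are hypothetical, the token is assumed to have sufficient permissions for the endpoint, and only the first page of results is read.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_request(org, token):
    """Build the GET request for an organisation's codespaces (hypothetical helper)."""
    return urllib.request.Request(
        f"{API_ROOT}/orgs/{org}/codespaces",
        headers={
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",
            "Authorization": f"Bearer {token}",
        },
    )


def parse_codespaces(body):
    """Return the list of codespace records from a JSON response body."""
    return json.loads(body)["codespaces"]


def fetch_codespaces(org, token):
    """Fetch and parse the first page of results (makes a network call)."""
    with urllib.request.urlopen(build_request(org, token)) as response:
        return parse_codespaces(response.read())
```

Keeping the request building and the parsing separate from the network call means both can be checked without a token.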

There is some Slack discussion in this 🧵 thread.

@iaindillingham
Member

I'd like to read my interpretation of the issue back to you, @lucyb. I think we want to count the number of active Codespaces in the opensafely organization, every hour. I think we want to use this information to answer the question: "Are Codespaces being used within the opensafely organization?" Consequently, we'd be scanning a timeseries to determine whether the count was zero or whether the count was greater than zero. That is, how much greater than zero doesn't matter.

We don't want to know, for example, when each Codespace was created, suspended, and deleted, and hence know how long each Codespace was active. We don't want to be able, for example, to associate each Codespace with a repo and a range of commits.

@iaindillingham
Member

The token needs the admin:org scope to use the endpoint in the API call. However, there's an example response in the docs.

@iaindillingham
Member

@lucyb and I had a chat about this issue last week. We agreed that for each Codespace, Metrics should record:

  • the user (owner.login)
  • the repo (repository.name)
  • when it was created (created_at)
  • when it was last used (last_used_at)

Metrics should record these data on the current daily schedule. We appreciate that doing so will mean that Metrics will miss data for codespaces that are created and deleted within a day.
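
To make the field mapping concrete, the four fields above could be pulled out of one API record along these lines. This is a sketch under assumptions, not the code in the Metrics repo: the attribute names and the `_parse_timestamp` helper are hypothetical, and the record is assumed to contain ISO 8601 timestamps.

```python
from dataclasses import dataclass
from datetime import datetime


def _parse_timestamp(value):
    # Accept both a trailing "Z" and an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass(frozen=True)
class Codespace:
    owner: str  # owner.login
    repo: str  # repository.name
    created_at: datetime
    last_used_at: datetime

    @classmethod
    def from_dict(cls, data):
        """Build a Codespace from one record of the API response."""
        return cls(
            owner=data["owner"]["login"],
            repo=data["repository"]["name"],
            created_at=_parse_timestamp(data["created_at"]),
            last_used_at=_parse_timestamp(data["last_used_at"]),
        )
```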

With these data, we will derive the number of users that are developing their study code in Codespaces, over time. We hope this number is non-zero (someone is using a Codespace 🤞🏻) and is similar to the rate at which new studies are approved, albeit with a lag. For example, if a new study is approved every week for four weeks, then we hope that the number of users that are developing their study code in Codespaces will (eventually) increase to four. Knowing the user and repo will help us in the observation stage of the initiative: We will know who to ask about what, when we want to know about the experience of developing study code in Codespaces.
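
As an illustration of the derived number, the count of distinct users per daily snapshot could be computed like this. The input shape, pairs of snapshot date and owner login, is an assumption made for the sketch; it does not describe how Metrics stores its data.

```python
from collections import defaultdict
from datetime import date


def distinct_users_by_day(rows):
    """Count distinct owners per snapshot date.

    rows: iterable of (snapshot_date, owner_login) pairs (assumed shape).
    """
    users = defaultdict(set)
    for snapshot_date, owner in rows:
        users[snapshot_date].add(owner)
    return {day: len(owners) for day, owners in sorted(users.items())}
```

A user with several Codespaces on the same day is counted once, which matches the question of how many users are developing in Codespaces rather than how many Codespaces exist.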

It would be useful to derive the distribution of time deltas between when a Codespace was created and when it was last used, as the distribution could help us calibrate our usage policy. For example, if the peak of the distribution was consistently low, then we could infer an ephemeral pattern of use, and the current maximum retention period of 14 days would be appropriate. However, if the peak of the distribution approached 14 days, then we should reevaluate the current maximum retention period, or at least our communication of it, to prevent users from losing their work.
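
A rough sketch of that calculation follows. The function names and the 75% cutoff for "approaching" the retention period are assumptions chosen for illustration, not agreed values.

```python
from datetime import datetime, timedelta
from statistics import median

RETENTION_DAYS = 14  # current maximum retention period


def usage_deltas_in_days(codespaces):
    """codespaces: iterable of (created_at, last_used_at) datetime pairs."""
    return [
        (last_used - created) / timedelta(days=1)
        for created, last_used in codespaces
    ]


def summarise(deltas, threshold=0.75):
    """Return the median delta and the share of deltas near the retention limit.

    A delta counts as near the limit when it is at least
    threshold * RETENTION_DAYS (the threshold is an assumed value).
    """
    cutoff = RETENTION_DAYS * threshold
    near_limit = sum(1 for delta in deltas if delta >= cutoff)
    return median(deltas), near_limit / len(deltas)
```

A low median with a small share near the limit would suggest the ephemeral pattern described above; a large share near the limit would be the signal to revisit the retention period.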

It would be useful to derive the distribution of time deltas between when a repo was created and when the associated Codespace was last used. We think this distribution will have positive skew -- that is, a large number of small deltas -- as this would demonstrate that new study code is being edited in Codespaces. However, we're very interested in repos to the right of the distribution, as these would demonstrate that old study code is being edited in Codespaces. These studies may be larger, more complex, and depend on older versions of our tools, and may help us address any challenges associated with developing study code in Codespaces sooner rather than later.

@Jongmassey

Just to note that one of our pilot users is using the template for a repo in their own GitHub account, not the opensafely org, so they will be missing from these stats.

Until the service is fully opened back up we might find more instances of researchers trying to get a head start on projects that are still in the approvals process.

Jongmassey added a commit to ebmdatalab/metrics that referenced this issue May 20, 2024
Spec of fields to extract from API response taken from discussion
in opensafely-core/codespaces-initiative#42
Jongmassey added a commit to ebmdatalab/metrics that referenced this issue May 21, 2024
Spec of fields to extract from API response taken from discussion
in opensafely-core/codespaces-initiative#42
Jongmassey added a commit to ebmdatalab/metrics that referenced this issue May 30, 2024
Define a Codespace dataclass containing required fields (see discussion
in opensafely-core/codespaces-initiative#42).
Rather than use an instance of the existing Repo dataclass to store
repo data, we only need the name and we only receive a
minimal amount of repo data from the API so just store the name as a
string. This is hopefully less confusing than modifying the Repo class
or populating the extra fields this class requires with dummy data.

The organisation codespaces endpoint is queried and returned data is
passed unmodified to the Codespace dataclass's from_dict() method, which
does the required data conversion. This follows the pattern established
for the other domain dataclasses.
Jongmassey added a commit to ebmdatalab/metrics that referenced this issue May 30, 2024
Define a Codespace dataclass containing required fields (see discussion
in opensafely-core/codespaces-initiative#42).
Rather than use an instance of the existing Repo dataclass to store
repo data, we only need the name and we only receive a
minimal amount of repo data from the API so just store the name as a
string. This is hopefully less confusing than modifying the Repo class
or populating the extra fields this class requires with dummy data.

An additional PAT is required to query codespaces for the opensafely
GitHub organisation. Any future querying of codespaces for other
organisations will require similarly permissioned PATs.

The organisation codespaces endpoint is queried and returned data is
passed unmodified to the Codespace dataclass's from_dict() method, which
does the required data conversion. This follows the pattern established
for the other domain dataclasses.
Jongmassey added a commit to ebmdatalab/metrics that referenced this issue May 31, 2024
Define a Codespace dataclass containing required fields (see discussion
in opensafely-core/codespaces-initiative#42).
Rather than use an instance of the existing Repo dataclass to store
repo data, we only need the name and we only receive a
minimal amount of repo data from the API so just store the name as a
string. This is hopefully less confusing than modifying the Repo class
or populating the extra fields this class requires with dummy data.

An additional PAT is required to query codespaces for the opensafely
GitHub organisation. Any future querying of codespaces for other
organisations will require similarly permissioned PATs.

The organisation codespaces endpoint is queried and returned data is
passed unmodified to the Codespace dataclass's from_dict() method, which
does the required data conversion. This follows the pattern established
for the other domain dataclasses.
@Jongmassey

Jongmassey commented Jun 3, 2024

@Jongmassey Jongmassey reopened this Jun 3, 2024