Add metrics for Codespaces usage #42
Comments
I'd like to read my interpretation of the issue back to you, @lucyb. I think we want to count the number of active Codespaces in the opensafely organization, every hour. I think we want to use this information to answer the question: "Are Codespaces being used within the opensafely organization?" Consequently, we'd be scanning a timeseries to determine whether the count was zero or whether the count was greater than zero. That is, how much greater than zero doesn't matter. We don't want to know, for example, when each Codespace was created, suspended, and deleted, and hence know how long each Codespace was active. We don't want to be able, for example, to associate each Codespace with a repo and a range of commits. |
The token needs the |
@lucyb and I had a chat about this issue last week. We agreed that for each Codespace, Metrics should record:
Metrics should record these data on the current daily schedule. We appreciate that doing so will mean that Metrics will miss data for Codespaces that are created and deleted within a day.
With these data, we will derive the number of users who are developing their study code in Codespaces, over time. We hope this number is non-zero (someone is using a Codespace 🤞🏻) and is similar to the rate at which new studies are approved, albeit with a lag. For example, if a new study is approved every week for four weeks, then we hope that the number of users who are developing their study code in Codespaces will (eventually) increase to four.
Knowing the user and repo will help us in the observation stage of the initiative: we will know who to ask about what, when we want to know about the experience of developing study code in Codespaces.
It would be useful to derive the distribution of time deltas between when a Codespace was created and when it was last used, as the distribution could help us calibrate our usage policy. For example, if the peak of the distribution was consistently low, then we could infer an ephemeral pattern of use, and the current maximum retention period of 14 days would be appropriate. However, if the peak of the distribution approached 14 days, then we should reevaluate the current maximum retention period, or at least our communication of it, to prevent users from losing their work.
It would also be useful to derive the distribution of time deltas between when a repo was created and when the associated Codespace was last used. We think this distribution will have positive skew (that is, a large number of small deltas), as this would demonstrate that new study code is being edited in Codespaces. However, we're very interested in repos to the right of the distribution, as these would demonstrate that old study code is being edited in Codespaces. These studies may be larger, more complex, and depend on older versions of our tools, and may help us address any challenges associated with developing study code in Codespaces sooner rather than later. |
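As a rough illustration of the first distribution described above (the delta between when a Codespace was created and when it was last used), here is a minimal sketch. The helper names, the sample records, and the "within one day of the limit" threshold are all assumptions made for illustration; only the 14-day retention period comes from the comment above.

```python
from collections import Counter
from datetime import datetime

MAX_RETENTION_DAYS = 14  # retention period quoted in the comment above


def days_in_use(created_at: datetime, last_used_at: datetime) -> int:
    """Whole days between creation and last use (hypothetical helper)."""
    return (last_used_at - created_at).days


def peak_days(deltas: list[int]) -> int:
    """Most common delta; ties are resolved in favour of the smaller delta."""
    counts = Counter(deltas)
    return max(counts, key=lambda d: (counts[d], -d))


# Made-up (created_at, last_used_at) pairs, not real data.
records = [
    (datetime(2024, 1, 1), datetime(2024, 1, 2)),
    (datetime(2024, 1, 5), datetime(2024, 1, 6)),
    (datetime(2024, 1, 1), datetime(2024, 1, 14)),
]

deltas = [days_in_use(created, used) for created, used in records]
peak = peak_days(deltas)

# Assumed threshold: a delta within one day of the retention limit is "near".
near_limit = [d for d in deltas if d >= MAX_RETENTION_DAYS - 1]

print(f"deltas={deltas} peak={peak} near_limit={near_limit}")
```

A low peak would be consistent with the ephemeral pattern of use; a growing `near_limit` list would be a prompt to revisit the retention period. The same helpers would apply to the repo-created delta if the repo's creation time were recorded alongside each Codespace.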
Just to note that one of our pilot users is using the template for a repo in their own GitHub account, not the opensafely org, so they will be missing from these stats. Until the service is fully opened back up, we might find more instances of researchers trying to get a head start on projects that are still in the approvals process. |
Spec of fields to extract from API response taken from discussion in opensafely-core/codespaces-initiative#42
Define a Codespace dataclass containing required fields (see discussion in opensafely-core/codespaces-initiative#42). Rather than use an instance of the existing Repo dataclass to store repo data, we only need the name and we only receive a minimal amount of repo data from the API so just store the name as a string. This is hopefully less confusing than modifying the Repo class or populating the extra fields this class requires with dummy data. The organisation codespaces endpoint is queried and returned data is passed unmodified to the Codespace dataclass's from_dict() method, which does the required data conversion. This follows the pattern established for the other domain dataclasses.
Define a Codespace dataclass containing required fields (see discussion in opensafely-core/codespaces-initiative#42). Rather than use an instance of the existing Repo dataclass to store repo data, we only need the name and we only receive a minimal amount of repo data from the API so just store the name as a string. This is hopefully less confusing than modifying the Repo class or populating the extra fields this class requires with dummy data. An additional PAT is required to query codespaces for the opensafely GitHub organisation. Any future querying of codespaces for other organisations will require similarly permissioned PATs. The organisation codespaces endpoint is queried and returned data is passed unmodified to the Codespace dataclass's from_dict() method, which does the required data conversion. This follows the pattern established for the other domain dataclasses.
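The pattern described in this commit message could look roughly like the sketch below. It is not the code from the commit: the field selection is an assumption based on the discussion in this issue (user, repo, created, last used), and the sample dictionary is a hypothetical, heavily trimmed stand-in for one item of the organisation codespaces response. The keys used (`owner.login`, `repository.name`, `created_at`, `last_used_at`) are fields of GitHub's codespace object.

```python
from dataclasses import dataclass
from datetime import datetime


def _parse_timestamp(value: str) -> datetime:
    # GitHub returns ISO 8601 timestamps. Older versions of
    # datetime.fromisoformat() reject a trailing "Z", so swap it for an
    # explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass(frozen=True)
class Codespace:
    repo_name: str  # only the name, rather than a full Repo instance
    user: str
    created_at: datetime
    last_used_at: datetime

    @classmethod
    def from_dict(cls, data: dict) -> "Codespace":
        # Receives the API item unmodified and does the conversion here.
        return cls(
            repo_name=data["repository"]["name"],
            user=data["owner"]["login"],
            created_at=_parse_timestamp(data["created_at"]),
            last_used_at=_parse_timestamp(data["last_used_at"]),
        )


# Hypothetical, trimmed response item (illustrative values only).
sample = {
    "owner": {"login": "octocat"},
    "repository": {"name": "example-study"},
    "created_at": "2024-01-01T09:00:00Z",
    "last_used_at": "2024-01-03T17:30:00Z",
}

codespace = Codespace.from_dict(sample)
print(codespace.repo_name, codespace.user)
```

Keeping the conversion inside `from_dict()` means the querying code can stay a thin wrapper that passes each response item through untouched.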
|
Based on the discovery in #8.
We want to know how many Codespaces there are in the OpenSAFELY GitHub organisation.
This ticket is to:
If we can easily record additional information about the Codespaces, like Owner/Repo/State, that might be worth considering too.
The API call needed is something like:
There is some Slack discussion in this 🧵 thread.