In the Bristol-Cambridge-Oxford meeting on May 2, 2024, @venexia asked a question about keeping Git branches in sync with job-server workspaces. The question was prompted by an exchange with tech support.[1]
We should recognize that it may not be desirable to keep Git branches in sync with job-server workspaces. Nevertheless, this issue captures the question and the exchange with tech support. Ultimately, the intention is to improve the documentation, by making recommendations to researchers.
In the following workflow, the primary branch is GitHub's default branch. It is often called main. The terms primary branch, primary workspace, secondary branch, and secondary workspace have no meaning beyond this issue. HEAD is Git-speak for "the current branch's latest commit".
Researcher creates repo from opensafely/research-template
Researcher commits to primary branch
Researcher creates primary workspace associated with primary branch
Researcher runs jobs in primary workspace
Primary workspace directories are created on L3 and L4 filesystems
Files are written to primary workspace directories
Researcher writes paper based on primary branch HEAD and files in primary workspace directories
Researcher submits paper 🎉
At this point, the paper is based on primary branch HEAD and files in primary workspace directories.
The paper is reviewed; further analysis is requested, which necessitates modifications to the dataset definition. The researcher doesn't want to overwrite files in primary workspace directories, because modifications to the dataset definition could result in a different dataset. 🙁
Researcher branches from primary branch, giving secondary branch
Researcher commits to secondary branch
Researcher creates secondary workspace associated with secondary branch
Researcher runs jobs in secondary workspace
Secondary workspace directories are created on L3 and L4 filesystems
Files are written to secondary workspace directories
Researcher updates paper based on secondary branch HEAD and files in secondary workspace directories
Researcher merges secondary branch into primary branch
Secondary branch is deleted (by researcher, by GitHub, etc.)
Researcher submits paper 🎉
At this point, the paper is based on primary branch HEAD and files in secondary workspace directories.
The paper is reviewed; further analysis is requested 🙁
Should the researcher commit to primary branch? Files in primary workspace directories are behind files in secondary workspace directories. The researcher would need to run jobs in primary workspace.
Should the researcher branch from primary branch, giving new secondary branch with same name as old secondary branch, and commit to new secondary branch? The researcher would not need to run jobs in secondary workspace.
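The state reached after the second submission, and the effect of the two options above, can be sketched with a toy model. This is only an illustration, not how Git or job-server work internally: the classes `Repo` and `Workspace`, the branch names `main` and `revision`, and the commit labels `c1` and `c2` are all hypothetical. The merge is treated as a fast-forward, so the primary branch HEAD after the merge is the same commit as the old secondary branch HEAD. A real merge commit would have a different hash.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Repo:
    """Toy stand-in for a Git repo: maps branch name -> HEAD commit label."""
    heads: Dict[str, str] = field(default_factory=dict)

    def commit(self, branch: str, label: str) -> None:
        self.heads[branch] = label

    def branch_from(self, source: str, new: str) -> None:
        self.heads[new] = self.heads[source]

    def merge_and_delete(self, source: str, target: str) -> None:
        # Simplification: the merge is a fast-forward, then the source is deleted.
        self.heads[target] = self.heads.pop(source)


@dataclass
class Workspace:
    """Toy stand-in for a workspace associated with one branch name."""
    branch: str
    outputs_commit: Optional[str] = None  # commit that produced the current files

    def run_jobs(self, repo: Repo) -> None:
        self.outputs_commit = repo.heads[self.branch]

    def status(self, repo: Repo) -> str:
        head = repo.heads.get(self.branch)
        if head is None:
            return "branch missing"
        return "in sync" if head == self.outputs_commit else "stale"


# First submission: primary branch and primary workspace.
repo = Repo()
repo.commit("main", "c1")
primary = Workspace("main")
primary.run_jobs(repo)

# Revision: secondary branch and secondary workspace, then merge and delete.
repo.branch_from("main", "revision")
repo.commit("revision", "c2")
secondary = Workspace("revision")
secondary.run_jobs(repo)
repo.merge_and_delete("revision", "main")

status_primary = primary.status(repo)      # files in primary workspace are behind
status_secondary = secondary.status(repo)  # secondary branch no longer exists

# Second option: recreate the secondary branch, with the same name, from primary.
repo.branch_from("main", "revision")
status_secondary_recreated = secondary.status(repo)

print(status_primary, status_secondary, status_secondary_recreated, sep=" | ")
```

Under these assumptions, the primary workspace is stale after the merge, which is why the first option requires running jobs there. Recreating the secondary branch puts the secondary workspace back in sync without running any jobs, and leaves the files in the primary workspace directories untouched.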
The researcher doesn't want to overwrite files in primary workspace directories, because modifications to the dataset definition could result in a different dataset.
I think the user wants to undertake an experiment: that is, to compare the dataset in the primary workspace with the dataset in the secondary workspace. The comparison need not be exact; it may be an approximation. DVC (Data Version Control) provides experiment management, which we could learn from. For more information, see:
Footnotes
1. https://bennettoxford.slack.com/archives/C01D7H9LYKB/p1709140748180189