-
Notifications
You must be signed in to change notification settings - Fork 72
/
cm102.Rmd
392 lines (266 loc) · 17.1 KB
/
cm102.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
# (2) Git workflows
```{r, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE)
```
```{r,}
library(tidyverse)
install.packages('tictoc') # you might need to install this new package: install.packages('tictoc')
library(tictoc)
library(datasets)
library(glue)
```
## Today's Agenda
- Announcements:
- Reminder about Assignment 1 and milestone 1 due on Saturday at 6 PM
- By now you should have already gotten in touch with your partner. If you are having troubles, please send me a private message on canvas
- If you would like someone from the teaching team to approve your dataset, please create an issue [here](https://github.com/STAT547-UBC-2019-20/Discussions/issues)
- Part 1: Git and GitHub for collaborative work (20 mins)
- Sensible workflow for local work
- Setting up your project repositories
- Dealing with merge conflicts
- Part 2: More about fork and clone workflow
## Part 1: Git and GitHub for collaborative work
- Git class demo with Firas & Yulia
- Class activity with your partner
- Try to sit with your project partner
- If you are auditing the course, try to find another who is so you can do this part with someone else
- Tips and tricks on how to handle merge conflicts.
### Create your project repo and add your partner as admin
1. Go to the [STAT547 - Spring 2020 organization](https://github.com/STAT547-UBC-2019-20/)
2. On the right of the page, click the green "New" button
3. For the repository name, set it to "group_xx_yy" where xx is your group number (from canvas) and yy is an optional name of your repo
4. Keep this repo private (for now)
5. Initialize the repo with a README, add a .gitignore file (for R) and a license (MIT is fine).
6. Once at the home page of your repo, click "Settings", then "Manage Access"
7. Add your team member, and give them Admin access.
7. THERE IS NO NEED TO add any of the teaching team to your repo, we will have access as long as it's created inside the STAT547 organization.
### Setup your local directories, and RStudio to work nicely with GitHub
1. Locally on my machine, create a new folder called stat547 in a sensible place: <SENSIBLE_PATH>
2. Create folders for each of your main areas: `Discussion`, `assignments`, `project`, `participation`
3. On GitHub.com visit each of these repositories and copy the git repo URL into these spots
- `Discussion`: https://github.com/STAT547-UBC-2019-20/Discussions.git
- `assignment_01`: ## LINK TO THIS REPO
- `participation`: ## LINK TO THIS REPO
4. Create a new RStudio project for each of the repos above
- Open RStudio then click File >> New Project >> Version Control >> Git
- In the "Repository URL" field, paste the URL for one of the repos above
- In the "Project directory name" field, you should leave this as the name of the repo to avoid confusion
- In the "Create project as subdirectory of:" field, use the <SENSIBLE_PATH> from Step 1.
- Click "Create Project"
- You should repeat this step (Step 4) for every assignment that you accept in STAT547.
5. Now you have correctly set up RStudio with Git so you can commit, pull, push, etc... right from RStudio!
6. At the beginning of every lecture, open the "Discussion" RStudio project. Then click on the "Git" tab, and then pull the latest `cm1xx_participation.Rmd` file directly to your computer. Copy it to your participation repo, commit it, and then push.
### Activity 1: Fork and clone workflow
For your team project, you should be working in a "fork and clone" workflow (unless you consider yourself a git master).
More details about this workflow are described [here](https://happygitwithr.com/fork-and-clone.html) in happygitwithr.com
Here is what you need to do:
1. Go to your project repo, called `group_xx_yy`
2. Click the "fork" button on the repo home page
3. Make sure you fork the repo within the STAT547 organization
4. Copy the URL of YOUR fork of the main project repo
- Git URL of YOUR fork: ## LINK TO THIS REPO
5. Also grab the URL of the MAIN PROJECT REPO
- Git URL of MAIN PROJECT REPO: ## LINK TO THIS REPO
6. Create a new RStudio project and repeat Step 4 from the previous section for YOUR FORK of the main project repo
7. Now, we need to "link" the main project repoto YOUR FORK so you can send your changes back to the main project repo, and also receive any changes your partner made. To do this:
- In RStudio, find the "Terminal" button and click on it
- Now you're in a Terminal/Console (yes, *within* RStudio, weird - I know)
- Make sure you're in the correct directory (YOUR FORK cloned locally) by typing in `pwd`
- Type this command to link the MAIN PROJECT repo to your fork:
- `git remote add upstream <MAIN_PROJECT_REPO_URL>`
- Done!
8. To check if it worked, type this in the RStudio console:
- `git fetch upstream`
- If there are changes it will let you know what they are
- Once you're ready to pull the changes into YOUR FORK of the MAIN PROJECT repo, use this command:
- `git merge upstream/master`
- This will merge all the changes from the upstream URL (<MAIN_PROJECT_REPO_URL>) into your fork
- Do this often!
9. It is best practice to create a new branch in a forked repo so your work isn't committed to master:
- `git checkout -b name_of_your_new_branch`
### Activity 2: Send a PR from YOUR FORK back to the MAIN PROJECT
1. Will be demo'd
### Activity 3: "Catching up"
1. Accept a PR from your partner in the MAIN PROJECT repo
2. Merge those changes into YOUR FORK using `git merge upstream/master`
- Deal with any merge conflicts that arise
3. Send another PR now with even more changes, hopefully if everything was done correctly, the new changes should allow us to easily merge the PR in with the main code.
### (Optional) Activity 4: Class wide
You can do this all on GitHub.com - no need to clone locally.
1. Fork this [git demos](https://github.com/STAT547-UBC-2019-20/git_demos) repo
2. Create a branch called 'my_change'
3. Make a change in the `introductions.md` file
4. Create a PR and send it to the main `git_demos` repo
5. Let's see if we can get all 30 PRs merged in!
### Final: Provide links for the various sections of the git activity:
- Link for your fork of the project repo: ## YOUR LINK HERE
- Link to the Pull request (PR) you sent in the main repo: ## YOUR LINK HERE
- Link to the commit after fixing the merge conflict in the main repo : ## YOUR LINK HERE
- Link to the commit after you pulled changes from the upstream branch
[Happy Git with R](https://happygitwithr.com) is the most relevant for new R users, it was written by Jenny Bryan and others.
For more information on some tips of configuring RStudio to work with git, see [this support article](https://support.rstudio.com/hc/en-us/articles/200532077?version=1.2.5019&mode=desktop).
## Part 2: Fork and clone workflow
This section of the lecture notes have been adapted from [Jenny Bryan's HappyGitWithR](https://happygitwithr.com/fork-and-clone.html)
For the team projects in STAT547, we will be using the "Fork and clone" method to work collaboratively together and send changes back to the main repository (that each partner forked) via Pull Requests.
In general, use "fork and clone" to get a copy of someone else's repo if there's any chance you will want to propose a change to the owner, i.e. send a "pull request".
There are several other workflows for working collaboratively, and another popular one is called branch and PR - but that is not what we'll be doing in this course.
### Initial workflow
Sign in to [GitHub](https://github.com) navigate to the repo of interest.
Think of this as `OWNER/PROJECT`, where `OWNER` is the user or organization who owns the repository named `PROJECT`.
In the upper right hand corner, click **Fork**.
For each GitHub account, you can only fork the repo once.
Subsequent clicking on the FORK button will tell you that you've already forked this repo.
This creates a copy of `PROJECT` in your GitHub account and takes you there in the browser.
Now we are looking at `YOU/PROJECT`.
One way to tell whether you are in your fork (`YOU/PROJECT`), or the original `OWNER/PROJECT` is to check the name of the repo on the repo home page.
See below for markers:
<img src="img/git_fork.png" width=900>
Here are the next steps:
1. Copy the "CLONE URL" of `YOU/PROJECT` (it should look something like https://github.com/YOU/PROJECT.git)
2. Open R Studio on your computer
3. Create a new "Version Control >> Git" project in the appropriate location
4. For the "repository URL" specify the CLONE URL from step 1
We're doing this:
![](img/fork-and-clone.png)
### Don't mess with `master` {#dont-touch-master}
If you make any commits in your local repository, you should work in a [new branch](https://happygitwithr.com/git-branches.html#git-branches), not `master`.
You shouldn't make (code) commits to `master` of a repo you have forked.
This will make your life much easier if you want to [pull upstream work](https://happygitwithr.com/upstream-changes.html) into your copy. The `OWNER` of `PROJECT` will also be happier to receive your pull request from a non-`master` branch.
### The original repo as a remote
Remember we are here:
![](img/fork-and-clone.png)
Here is the current situation in words:
* You have a fork `YOU/PROJECT`, which is a repo on GitHub.
* You have a local clone of your fork.
* Your fork `YOU/PROJECT` is the remote known as `origin` for your local repo.
* You are well positioned to make a pull request to `OWNER/PROJECT`.
But notice the lack of a direct connection between your local copy of this repo and the original `OWNER/PROJECT`.
This is a problem - how will you ever get changes made to the original project?
![](img/fork-no-upstream-sad.png)
As time goes on, the original repository `OWNER/PROJECT` will continue to evolve.
You probably want the ability to keep your copy up-to-date.
In Git lingo, you will need to get the "upstream changes".
The next section deals with that.
![](img/fork-triangle-happy.png)
## Part 3: Getting changes from "upstream"
This workflow is relevant if you have done fork and clone and now you need to pull subsequent changes from the original repo into your copy.
You should set this up right away after you fork and clone, even though you don't need it yet.
Vocabulary: `OWNER/PROJECT` refers to the original GitHub repo, owned by `OWNER`, who is not you.
`YOU/PROJECT` refers to your copy on GitHub, i.e. your fork.
### No, you can't do this via GitHub
You might hope that GitHub could automatically keep your fork `YOU/PROJECT` synced up with the original `OWNER/PROJECT`. Or that you could do this in the browser interface. Then you could pull those upstream changes into your local repo.
But you can't.
There are some tantalizing, janky ways to sort of do parts of this. But they have fatal flaws that make them unsustainable. I believe you really do need to add `OWNER/PROJECT` as a second remote on your repo and pull from there.
### Initial conditions
Get into the repo of interest, i.e. your local copy. For many of you, this means launching it as an RStudio Project.
You'll probably also want to open a terminal within RStudio for some Git work via *Tools > Terminal > New Terminal*.
Make sure you are on the `master` branch and your "working tree is clean". `git status` should show something like:
``` bash
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
```
BTW I recommend that you never make your own commits to the `master` branch of a fork.
However, if you have already done so, your situation is addressed [here in section 29.8](https://happygitwithr.com/upstream-changes.html#touched-master).
### List your remotes
Let's inspect [the current remotes](#git-remotes) for your local repo. In the shell (Appendix \@ref(shell)):
``` bash
git remote -v
```
Most of you will see output along these lines (let's call this BEFORE):
``` bash
origin https://github.com/YOU/PROJECT.git (fetch)
origin https://github.com/YOU/PROJECT.git (push)
```
There is only one remote, named `origin`, corresponding to your fork on GitHub. This figure depicts a BEFORE scenario:
![](img/fork-no-upstream-sad.png)
This is sad, because there is no direct connection between `OWNER/PROJECT` and your local copy of the repo.
The state we want to see is like this (let's call this AFTER):
``` bash
origin https://github.com/YOU/PROJECT.git (fetch)
origin https://github.com/YOU/PROJET.git (push)
upstream https://github.com/OWNER/PROJECT.git (fetch)
upstream https://github.com/OWNER/PROJECT.git (push)
```
Notice the second remote, named `upstream`, corresponding to the original repo on GitHub. This figure depicts AFTER, the scenario we want to achieve:
![](img/fork-triangle-happy.png)
### Add the `upstream` remote
Let us add `OWNER/PROJECT` as the `upstream` remote.
On [GitHub](https://github.com), make sure you are signed in and navigate to the original repo, `OWNER/PROJECT`.
It is easy to get to from your fork, `YOU/PROJECT`, via "forked from" links near the top.
Use the big green "Clone or download" button to get the URL for `OWNER/PROJECT` on your clipboard.
Be intentional about whether you copy the HTTPS or SSH URL.
#### Command line Git
Use a command like this, but make an intentional choice about using an HTTPS vs SSH URL.
``` bash
git remote add upstream https://github.com/OWNER/REPO.git
```
The nickname `upstream` can technically be whatever you want but there is a strong convention/tradition of using `upstream`
#### RStudio
This feels a bit odd, but humor me. Click on "New Branch" in the Git pane.
![](img/rstudio-new-branch.png)]
This will reveal a button to "Add Remote". Click it. Enter `upstream` as the remote name and paste the URL for `OWNER/PROJECT` that you got from GitHub. Click "Add". Decline the opportunity to add a new branch by clicking "Cancel".
### Verify your `upstream` remote
Let's inspect the [current remotes](https://happygitwithr.com/git-remotes.html#git-remotes) for your local repo AGAIN. In the shell:
``` bash
git remote -v
```
Now you should see something like
``` bash
origin https://github.com/YOU/PROJECT.git (fetch)
origin https://github.com/YOU/PROJECT.git (push)
upstream https://github.com/OWNER/REPO.git (fetch)
upstream https://github.com/OWNER/REPO.git (push)
```
Notice the second remote, named `upstream`, corresponding to the original repo on GitHub.
We have gotten to this:
![](img/fork-triangle-happy.png)
### Pull changes from `upstream`
Before we pull changes down, let's first do a quick peek to see what WOULD happen if we pulled upstream changes.
``` bash
git fetch upstream
```
Okay, now that we're satisfied there are some changes, let's pull the changes that we don't have from `OWNER/PROJECT` into our local copy.
``` bash
git pull upstream master
```
This says: "pull the changes from the remote known as `upstream` into the `master` branch of my local repo".
We are being explicit about the remote and the branch in this case, because (as our `git remote -v` commands reveal), `upstream/master` is **not** the default tracking branch for local `master`.
See [Um, what if I did touch `master`?](https://happygitwithr.com/upstream-changes.html#touched-master) to get yourself back on the happy path.
### Push these changes to `origin/master`
Feel free to push the newly updated state of local `master` to your fork `YOU/PROJECT` and enjoy the satisfaction of being "caught up" with `OWNER/PROJECT`.
In the shell:
``` bash
git push
```
Or use the green "Push" button in RStudio.
### Um, what if I did touch `master`? {#touched-master}
I told you not to!
But OK here we are.
Let's imagine this is the state of the original repo `OWNER/PROJECT`:
``` bash
... -- A -- B -- C -- D -- E -- F
```
and and this is the state of the `master` branch in your local copy:
``` bash
... -- A -- B -- C -- X -- Y -- Z
```
The two histories agree, up to commit or state `C`, then they diverge.
If you want to preserve the work in commits `X`, `Y`, and `Z`, create a new branch right now, with tip at `Z`, via `git checkout -b my-great-innovations` (pick your own branch name!). Then checkout `master` via `git checkout master`.
I now assume you have either preserved the work in `X`, `Y`, and `Z` (with a branch) or have decided to let it go.
Do a hard reset of the `master` branch to `C`.
``` bash
git reset --hard C
```
You will have to figure out how to convey `C` in Git-speak. Specify it relative to `HEAD` or provide the SHA. See *future link about resets* for more support.
The instructions above for pulling changes from upstream should now work. Your `master` branch should reflect the history of `OWNER/PROJECT`:
``` bash
... -- A -- B -- C -- D -- E -- F
```
If you chose to create a branch with your work, you will also have that locally:
``` bash
... -- A -- B -- C -- D -- E -- F (master)
\
-- X -- Y -- Z (my-great-innovations)
```
If you pushed your alternative history (with commits `X`, `Y`, and `Z`) to your fork `YOU/PROJECT` and you like keeping everything synced up, you will also need to force push `master` via `git push --force`, but we really really don't like discussing force pushes in Happy Git. We only do so here, because we are talking about a fork, which is fairly easy to replace if things so sideways.