-
Notifications
You must be signed in to change notification settings - Fork 18
/
Copy path01.3-githubandversioncontrol.Rmd
149 lines (100 loc) · 18.7 KB
/
01.3-githubandversioncontrol.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
## Github & Version Control
<div class = rmdnote> _[Launch a simple notebook](https://github.com/o-date/notebooks/)._ </div>
It's a familiar situation - you've been working on a paper. It's where you want it to be, and you're certain you're done. You save it as 'final.doc'. Then, you ask your friend to take a look at it. She spots several typos and that you flubbed an entire paragraph. You open it up, make the changes, and save as 'final-w-changes.doc'. Later that day it occurs to you that you don't like those changes, and you go back to the original 'final.doc', make some changes, and just overwrite the previous version. Soon, you have a folder like:
```fortran
|-project
|-'finalfinal.doc'
|-'final-w-changes.doc'
|-'final-w-changes2.doc'
|-'isthisone-changes.doc'
|-'this.doc'
```
Things can get messy quite quickly. Imagine that you also have several spreadsheets in there as well, images, snippets of code... we don't want this. What we want is a way of managing the evolution of your files. We do this with a program called [Git](https://git-scm.com/). Git is not a user-friendly piece of software, and it takes some work to get your head around. Git is also very powerful, but fortunately, the basic uses to which most of us put it to are more or less straightforward. There are many other programs that make use of Git for version control; these programs weld a graphical user interface on top of the main Git program. It is better however to become familiar with the basic uses of git from the command line *first* before learning the idiosyncrasies of these helper programs. The exercises in this section will take you through the basics of using Git from the command line.
### The core functions of Git
At its heart, Git is a way of taking 'snapshots' of the current state of a folder, and saving those snapshots or versions in sequence. For an excellent brief presentation on Git, see Alice Bartlett's [presentation here](https://speakerdeck.com/alicebartlett/git-for-humans). In Git's lingo, a folder on your computer is known as a `repository`. This sequence of versions in total lets you see how your project unfolded over time. Each time you wish to take a version, you make a `commit`. A commit is a Git command to take a snapshot of the entire repository. Thus, your folder we discussed above, with its proliferation of documents becomes:
```fortran
|-project
|-'final.doc'
```
BUT its commit history could be visualized like this:
![A visualization of the history of commits](images/commit-history.png)
Each one of those circles represents a point in time when you the writer made a commit; Git compared the state of the file to the earlier state, and saved a snapshot of the `differences`. What is particularly useful about making a commit is that Git requires two more pieces of information about the git: who is making it, and when. The final useful bit about a commit is that you can save a detailed message about *why* the commit is being made. In our hypothetical situation, your first commit message might look like this:
```bash
Fixed conclusion
Julie pointed out that I had missed
the critical bit in the assignment
regarding stratigraphy. This was
added in the concluding section.
```
This information is stored in the history of the commits. In this way, you can see exactly how the project evolved and why. Each one of these commits has what is called a `hash`. This is a unique fingerprint that you can use to 'time travel' (to borrow Bartlett's phrasing). If you want to see what your project looked like a few months ago, you `checkout` that commit. This has the effect of 'rewinding' the project. Once you've checked out a commit, don't be alarmed when you look at the folder: your folder (your repository) looks like how it once did all those weeks ago! Any files written after that commit seem as if they've disappeared. Don't worry: they still exist!
What would happen if you wanted to experiment or take your project in a new direction from that point forward? Git lets you do this. What you will do is create a new `branch` of your project from that point. You can think of a branch as like the branch of a tree, or perhaps better, a branch of a river that eventually merges back to the source. Another way of thinking about a branch is that it is a label that sticks with these particular commits. It is generally considered 'best practice' to leave your `master` branch alone, in the sense that it represents the best version of your project. When you want to experiment or do something new, you create a `branch` and work there. If the work on the branch ultimately proves fruitless, you can discard it. *But*, if you decide that you like how it's going, you can `merge` that branch back into your master. A merge is a commit that folds all of the commits from the branch with the commits from the master.
Git is also a powerful tool for backing up your work. You can work quite happily with Git on your own machine, but when you store those files and the history of commits somewhere remote, you open up the possibility of collaboration *and* a safe place where your materials can be recalled if -perish the thought- something happened to your computer. In Git-speak, the remote location is, well, the `remote`. There are many different places on the web that can function as a remote for Git repositories. You can even set one up on your own server, if you want. One of the most popular (and the one that we use for ODATE) is [Github](http://github.com). There are many useful repositories shared via Github of interest to archaeologists - [OpenContext](http://opencontext.org) for instance shares a lot of material that way. To get material *out* of Github and onto your own computer, you `clone` it. If that hypothetical paper you were writing was part of a group project, your partners could clone it from your Github space, and work on it as well!
You and Anna are working together on the project. You have made a new project repository in your Github space, and you have cloned it to your computer. Anna has cloned it to hers. Let's assume that you have a very productive weekend and you make some real headway on the project. You `commit` your changes, and then `push` them from your computer to the Github version of your repository. That repository is now one commit *ahead* of Anna's version. Anna `pulls` those changes from Github to her own version of the repository, which now looks *exactly* like your version. What happens if you make changes to the exact same part of the exact same file? This is called a `conflict`. Git will make a version of the file that contains text clearly marking off the part of the file where the conflict occurs, with the conflicting information marked out as well. The way to `resolve` the conflict is to open the file (typically with a text editor) and to delete the added Git text, making a decision on which information is the correct information.
### Key Terms
+ repository: a single folder that holds all of the files and subfolders of your project
+ commit: this means, 'take a snapshot of the current state of my repository'
+ publish: take my folder on my computer, and copy it and its contents to the web as a repository at github.com/myusername/repositoryname
+ sync: update the web repository with the latest commit from my local folder
+ branch: make a copy of my repository with a 'working name'
+ merge: fold the changes I have made on a branch into another branch
+ fork: to make a copy of someone else's repo
+ clone: to copy an online repo onto your own computer
+ pull request: to ask the original maker of a repo to 'pull' your changes into their master, original, repository
+ push: to move your changes from your computer to the online repo
+ conflict: when two commits describe different changes to the same part of a file
### Take-aways
+ Git keeps track of all of the differences in your files, when you take a 'snapshot' of the state of your folder (repository) with the `commit` command
+ Git allows you to roll back changes
+ Git allows you to experiment by making changes that can be deleted or incorporated as desired
+ Git allows you to manage collaboration safely
+ Git allows you to distribute your materials
### Further Reading
We alluded above to the presence of 'helper' programs that are designed to make it easier to use Git to its full potential. An excellent introduction to Github's desktop GUI is at this [Programming Historian lesson on Github](http://programminghistorian.org/lessons/getting-started-with-github-desktop). A follow-up lesson explains the way Github itself can be used to host entire websites! You may explore it [here](http://programminghistorian.org/lessons/building-static-sites-with-jekyll-github-pages). In the section of this chapter on open notebooks, we will also use Git and Github to create a simple open notebook for your research projects.
You might also wish to dip into the [archived live stream; link here](https://www.youtube.com/watch?v=D0_j04BnVeA) from the first day of the NEH funded Institute on Digital Archaeology Method and Practice (2015) where Prof. Ethan Watrall discusses project management fundamentals and, towards the last part of the stream, introduces Git.
### Exercises
Take a copy of the [ODATE Binder](https://github.com/o-date/notebooks/). Carefully peruse the [readme](https://github.com/o-date/notebooks/blob/master/README.md) so that you can create your own version of the _executable_ version. Once you've done that, launch the binder. It might take five or six minutes to launch; be patient.
Once it has launched, click the `new` dropdown and select terminal.
1. Because the Jupyter notebook is being served from a github repository, the `git` program is already running in the background and is keeping track of changes. If you were working on your own machine (and had git installed), you could turn *any* folder into a repository by telling Git to start watching the folder, by typing `git init` at the command prompt inside that folder.
Periodically, we tell Git to take a snapshot of the state of the files in the folder by using the command 'git commit'. This sequence of changes to your repo are stored in a *hidden* directory, `.git`. Most of the time, you will never have reason to search that folder out. (But know that the config file that describes your repo is in that folder. There might come a time in the future where you want to alter some of the default behaviour of the git program. You do that by opening the config file, which you can read with a text editor. Google 'show hidden files and folders' for your operating system when that time comes.)
2. Let's make a new file; we can do this by selecting the kind of file we want from the `new` dropdown menu. Select 'text file'. Jupyter will open its text editor and create a new file for you called `untitled.txt`. Click on the name to change it to `exercise1.md`. Type in the editor to add some information in it - perhaps a note about what you're trying to do- then hit save. **Do not hit logout**.
a. From the Jupyter home screen, hit `new` and select terminal. At the prompt type `$ git status`
b. Git will respond with a couple of pieces of information. It will tell you which `branch` you are on. It will list any untracked files present or new changes that are unstaged. We now will `stage` those changes to be added to our commit history by typing `$ git add -A`. (the bit that says `-A` adds any new, modified, or deleted files to your commit when you make it. There are [other options or flags](https://stackoverflow.com/questions/572549/difference-between-git-add-a-and-git-add#572660) where you add *only* the new and modified files, *or* only the modified and deleted files.)
c. Let's check our git status again: type `$ git status`
d. You should see something like this:
```bash
On branch master
Initial commit
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: exercise1.md```
```
e. Let's take a snapshot: type `$ git commit -m "My first commit"`. Nothing much seems to have happened; a new `$` is displayed. In this kind of environment, no news is good news. It's only often when something goes wrong that you'll see the terminal print out information. For some users (especially if you are approaching these exercises via DHBox rather than our Binder environment) there could be an error at this point. Git keeps track not only of the changes, but *who* is making them. Git might ask you for your name and email. Helpfully, the Git error message tells you exactly what to do: type `$ git config --global user.email "you\@example.com"` and then type `$ git config --global user.name "Your Name"`. Now try making your first commit.
Congratulations, you are now able to track your changes, and keep your materials under version control!
3. Go ahead and make some more changes to your repository. Add some new files. Commit your changes after each new file is created. Now we're going to view the history of your commits. Type `$ git log`. What do you notice about this list of changes? Look at the time stamps. You'll see that the entries are listed in reverse chronological order. Each entry has its own 'hash' or unique ID, the person who made the commit and time are listed, as well as the commit message eg:
```bash
commit 253506bc23070753c123accbe7c495af0e8b5a43
Author: Shawn Graham <[email protected]>
Date: Tue Feb 14 18:42:31 2017 +0000
Fixed the headings that were broken in the about section of readme.md
```
a. We're going to go back in time and create a new branch. You can escape the `git log` by typing `q`. Here's how the command will look: `$ git checkout -b branchname <commit>` where `branch` is the name you want the branch to be called, and `<commit>` is that unique ID. Make a new branch from your second last commit (don't use < or >).
b. We typed `git checkout -b experiment 253506bc23070753c123accbe7c495af0e8b5a43` (**Don't** copy _our_ command, because our version includes a reference to a commit that doesn't exist for you! Select a commit reference from your own commit history, which you can inspect with `git log`). The response: `Switched to a new branch 'experiment'` Check git status and then list the contents of your repository. What do you see? You should notice that some of the files you had created before seem to have disappeared - congratulations, you've time travelled! Those files are not missing; but they *are* on a different branch (the master branch) and you can't harm them now. Add a number of new files, making commits after each one. Check your git status, and check your git log as you go to make sure you're getting everything. Make sure there are no unstaged changes - everything's been committed.
4. Now let's assume that your `experiment` branch was successful - everything you did there you were happy with and you want to integrate all of those changes back into your `master` branch. We're going to merge things. To merge, we have to go back to the master branch: `$ git checkout master`. Good practice is to keep separate branches for all major experiments or directions you go. In case you lose track of the names of the branches you've created, this command: `git branch -va` will list them for you.
a. Now, we merge with `$ git merge experiment`. Remember, a merge is a special kind of commit that rolls all previous commits from both branches into one - Git will open your text editor and prompt you to add a message (it will have a default message already there if you want it). Save and exit and ta da! Your changes have been merged together.
5. One of the most powerful aspects of using Git is the possibility of using it to manage collaborations. To do this, we have to make a copy of your repository available to others as a `remote`. There are a variety of places on the web where this can be done; one of the most popular at the moment is [Github](http://github.com). Github allows a user to have an unlimited number of `public` repositories. Public repositories can be viewed and copied by anyone. `Private` repositories require a paid account, and access is controlled. If you are working on sensitive materials that can only be shared amongst the collaborators on a project, you should invest in an upgraded account (note that you can also control which files get included in commit; see [this help file](https://help.github.com/articles/ignoring-files/). In essence, you simply list the file names you do not want committed; here's an [example](https://gist.github.com/octocat/9257657)). Let's assume that your materials are not sensitive.
a. Now we push your work in the repository onto the version that lives at Github.com:
```bash
git push -u origin master
```
*NB* If you wanted to push a `branch` to your repository on the web instead, do you see how you would do that? If your branch was called `experiment`, the command would look like this:
```bash
$ git push origin experiment
```
b. Because your repository is on the web at Github, it is possible that you might make changes to the repository directly there, or from your local machine. To get the updates to integrate into where you are currently working, you could type
```bash
$ git pull origin master
```
and then begin working.
### Warnings
This only scratches the surface of what Git and Github can do. Remember that git is the program that keeps snapshots of your work and enables version control. Github is a place that lets you share that sequence of snapshots and files so that others can contribute changes. More information about Git and some exercises conceived for the DHBox/Linux/Mac are available [here](http://workbook.craftingdigitalhistory.ca/module-1/Exercises/#exercise-4-a-detailed-look-at-using-git-on-the-command-line). Although it is possible to make changes to files directly via the edit button on Github, you should be careful if you do this, because things rapidly can become out of sync, resulting in conflicts between differing versions of the same file. _Get in the habit of making your changes on your own machine, and making sure things are committed and up-to-date (`git status`, `git pull origin master`, `git fetch upstream` are your friends) before beginning work_. At this point, you might want to investigate some of the graphical interfaces for Git (such as [Github Desktop](https://desktop.github.com/)). Knowing as you do how things work from the command line, the idiosyncrasies of the graphical interfaces will make more sense. For further practice on the ins-and-outs of Git and Github Desktop, we recommend trying [the Git-it app](https://github.com/jlord/git-it-electron/) by Jessica Lord.
For help in resolving merge conflicts, see the [Github help documentation](https://help.github.com/articles/resolving-a-merge-conflict-using-the-command-line/). For a quick reminder of how the workflow should go, see [this cheat-sheet by Chase Pettit](https://gist.github.com/shawngraham/513d4b2860d52fdac6bd783e4387957e).