From 0b6232a8248db20fb54274f04f7080a9ee083e7b Mon Sep 17 00:00:00 2001 From: Seth Erickson Date: Mon, 25 Nov 2024 11:06:25 -0800 Subject: [PATCH 1/3] use 'code' as uncountable, not countable noun --- coding.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/coding.qmd b/coding.qmd index e21617e..7458bc1 100644 --- a/coding.qmd +++ b/coding.qmd @@ -2,7 +2,7 @@ title: "Code as a scientific product" --- -Codes are part of the scientific products your Lab is producing. It is thus important to manage, document, and preserve them as you would do for any other of your scientific products. In this session, we will discuss tools and practices that can ease the management of your codes, as well as develop them in a collaborative way. +Code is very likely one of the scientific products your Lab is producing. It is thus important to manage, document, and preserve source code as you would any other scientific products. In this session, we will discuss tools and practices that can ease the management of your code and facilitate collaborative software development. 
## Code repository From c2a510f7e69396dc341d92913e23543db185aa33 Mon Sep 17 00:00:00 2001 From: Seth Erickson Date: Mon, 25 Nov 2024 22:24:53 +0000 Subject: [PATCH 2/3] page edits: Version Control with Git and GitHub --- _quarto.yml | 2 +- github_intro.qmd | 72 +++++++++++++++++++----------------------------- 2 files changed, 29 insertions(+), 45 deletions(-) diff --git a/_quarto.yml b/_quarto.yml index 85924e1..3bf2d44 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -25,7 +25,7 @@ website: - section: "Coding together" contents: - href: github_intro.qmd - text: "Our tools: git & GitHub" + text: "Version Control with Git & GitHub" - href: 01-handson_github_website.qmd text: "Hands-on: Using Github's website" - href: git_rstudio.qmd diff --git a/github_intro.qmd b/github_intro.qmd index d8a3122..ac2db1b 100644 --- a/github_intro.qmd +++ b/github_intro.qmd @@ -1,50 +1,39 @@ --- -title: "git and GitHub" +title: "Version Control with Git and GitHub" --- -# Version Control with `git` and `GitHub` - -Aka -- **Say goodbye to `script_JB_03v5b.R` !!** - - -## The problem with `save_as` - -```{r phd_comics_final, out.width='80%', fig.align="center",echo=FALSE} -knitr::include_graphics("img/phd_comics_final.png") -``` +## The problem with "Save As" Every file in the scientific process changes. Manuscripts are edited. Figures get revised. Code gets fixed when problems are discovered. Data files get combined together, then errors are fixed, and then they are split and combined again. In the course of a single analysis, one can expect thousands of changes to files. And yet, all we use to track this are simplistic *filenames*. You might think there is a better way, and you'd be right: __version control__. -Version control systems help you track all of the changes to your files, without the spaghetti mess that ensues from simple file renaming. In other words, version control is a system that helps you to manage the different versions of your files in an organized manner. 
It will help you to never have to duplicate files using `save as` as a way to keep different versions of a file (see below). Version control helps you to develop a timeline of snapshots containing the different versions of a file. At any point in time, you will be able to roll back to a specific version. Bonus: you can add a short description (commit message) to remember what each specific version is about. - -**What is the difference between `git` and `GitHub`?** - -- __git__: is a version control software used to track files in a folder (a repository) - - git creates a timeline or history of your files -- __GitHub__: is a code repository in the cloud that enables users to store their git repositories and share them with others. Github also adds many features to manage projects and document your work. +![](img/phd_comics_final.png){width=60% fig-align="center" fig-alt="A comic about managing revisions using filenames"} +Version control systems help you track all of the changes to your files, without the spaghetti mess that ensues from simple file renaming. In other words, version control is a system that helps you to manage the different versions of your files in an organized manner. It will help you to never have to duplicate files using "save as" as a way to keep different versions of a file (see below). Version control helps you to develop a timeline of snapshots containing the different versions of a file. At any point in time, you will be able to roll back to a specific version. Bonus: you can add a short description (commit message) to remember what each specific version is about. -## git +**What is the difference between Git and GitHub?** - +- __Git__: is a version control tool used to track versions of files in a folder or "repository" (to use git's terminology). + - It is open source software -- [the Git project uses Git](https://git.kernel.org/pub/scm/git/git.git)! + - It works best for tracking revisions to plain-text files. 
+- __GitHub__: is a web-based platform for storing git repositories and developing software collaboratively. + - It adds many features to manage projects and document your work. + - It is a commercial product from Microsoft. -This section focuses on the code versioning system called `Git`. Note that there are others, such as `Mercurial` or `svn` for example. +## Git -Git is a *free* and *open source* distributed *version control system*. It has many functionalities and was originally geared towards software development and production environment. Git was initially designed and developed in 2005 by Linux kernel developers (including Linus Torvalds) to track the development of the Linux kernel. Here is a [fun video](https://www.youtube.com/watch?v=4XpnKHJAok8) of Linus Torvalds touting Git to Google. +This section focuses on the code versioning tool, Git. There are other tools for source code management, such as Mercurial and Subversion, but Git is the most widely used. -**How does it work?** - -Git can be enabled on a specific folder/directory on your file system to version files within that directory (including sub-directories). In git (and other version control systems) terms, this “tracked folder” is called a **repository** (which formally is a specific data structure storing versioning information). +Git is a *free* and *open source* distributed *version control system*. It has many functionalities and was originally geared towards software development and production environments. Git was initially designed and developed to track the development of the Linux kernel. **What git is not:** -- **Git is not a backup per se** -- Git is not good at versioning large files (there are workarounds) => not meant for large data +- Git is **not a backup** per se. +- Git is not good at versioning large files. It works best with plain-text files, not large data sets. 
::: {.callout-note collapse=true} ### Fun fact -Git was initially designed and developed by Linux kernel developers (including Linus Torvalds) to track the development of the Linux kernel in 2005. Here is a [fun video](https://www.youtube.com/watch?v=4XpnKHJAok8) of Linus Torvalds touting Git to Google engineers. +Git was initially designed and developed in 2005 to track the development of the Linux kernel. Here is a [fun video](https://www.youtube.com/watch?v=4XpnKHJAok8) of Linus Torvalds touting Git to Google engineers. ::: ### Repository @@ -53,26 +42,22 @@ Git can be enabled on a specific folder/directory on your file system to version Although there are many ways to start a new repository, [GitHub](https://github.com/) (or any other cloud solution, such as [GitLab](https://about.gitlab.com/)) provides among the most convenient way of starting a repository. - - - ## GitHub - -**GitHub is a company that hosts git repositories online** and provides several collaboration features (among which `forking`). GitHub fosters a great user community and has built a nice web interface to git, also adding great visualization/rendering capacities of your data. +**GitHub is a company that hosts git repositories online** and provides collaboration features like forking and pull requests. GitHub has a large user community and has built a nice web interface to git, also adding great visualization/rendering capacities of your data. ### GitHub Dashboard This is the default landing page when you log into your account. It provides a mix of the most recent resources and activities of your and your collaborators' actions, as well as some resources relevant to your work. The dashboard therefore changes on a regular basis. 
Once logged in, you can access your dashboard at -![](img/github_overview-01_dashboard.png) +![](img/github_overview-01_dashboard.png){width=80% fig-align="center" fig-alt="Screenshot of GitHub's Dashboard" .lightbox} ### GtiHub User page -This page can be reached using the following URL: https://github.com/`username`. For my user (`brunj7`) it would be: . It is a great space for you to provide some information about yourself and the main repositories you are working on. It also lists the GitHub Organizations you are part of. But more importantly, Users own repositories to host and share their code. You can list repositories from a User by clicking on the _repositories_ tab in the main GitHub menu bar at the top. +This page can be reached using the following URL: https://github.com/`username`. For my username (`brunj7`) it would be: . It is a great space for you to provide some information about yourself and the main repositories you are working on. It also lists the GitHub Organizations you are part of. But more importantly, Users own repositories to host and share their code. You can list repositories from a User by clicking on the _repositories_ tab in the main GitHub menu bar at the top. -![](img/github_overview-02_user.png) +![](img/github_overview-02_user.png){fig-align="center" width=80% fig-alt="Screenshot of GitHub's user page" .lightbox} ### GitHub Organization page @@ -80,7 +65,7 @@ This page can be reached using the following URL: https://github.com/`username`. We will talk more about GitHub Organizations later. In a nutshell, organizations are like groups or teams that users can be members of. Like Users, Organizations can have a landing page and own repositories. However, they add several perks in terms of user management. Similarly to Users, you can access repositories from an Organization by clicking on the _repositories_ tab in the main GitHub menu bar at the top. 
You can access an organization's page similarly to a user: https://github.com/`organization-name`; e.g. -![](img/github_overview-03_organization.png){fig-align="center" width=90%} +![](img/github_overview-03_organization.png){fig-align="center" width=80% fig-alt="Screenshot of GitHub's Organization page" .lightbox} ### Let's look at a repository on GitHub @@ -93,27 +78,26 @@ The screenshot below shows the landing page of a repository on GitHub. We would - In the middle, the last commit message on this file (or file contained in a folder) (purple) - On the right, the time stamps of the latest commit (green) -![](img/github-repo.png){fig-align="center" width=80%} +![](img/github-repo.png){fig-align="center" width=80% fig-alt="Screenshot of a GitHub repository landing page" .lightbox} -Below the file listing, there will be a rendering of the README.md file, one more reason to make sure to add one :) +Below the file listing, there will be a rendering of the README.md file, one more reason to make sure to add one 🙂. Looking into more details at the information provided about the last commit, we can see that we know: -- Which user did this last commit (brunj7) and the associated commit message +- Which user made the last commit (brunj7) and the associated commit message - The the 7 first digit of unique identifier ([SHA](https://en.wikipedia.org/wiki/Secure_Hash_Algorithms)) of this commit - When this last commit was made (3 months ago) = The total number of commits on this branch (43 Commits) -![](img/github-repo_last-comit.png){fig-align="center" width=80%} +![](img/github-repo_last-comit.png){fig-align="center" width=80% fig-alt="Screenshot of the last commit message for a repo on GitHub" .lightbox} This total number of commits is a hyperlink that lets you access the full history of the main branch by clicking on it. 
-![](img/github-repo_history.png){fig-align="center" width=80%} +![](img/github-repo_history.png){fig-align="center" width=80% fig-alt="Screenshot of a repo's commit history on GitHub" .lightbox} We can keep drilling and look at a specific commit by clicking on the hash number listed on the right. For example, we can look at the first commit at the top (d2b75a5) and display the exact changes that have been made since the previous (also named `parent`) commit `20bc390`: - -![](img/github-repo_commit-view.png){fig-align="center" width=80%} +![](img/github-repo_commit-view.png){fig-align="center" width=80% fig-alt="Screenshot of a commit's details on GitHub" .lightbox} Tracking these changes, and seeing how they relate to scripts and files is exactly what Git and GitHub are good for. We will show how they can be effective for tracking versions of scientific code, figures, and other text files such as manuscripts to develop a reproducible workflow. From ca3ea99a6bdc86bdd8c3bd63143989554fcf8312 Mon Sep 17 00:00:00 2001 From: Seth Erickson Date: Mon, 25 Nov 2024 22:31:47 +0000 Subject: [PATCH 3/3] capitalize 'Git' --- coding.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/coding.qmd b/coding.qmd index 7458bc1..8d4e0e3 100644 --- a/coding.qmd +++ b/coding.qmd @@ -10,7 +10,7 @@ As data repositories, code repositories are a great way to preserve and share yo In this session, we will be focusing on using GitHub to collaborate and keep track of the development of our scripts. -- Quick intro to git & GitHub +- Quick intro to Git & GitHub - Why those are tools you want to use - GitHub Website interface walk-through - Using RStudio