diff --git a/.gitignore b/.gitignore new file mode 100644 index 00000000..e43b0f98 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +.DS_Store diff --git a/LICENSE b/LICENSE new file mode 100644 index 00000000..f1b34f02 --- /dev/null +++ b/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2022 Rachael + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/README.md b/README.md new file mode 100644 index 00000000..15324e8b --- /dev/null +++ b/README.md @@ -0,0 +1,110 @@ +# DSI Course for Bash, Git and GitHub + +## Contents: +1. [Description](https://github.com/rachaellam/dsi-workshop#description) +2. [Learning Outcomes](https://github.com/rachaellam/dsi-workshop#learning-outcomes) +3. [Logistics](https://github.com/rachaellam/dsi-workshop#logistics) +4. [Marking Scheme](https://github.com/rachaellam/dsi-workshop#marking-scheme) +5. [Policies](https://github.com/rachaellam/dsi-workshop#policies) +6. [Folder Structure](https://github.com/rachaellam/dsi-workshop#folder-structure) +7. 
[Acknowledgements and Contributions](https://github.com/rachaellam/dsi-workshop#acknowledgements-and-contributions) + +## Description: +The course was created by the University of Toronto's Data Science Institute. The beginning of the course will introduce the basic language of Unix shell including how to navigate and manipulate files and directories. Learners will then learn certain commands, how to create scripts and write basic functions using pipes, filters and loops. + +The next portion of the course will be dedicated to getting started with version control and GitHub, and how it connects to the ethical discussions of reproducibility. Learners will learn how to set up Git and initialize and utilize repositories, including recording, viewing and undoing changes. They will also learn how to create branches and collaborate with others with shared branches. This course will put it all together and introduce some more advanced commands such as de-bugging and history editing. + +Finally, learners will determine how to problem-solve by identifying where the issue is and how to search with Google and Stack Overflow. This will then lead to the topic of reproducibility and how to contribute by commenting code and writing documentation. + +This course is designed for those who have a degree in something other than Computer Science/Statistics who are looking to enhance their data science skills for their career. + +## Learning Outcomes +Students will know how to... +1. Access the terminal and write scripts using basic commands, variables, pipes, filters and loops. This will be assessed in Assignment 1. +2. Use version control to preserve personal work, access and edit pervious code versions, collaborate with others, and find and debug errors. This will be assessed in Assignment 2. +3. Solve problems independently by identifying issues, researching, or properly formulating questions using components of reproducibility. 
This will be assessed in both Assignment 1 and Assignment 2. +4. Synthesize all work within wider discussions of ethics and inequity. Students will actively scrutinize who is and isn't in our datasets and develop knowledge of past abuses of power to better engage their work with ethical considerations. This will be assessed in Assignment 2. + +## Logistics + +### Course Contacts +* Instructor: [**Name**] [Pronouns] [degree]. hyperlinked email + * Email etiquette + * Other comments +* TA: [**Name**] [pronouns] [degree]. hyperlinkedEmail + +### Delivery instructions +The workshop will be held over three weeks, three days a week. Two of the three days will be 2-hours long and the last day will be 3-hours. Being mindful of online fatigue, there will be one break during each class where students are encouraged to stretch, grab a drink and snacks, or ask any additional questions. + +### Technology Requirements +1. Camera is optional although highly encouraged. We understand that not everyone may have the space at home to have the camera on. + + +### Lesson Schedule +| Lesson | Topic | Assignments | Resources | +|--------|----------------------------------------------------------------------------------------------|------------------|------------| +| 1 | Unix Shell I
(introducing the Shell, introductory commands, files and directories) | [Assignment 1]() | [Slides]() | +| 2 | Unix Shell II
(input/output and pipes/filters) | [Assignment 1]() | [Slides]() | +| 3 | Unix Shell III
(shell scripts, shell functions, parameters, flow control) | [Assignment 1]() | [Slides]() | +| 4 | Version Control and GitHub I
(introducing version control and GitHub, basic Git commands) | [Assignment 2]() | [Slides]() | +| 5 | Version Control and GitHub II
(remote repositories; branching) | [Assignment 2]() | [Slides]() | +| 6 | Version Control and GitHub III
(collaborating, dealing with conflicts) | [Assignment 2]() | [Slides]() | +| 7 | Problem solve, reproducibility, ethics, inequity | [Assignment 1]()
[Assignment 2]() | [Slides]() | +| 8 | Professional Skills - Industry Case Study | [Assignment 2]() | [Slides]() | +| 9 | Data Science Foundations - Review and Practice | | [Slides]() | + +## Marking Scheme +| Assessment | Weight | Description | Due Date | +|------------------|--------|-------------|----------| +| [Assignment 1]() | | | | +| [Assignment 2]() | | | | +| | | | | + +## Policies +The course is a live-coding class. Students are expected to follow along with the coding, creating files and folders to navigate and manipulate. Students should be active participants while coding and are encouraged to ask questions throughout. Although slides will be available for students to reference, they should be referenced before or after class, as during class will be dedicated to coding with the instructor. + +**How to submit assignments, late policy, academic integrity.** + +## Folder Structure +Below are the folders contained in this repo with a description of what they contain and information on how to use them. + +### 1 *assignments*: +This folder contains the assignments for the workshop. Students are expected to complete them one week after the content has been delivered. + +### 2. *homework*: +This folder contains homework for students to practice Unix and Git/GitHub workshops. Please complete the Unix Shell homework in the first week, and the Git/GitHub homework in the second. + +There are pdf copies of the homework and markdown files, which can be edited. The homework can change based on the amount of content that was completed each day. + +Homework is just a suggestion but will help students throughout the workshop, as content is cumulative and will only get more difficult. Unfortunately, there is not enough time to review previous content each class so while this homework is **not** graded, it is highly recommended. + +### 3. *lessons*: +This folder contains the pdf and html version of the slides. 
Either the pdf slides or the html slides can be used when teaching. If slides are edited to contain any gifs, the instructor will need to use the html slides so that the gifs are active. + +pdf slides should be referenced before class to prepare or after class to review. During class will be live-coding, therefore, there is no need to follow them during class. They contain all information that was discussed in class and are a great resource in the future if students need to reassess their knowledge. + +### 4. *post-course*: +This folder contains the exit surveys for students to complete. It holds both the md and docx versions of the survey. + +### 5. *slides-resources*: +This folder contains all editable slides. To edit, download the entire folder, including the *pics* folder as this folder contains the pictures which are relationally referenced in the markdown files. + +To change a photo, edit the markdown where photos are referenced. + +Example: + +Change `![w:1150 center](pics/github.png)` to `![bg](pics/github.png)` + +To add a photo, add photo to the *pics* folder and reference it within the markdown file. + +Example: + +Added photo labelled "git_commit.png" will be referenced in markdown file as `![w:1000 left](pics/git_commit.png)` + +## Acknowledgements and Contributions +## Achnowledgements +* Who helped make theses slides +* We wish to acknowledge this land on which the University of Toronto operates. For thousands of years it has been the traditional land of the Huron-Wendat, the Seneca, and most recently, the Mississaugas of the Credit River. Today, this meeting place is still the home to many Indigenous people from across Turtle Island and we are grateful to have the opportunity to work on this land. +### Contributions +* `bash-git-github` welcomes issues, enhancement requests, and other contributions. To submit an issue, use the [GitHub +issues](https://github.com/anjalisilva/bash-git-github/issues). 
diff --git a/assignments/00-DSI-Pre-Workshop-Assignment.pdf b/assignments/00-DSI-Pre-Workshop-Assignment.pdf new file mode 100644 index 00000000..75d654fb Binary files /dev/null and b/assignments/00-DSI-Pre-Workshop-Assignment.pdf differ diff --git a/assignments/01-Unix-Assignment.pdf b/assignments/01-Unix-Assignment.pdf new file mode 100644 index 00000000..fbf02524 Binary files /dev/null and b/assignments/01-Unix-Assignment.pdf differ diff --git a/assignments/02-Git-Quiz.pdf b/assignments/02-Git-Quiz.pdf new file mode 100644 index 00000000..d49f5981 Binary files /dev/null and b/assignments/02-Git-Quiz.pdf differ diff --git a/assignments/03-Ethics-Inequality-Assignment.pdf b/assignments/03-Ethics-Inequality-Assignment.pdf new file mode 100644 index 00000000..2150ca81 Binary files /dev/null and b/assignments/03-Ethics-Inequality-Assignment.pdf differ diff --git a/homework/git-homework.md b/homework/git-homework.md new file mode 100644 index 00000000..9c93bff1 --- /dev/null +++ b/homework/git-homework.md @@ -0,0 +1,99 @@ +--- +marp: true +theme: uncover +_class: invert +paginate: true + +--- + + + +# **Git/GitHub Homework** +```console +$ echo "Data Sciences Institute" +$ echo "by: Rachael Lam" +``` + +--- +##### **Expectations** +The goal of this homework is not to grade the competancy of what was learned, but to give students an opportunity to practice. This will help students remember the content and prepare for the next class. + +Because each class builds upon the last, it's important to review the content, as time is too limited for a full in-class review. + +--- + + +## `Day 1` + +--- +##### **To Review:** +Please practice the following commands. You can either work with a new repo, or one that you are currently working on. +1. initialize a new repo `git init` + +2. clone a new or existing repo `git clone` +3. see status of files in repo `git status` +4. add files to be staged `git add` +5. see changes between files `git diff` +6. remove files `git rm` +7. 
move files `git mv` + +--- +- Remember to practice with different options. +- You can make test files and folders if you don't want to delete anything. +- Continuously practicing will drill these commands into your memory so it will become easier with time. + +--- + + +## `Day 2` + +--- +##### **To Review:** +Please practice the following commands. Attempt to move around your commits (forwards and backwards). +1. see history of commits `git log` + +2. changing a commit `--amend` +3. unstage a file `git reset` +4. revert to original file and to previous commit `git checkout` + +--- +5. add remote repo `git remote` + +6. fetch new changes from remote repo `git fetch` / `git pull` +7. push changes to remote repo `git push` +8. create new branch `git branch` +9. change working branch `git checkout` +10. merge branches `git merge` + +Also practicing pushing and pulling to remote branches. + +--- + + +## `Day 3` + +--- +##### **To Review** +There are several GitHub Skills courses that will help you practice collaborative processes. Please review the following: + +1. [Pull Requests](https://github.com/skills/review-pull-requests) + +2. [Merge Conflicts](https://github.com/skills/resolve-merge-conflicts) + +3. [Markdown](https://github.com/skills/communicate-using-markdown) diff --git a/homework/git-homework.pdf b/homework/git-homework.pdf new file mode 100644 index 00000000..2ebe3cd4 Binary files /dev/null and b/homework/git-homework.pdf differ diff --git a/homework/unix-homework.md b/homework/unix-homework.md new file mode 100644 index 00000000..b2818426 --- /dev/null +++ b/homework/unix-homework.md @@ -0,0 +1,99 @@ +--- +marp: true +theme: uncover +_class: invert +paginate: true + +--- + + + +# **Unix Shell Homework** +```console +$ echo "Data Sciences Institute" +$ echo "by: Rachael Lam" +``` + +--- +##### **Expectations** +The goal of this homework is not to grade the competancy of what was learned, but to give students an opportunity to practice. 
This will help students remember the content and prepare for the next class. + +Because each class builds upon the last, it's important to review the content, as time is too limited for a full in-class review. + +--- + + +## `Day 1` + +--- +##### **To Review:** +Please practice the following commands. +1. current directory `pwd` +2. set working directory `cd` +3. list contents of working directory `ls` +4. create directory `mkdir` +5. create file `touch` +6. copy `cp` + +--- + + +## `Day 2` + +--- +##### **To Review:** + +Continue to practice commands that we learned last week in addition to the new ones. You can upload different datasets relevant to your work to play with. +1. move and rename `mv` +2. remove `rm` +3. concatenate `cat` +4. extract columns from output `cut` +5. sort lines of text `sort` +6. report or omit repeated lines `uniq` + +--- +7. print lines matching a pattern `grep` +8. search directories and subdirectories for files `find` +9. output the first part of a file `head` +10. output the last part of a file `tail` +
+ +Try each command with different... +- `-options` +- Wildcards +- Expansions +- Quoting and backslashing. + +--- +Create any script using functions. Try to include: +- Global variables +- Local variables +- Positional parameters + +--- + + +## `Day 3` + +--- +##### **To Review:** +It's time to put everything we've learned together! + +Please complete assignment 1 by October 30th. Submit assignment by email to your TA by 11:59 PM. + diff --git a/homework/unix-homework.pdf b/homework/unix-homework.pdf new file mode 100644 index 00000000..5db55e2b Binary files /dev/null and b/homework/unix-homework.pdf differ diff --git a/lessons/week_1/unix_slides.html b/lessons/week_1/unix_slides.html new file mode 100644 index 00000000..0834abd7 --- /dev/null +++ b/lessons/week_1/unix_slides.html @@ -0,0 +1,2007 @@ +
+

Unix Shell

+
$ echo "Data Sciences Institute"
+$ echo "by: Rachael Lam"
+
+
+
+

Unix

+
+
+
What is Unix?
+

Unix was created around 1970 and has since branched into many versions and inspired Unix-like systems such as Linux. Linux was written to work like Unix, so the two have very similar features, although there are some minor differences in commands.

+

Unix shells, and bash in particular, are powerful tools for quickly and easily navigating and manipulating files, scaling automated tasks, accessing Git and processing data.

+
+
+
So what is the shell?
+

The shell is any user interface/program that takes an input from the user, translates it into instructions that the operating system can understand, and conveys the output back to the user.

+

There are various types of user interfaces:

+
    +
  • graphical user interfaces (GUI)
  • +
  • touch screen interfaces
  • +
  • command line interfaces (CLI)
  • +
+
+
+
And what is bash?
+

We'll be focusing on command line interfaces (CLI), more specifically bash, which stands for Bourne Again SHell.

+

We'll also need a terminal emulator to interact with the shell. This is most likely called terminal on our menu.

+
+
+
Let's get started!
+

First, we'll open our terminal. As mentioned earlier, this is most likely called terminal and can be found by searching our computer, which on a Mac is done with cmd + space.

+

Let's take a look at the terminal. What do we notice?

+
    +
  • last login
  • +
  • name
  • +
  • location
  • +
  • shell
  • +
+
+
+
Looking at the Shell
+

If we type echo $SHELL in our terminal, the output will tell us what shell we are working with. Most often, our shell will already be bash, but on newer Macs it could be zsh, which is almost identical to bash. We can also see where bash is located by typing:

+
    +
  • whereis bash
  • +
  • whence bash
  • +
  • which bash
  • +
+
+
+

Let's start with a few commands and see what happens in our terminal.

+
$ echo Rachael
+
+
$ date
+
+
$ cal
+
+
$ lksjfs
+
+
+
+
    +
  • What happens when we type something that does not exist?
  • +
  • What happens with errors?
  • +
+
+
+

Navigate Files / Directories

+
+
+

Files

+
+
+

Knowing the different types of files available helps us better understand how to navigate and manipulate them.

+
    +
  • +

    Regular files are text files with readable characters.

    +
  • +
  • +

    Executable files are programs that are invoked as commands.

    +
  • +
  • +

    Shell scripts are executable files that we can read whereas bash is a non-human-readable executable file.

    +
  • +
+
+
+

Directories

+
+
+

Directories are files that are like folders which contain other files and directories (subdirectories), creating a hierarchical structure.

+
    +
  • +

    We can think of the structure of directories as a tree with the top of the tree being the root.

    +
  • +
  • +

    All files can be named and found in relation to the root by listing the directory names in order from the root, separated by slashes, followed by the file's name.

    +
  • +
+
+
+

Let's try three commands that help us navigate our system:

+
    +
  1. First, let's run the code below and see what happens:
  2. +
+
$ pwd
+
+

pwd prints our working directory. If we ever need to know where we are, we can execute this command.

+
+
+
    +
  1. Now, let's run the code below and see again what happens:
  2. +
+
$ cd
+
+

By default, cd changes our working directory to our home directory. We can also use cd to set our working directory by including the desired pathname:

+
$ cd Desktop
+
+
+
+

In the previous example, we were able to just state Desktop because it is a directory in our working directory. If we changed our working directory to Desktop, and then wanted to change it again to a directory in Desktop, we could again just specify the folder.

+

If we wanted to change the working directory to a directory outside of our working directory, we would need to specify a pathname:

+
$ cd /Users/rachaellam/Desktop
+
+
+
+
    +
  1. To know what files and folders exist in our working directory, we can use the code below:
  2. +
+
$ ls
+
+

We can add a pathname at the end to list the contents of a specified directory.

+
+
+

Paths

+
+
+

As we've seen, directory names separated by slashes are paths. There are two types of paths, absolute and relative.

+
    +
  • +

    An absolute pathname begins at the root directory and includes each directory, separated by slashes until the desired directory or file is reached.

    +
  • +
  • +

    A relative pathname starts from the working directory and uses symbols . or .. to represent relative positions in the file tree.

    +
  • +
+
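The difference between the two kinds of pathname can be sketched in a throwaway directory. This is a minimal sketch: the sandbox and the project/data folders are made up for illustration and are not part of the course files.

```shell
# Hypothetical sandbox so nothing on the real system is touched.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/data"

cd "$sandbox/project/data"   # absolute pathname: spelled out from the root /
here=$(pwd)

cd ..                        # relative pathname: .. is the parent directory
parent=$(pwd)

cd ./data                    # relative pathname: . is the working directory
back=$(pwd)

echo "$here"
echo "$parent"
echo "$back"
```

The first and last lines printed should be the same directory, reached once by an absolute pathname and once by a relative one.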
+
+

Using cd and pwd let's take a look at how we can use absolute and relative pathnames.

+
$ cd
+$ pwd
+
+
$ cd Desktop
+$ pwd
+
+
$ cd ..
+$ pwd
+
+
+
+

Here's another example using the /usr pathname.

+
$ cd /usr/bin
+$ pwd
+
+
$ cd /usr
+$ pwd
+
+
$ cd ..
+$ pwd
+
+
+
+

Let's now try moving through some directories to get comfortable. Try out lots of different paths depending on the file structure of your computer. Try getting into different directories from different parent directories. The tilde notation ~ in the examples below refers to our home directory.

+
$ cd ~/Desktop
+$ pwd
+
+
$ cd ~/Desktop/dir1
+$ pwd
+
+
+
+

Questions?

+
+
+

Options and Arguments

+
+
+

Options and arguments let us control what a command does: options change how the command behaves, and arguments tell it what to act on. The syntax is:

+
$ command -option argument
+
+

Options can also be combined, which we'll briefly see now but learn more about a bit later.

+
+
+

There are two ways to write an -option:

+
    +
  1. Short option: one dash followed by a single character
  2. +
  3. Long option: two dashes followed by a word
  4. +
+

Some examples:

+

-a or --all
+-d or --directory
+-r or --reverse

+
+
+

Let's try these lines of code and see what happens:

+
$ ls -l
+
+
$ ls -lt
+
+
$ ls -lt --reverse
+
+

-l long format
+-t sort by modification time
+--reverse reverse the sort order (short form: -r)
+Notice how -lt is actually a combination of multiple options.
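To see the effect of combining options, we can try them on two files whose modification times we control. This is only a sketch: the filenames and dates are invented, and it uses the short option -r because it is the form shown earlier alongside --reverse.

```shell
# Hypothetical sandbox with two files and known modification times.
sandbox=$(mktemp -d)
cd "$sandbox"
touch -t 202001010000 old.txt   # modified 1 January 2020
touch -t 202101010000 new.txt   # modified 1 January 2021

ls -l                           # long format

newest_first=$(ls -t)           # -t: sort by modification time, newest first
oldest_first=$(ls -tr)          # -t and -r combined: oldest first

echo "$newest_first"
echo "$oldest_first"
```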

+
+
+

Questions?

+
+
+

Wildcards

+
+
+

Wildcards give us the ability to rapidly specify groups of filenames based on patterns of characters. Let's look at a few examples below:

+

* → matches any sequence of characters, including none

+

? → matches any single character

+

[characters] → matches any character that is in the set

+

[!characters] → matches any character that is not in the set

+
+
+

Some other helpful character classes (used inside a set, as in [[:digit:]]) are:
+[:digit:] → matches any numeral
+[:lower:] → matches any lowercase letter
+[:upper:] → matches any uppercase letter

+
+
+

Let's try a few in our terminal:

+
$ ls * 
+
+
$ ls a*.txt
+
+
$ ls [abc]*
+
+
$ ls [[:upper:]]*
+
+
$ ls [![:digit:]]*
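If our own folders don't have filenames that show the differences, we can make a few test files first. The sketch below uses invented filenames and echo instead of ls, so that we only see which names each pattern expands to.

```shell
# Hypothetical sandbox with a handful of made-up filenames.
sandbox=$(mktemp -d)
cd "$sandbox"
touch apple.txt avocado.txt banana.csv Cherry.txt 1note.txt

starts_with_a=$(echo a*.txt)        # starts with a, ends in .txt
upper_first=$(echo [[:upper:]]*)    # starts with an uppercase letter
one_char=$(echo ?note.txt)          # exactly one character before note.txt

echo "$starts_with_a"
echo "$upper_first"
echo "$one_char"
echo [abc]*                         # starts with a, b or c
echo [![:digit:]]*                  # does not start with a numeral
```

The shell expands each pattern into matching filenames before the command runs, which is why echo is a handy way to preview a wildcard.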
+
+
+
+

Questions?

+
+
+

Working with

+

Files / Directories

+
+
+

We're going to learn some basic commands to begin some preliminary coding. We'll also be using these throughout the module, so it's important to understand how they work now:

+
    +
  • create directory mkdir
  • +
  • create file touch
  • +
  • copy cp
  • +
  • move and rename mv
  • +
  • remove rm
  • +
+
+
+

Commands

+
+
+
mkdir
+

First let's make a directory. It's important to remember what directory you're working in currently, because that's where the new directory will be made. Let's assume for now, we're working on our desktop.

+
$ mkdir directory
+
+

We can also create multiple directories at the same time:

+
$ mkdir dir1 dir2 dir3
+
+
+
+
touch
+

We can also make new files from the command line. This is particularly useful when we want to make scripts, which we'll learn a bit later. Using touch, we can make a new file in our working directory.

+
$ touch file1
+
+

We can also create a specific file type by adding the extension:

+
$ touch file1.sh
+
+
+
+
cp
+

Now we're going to copy a file that we have on our desktop. It can be any file, but remember to include the extension and, if the name contains spaces or special characters, to wrap it in quotes.

+
$ cp file1 file2
+
+

We can also copy files or directories into a directory.

+
$ cp file1 dir1
+
+
+
+

And all files from one directory into another using wildcards:

+
$ cp dir1/* dir2
+
+

What does the /* in this command mean?

+
+
+

There are some useful -options that accompany cp:

+ + + + + + + + + + + + + + + + + + + + + +
| Option | Description |
|--------|-------------|
| -i | Before overwriting an existing file, prompt the user for confirmation. If this option is not specified, cp will silently overwrite files. |
| -r | Recursively copy directories and their contents. This option is required when copying directories. |
| -v | Display informative messages as the copy is performed. |
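Here is a small sketch of -v and -r in a throwaway directory. The directory and file names are made up, and -i is left out because it waits for us to answer at the keyboard.

```shell
# Hypothetical sandbox: dir1 holds two files, dir2 starts empty.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir dir1 dir2
touch dir1/file1 dir1/file2

cp -v dir1/file1 dir2    # -v: report the copy as it happens
cp -r dir1 dir2          # -r: copy the directory dir1, with its contents, into dir2

ls dir2                  # file1 and a copy of dir1
ls dir2/dir1             # file1 and file2
```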
+
+
+
mv
+

The mv command enables us to move and rename files and directories, depending on how it's used. In the example below, mv renames file1 to file2.

+
$ mv file1 file2
+
+

Here, mv moves file1 to dir1

+
$ mv file1 dir1
+
+
+
+

We can also move directories into other directories:

+
$ mv dir1 dir2
+
+

In this case, if dir2 exists, dir1 will be moved into dir2. If dir2 does not exist, dir1 will simply be renamed to dir2. In both cases, the entire directory is moved or renamed, rather than just its contents.
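We can check what mv does with directories in a throwaway folder. This is a sketch with made-up names; as far as I know it reflects the usual behaviour of mv on Linux and macOS, where an existing destination directory receives the moved directory and a missing one becomes its new name.

```shell
# Hypothetical sandbox: dir1 and dir2 exist, dir3 does not.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir dir1 dir2
touch dir1/file1

mv dir1 dir2      # dir2 exists, so dir1 ends up inside it
ls dir2           # shows dir1

mv dir2 dir3      # dir3 does not exist, so dir2 is renamed to dir3
ls dir3           # shows dir1
```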

+
+
+

Let's say we're in the directory Desktop and we just moved file1 into dir1 but now we want to put it back in Desktop. How would we move a file out of a directory into another one? Unfortunately we can't just say

+
$ mv file1 Desktop
+
+

because file1 is no longer in our working directory, so mv cannot find it. Even if it could, Desktop would be read as a name inside our working directory rather than as the Desktop folder itself.

+
+
+

The answer involves using pathnames and the tilde ~ notation:

+
$ mv dir1/file1 ~/Desktop
+
+

If we just wanted to move file1 into dir2 (if dir2 is in our working directory), we could type:

+
$ mv dir1/file1 dir2
+
+
+
+

What if we want to move just the contents of dir1 to another directory rather than the whole folder? HINT: it is very (exactly) similar to copying (cp).

+
+
+
$ mv dir1/* dir2
+
+

This is a combination of the directory dir1, the pathname separator / and the wildcard *. Here, dir1/* takes all the contents of dir1 and puts them in dir2.

+

We could also use the same technique to specify certain files to move rather than all of them. How do you think this would be done?

+
+
+

Questions

+
    +
  • We're starting to combine our knowledge of files, directories and pathnames with some basic commands. How do we feel up to this point?
  • +
+
+
+
rm
+

To remove files we use the command rm. Because we're now deleting files, it's important to be sure of what we're deleting because there is no way to undo it. Fortunately, there are ways to protect ourselves from mistakes.

+
$ rm file1
+
+

Without specifying any -options, file1 will be deleted without any feedback.

+
+
+

To ensure we want to delete something, we can use the option -i (interactive) that we learned earlier.

+
$ rm -i file1
+
+

This will prompt a question asking us if we want to delete file1. We can respond with y if yes and n if not.

+
+
+

If we want to delete a directory, we need to use the option -r as we did when copying (cp). This will recursively delete everything inside of the directory and the directory itself.

+
$ rm -r dir1
+
+

If we're specifying multiple deletions and a file or directory does not exist, rm will tell us. If we don't want that message, we can add the option -f (force). When both -f and -i are given, the one that appears last on the command line takes effect.
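A short sketch of -r and -f in a throwaway directory, using made-up names. The -i option is left out here because it pauses and waits for a y or n answer.

```shell
# Hypothetical sandbox with a nested directory.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p dir1/sub
touch dir1/sub/file1 file2

rm -r dir1             # -r: delete dir1 and everything inside it

rm -f missing.txt      # -f: missing.txt does not exist, but no message is shown
status=$?              # exit status of the rm -f above
echo "$status"

ls                     # only file2 is left
```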

+
+
+
    +
  1. +

    How do you delete multiple directories?

    +
  2. +
  3. +

    What happens if you delete multiple directories with -i?

    +
  4. +
  5. +

    What happens if you delete multiple directories with -i but one does not exist?

    +
  6. +
+
+
+

Remember, you cannot undo rm. This means that if you start using wildcards to specify filenames and don't include -i, you could delete things by accident. For example, let's say you want to delete all .txt files in a directory:

+
$ rm *.txt
+
+

If you accidentally add a space between * and .txt, the rm command will delete all the files in the directory and then try to find a file called .txt, which does not exist.
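One habit that can help, offered here as a suggestion rather than a rule from the course, is to preview a wildcard with echo before handing it to rm. The filenames below are made up.

```shell
# Hypothetical sandbox with two .txt files and one file we want to keep.
sandbox=$(mktemp -d)
cd "$sandbox"
touch a.txt b.txt keep.csv

preview=$(echo rm *.txt)   # echo prints the command instead of running it
echo "$preview"

rm *.txt                   # run it once the preview looks right
ls                         # keep.csv is still here
```

If the preview lists a file we did not expect, we can fix the pattern before anything is deleted.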

+
+
+

Questions?

+
+
+

Input / Output

+
+
+
Standard Input/Output
+

Every program starts with three standard streams: standard input, standard output and standard error.

+

By default, standard input comes from the keyboard. If we think of everything as a file, a command such as ls sends its results to a file called standard output and its status messages to a file called standard error. By default, both are linked to the screen and not saved to a disk file.

+
+
+
Input/Output Redirection
+

Input/Output redirection allows us to change where the input comes from and where the output goes to, such as storing the output of a command into a file. We can do this using the redirection operator >.

+
$ ls -l /usr/bin > ls-output.txt
+
+

Here we have redirected the output of ls -l /usr/bin to a .txt file called ls-output.txt.

+
+
+

We can now see the details of that file and if it worked:

+
$ ls -l ls-output.txt
+
+

By looking at the details, we can see that the file was created and is a fairly large text file, indicating that something was written to it.

+
+
+

If we specify a directory that does not exist, we receive the standard error:

+
$ ls -l /bin/usr > ls-output.txt
+
+

Why was the standard error not written to the .txt file?
+What happened to our ls-output.txt file?

+
+
+

The > operator only redirects standard output, so the error message still went to the screen. The destination file is also always rewritten from the beginning: the shell emptied ls-output.txt before running ls, and because ls produced an error and no output, the file was left empty.

+

So how do we append rather than rewrite? By using the redirection operator >>.

+
$ ls -l /usr/bin >> ls-output.txt
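The difference between > and >> is easy to see with a tiny file. This sketch uses echo and a made-up log.txt so the contents are short enough to read at a glance.

```shell
# Hypothetical sandbox; log.txt is an invented filename.
cd "$(mktemp -d)"

echo "first run"  > log.txt    # > creates log.txt
echo "second run" > log.txt    # > starts the file over: "first run" is gone
echo "third run" >> log.txt    # >> adds to the end instead

first_line=$(head -n 1 log.txt)
last_line=$(tail -n 1 log.txt)

cat log.txt
```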
+
+
+
+

If we want to redirect the standard error, we need to use the redirection operator 2>

+
$ ls -l /bin/usr 2> ls-error.txt
+
+

If we want to redirect both the standard output and standard error to one file, we have two options.

+
    +
  1. Use 2>&1 at the end of the command.
  2. +
+
$ ls -l /bin/usr > ls-output.txt 2>&1
+
+
    +
  1. Use &> in place of >
  2. +
+
$ ls -l /bin/usr &> ls-output.txt
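To check where each stream ends up, we can send them to separate files and look at which ones have content. This is a sketch: /no/such/dir is a made-up path chosen so that ls fails, and || true is only there so the lines can also be run as a script that keeps going after the error.

```shell
# Hypothetical sandbox; all filenames here are invented.
cd "$(mktemp -d)"

# Standard output and standard error go to different files.
ls /no/such/dir > out.txt 2> err.txt || true

# Both streams go to the same file.
ls /no/such/dir > both.txt 2>&1 || true

ls -l out.txt err.txt both.txt   # out.txt is empty, the other two are not
cat err.txt
```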
+
+
+
+

Questions?

+
+
+
cat
+

cat takes one or more files and copies them to standard output. Using the ls-output.txt created earlier, we can see how that's done:

+
$ cat ls-output.txt
+
+
+
+

We can also use it to join files together. Let's say we have two files, file1 and file2, and we want to combine them into a file called file3:

+
$ cat file1 file2 > file3
+
+

Now the contents of file1 and file2 should be combined.

+
+
+

We can also use cat to add to a .txt file.

+
$ cat > new_cat.txt
+
+

Now we can type the text that we want in the file. Once we're finished, we can use CTRL-D to exit.

+

What would be the difference between $ cat > new_cat.txt and $ cat >> new_cat.txt?

+
+
+

Finally, we can redirect standard input so that it comes from the file new_cat.txt instead of the keyboard:

+
$ cat < new_cat.txt
+
+

This is almost identical to just typing $ cat new_cat.txt but we'll see later how it could be more useful.

+
+
+

Questions?

+
+
+

Pipes / Filters

+
+
+

We use pipelines to read data from standard output and send it to standard input using the pipe operator |. This means the standard output of one command can be piped into the standard input of another.

+

Several commands put together in a pipeline are often referred to as filters. Filters take an input, change it and then output it.

+
+
+

Commands

+
+
+

Let's learn a few more commands that will help us further understand pipelines and filters. We'll learn:

+
    +
  • extract columns from output cut
  • +
  • sort lines of text sort
  • +
  • report or omit repeated lines uniq
  • +
  • print lines matching a pattern grep
  • +
  • search directories and subdirectories for files find
  • +
  • output the first part of a file head
  • +
  • output the last part of a file tail
  • +
+
+
+
cut
+

Let's look at a csv to get a first view of our data. Because it's a csv, the values on each line are separated by commas. Let's first read that file using cat:

+
$ cat parking_data.csv
+
+

We'll see a lot of text, so let's make some sense of it using cut.

+
+
+

To use cut, I need to pass a couple options:

+
    +
  1. -d which cuts the text based on what follows. For example, -d: will cut based on colons or -d" " will cut based on a space.
  2. +
  3. -f, which extracts a particular field based on what follows. For example, -f1 will take the first field or -f2 will take the second field and so on.
  4. +
+
+
+

In this example, I'm taking the file parking_data.csv, cutting it based on commas and then only extracting the first field.

+
$ cut -d, -f1 < parking_data.csv
+
+

What happens if I add another -f option? What does this do?

+
$ cut -d, -f1 -f2 < parking_data.csv
+
+

How would I specify more than three fields?

+
+
+
sort
+

How can we make our previous example more readable?

+

One answer is to use the sort command. We can pipe the output of cut into it:

+
$ cut -d, -f1 < parking_data.csv | sort
+
+
+
+
uniq
+

Additionally, I can make the above output even more readable by removing any duplicates with uniq:

+
$ cut -d, -f1 < parking_data.csv | sort | uniq
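If parking_data.csv is not at hand, the same cut, sort and uniq pipeline can be tried on a few inline rows. This is only a sketch: the sample rows and the variable names `sample` and `unique_first` are made up for illustration and are not taken from the parking data.

```shell
# Hypothetical sample rows standing in for parking_data.csv
sample='FIRE ROUTE,30
NO PARKING,50
FIRE ROUTE,30
EXPIRED METER,30'

# Keep the first comma-separated field, sort it, then drop adjacent duplicates
unique_first=$(printf '%s\n' "$sample" | cut -d, -f1 | sort | uniq)
echo "$unique_first"
```

uniq only removes duplicate lines that are next to each other, which is why sort comes before it in the pipeline.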
+
+
+
+

Questions?

+
+
+
grep
+

grep is a powerful tool for finding patterns in text files. The syntax is:

+
$ grep pattern [file...]
+
+

In our case, we're going to use it with our previous example and pipe it with other commands:

+
$ cut -d, -f1 parking_data.csv | sort | uniq | grep FIRE
+
+

The results are all the lines of the piped output that contain the pattern FIRE.

+
+
+
find
+

Another use for grep is finding files by name. grep combines nicely with find for this:

+
$ find ~/Desktop/dir1 | grep cat
+
+

Here we're searching in the directory dir1 with the pattern cat. This would be helpful if we wanted to know if there were any files with the word cat in the filename.
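The find and grep combination can be tried safely in a throwaway directory. The sketch below assumes two made-up file names (cat_notes.txt and dog_notes.txt) and also shows find's own -name test as an alternative to piping into grep.

```shell
# Throwaway directory with two hypothetical files, so no real files are touched
demo_dir=$(mktemp -d)
touch "$demo_dir/cat_notes.txt" "$demo_dir/dog_notes.txt"

# find lists every file; grep keeps only the paths containing "cat"
with_grep=$(cd "$demo_dir" && find . -type f | grep cat)

# find can also do the filtering itself with -name and a quoted wildcard
with_name=$(cd "$demo_dir" && find . -type f -name '*cat*')

echo "$with_grep"
echo "$with_name"
rm -r "$demo_dir"
```

One difference worth knowing: grep matches anywhere in the path, including directory names, while -name only looks at the file name itself.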

+
+
+
head / tail
+

We can also extract the first and last part of files using head and tail. We can also add the option -n followed by a number to extract a certain number of lines.

+
$ head -n 5 ls-output.txt
+
+
$ tail -n 5 ls-output.txt
+
+
+
+

head and tail can also be used in pipelines:

+
$ cut -d, -f1 < parking_data.csv | sort | uniq | head -n 5
+
+
$ cut -d, -f1 < parking_data.csv | sort | uniq | tail -n 5
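head and tail can also be chained to pick out a single line. A minimal sketch, using printf to generate five numbered lines instead of a real file (the variable name `third_line` is just for illustration):

```shell
# Print the numbers 1 to 5 one per line, keep the first three lines,
# then keep the last of those: the result is line 3
third_line=$(printf '%s\n' 1 2 3 4 5 | head -n 3 | tail -n 1)
echo "$third_line"
```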
+
+
+
+

Questions?

+
+
+

Expansions

+
+
+

Expansion uses special characters to expand upon something before the shell processes it. We have learned a few expansions so far such as the tilde ~ and wildcards *. We've also seen some character wildcards [characters].

+

Expansions are another feature that help us when we're manipulating and working with files and directories.

+

Other examples of expansions are:

+
    +
  • arithmetic expansion
  • +
  • brace expansion
  • +
+
+
+
Arithmetic Expansion
+

Arithmetic expansion basically makes the shell a calculator.
+The syntax is:

+

$((expression))

+

For example:

+
$ echo $((2 + 2))
+
+

Arithmetic expressions can be nested:

+
$ echo $(($((2 + 2)) * 3))
+
+
+
+

Just for reference, here is a list of the arithmetic operators:

Operator    Description
+           Addition
-           Subtraction
*           Multiplication
/           Integer division
**          Exponentiation
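A few of these operators in action, as a small sketch. The variable names are made up for illustration; the % (remainder) operator is not listed in the table above but is a standard part of Bash arithmetic.

```shell
# Parentheses group sub-expressions, so nesting $(( )) is optional
nested=$(( $((2 + 2)) * 3 ))
grouped=$(( (2 + 2) * 3 ))
echo "$nested $grouped"

# Division is integer division: the remainder is discarded
quotient=$(( 7 / 2 ))
remainder=$(( 7 % 2 ))
power=$(( 2 ** 10 ))
echo "$quotient $remainder $power"
```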
+
+
+
Brace Expansion
+

Brace expansions allow us to create multiple text strings from a pattern containing braces. Here are a few examples:

+
$ echo Test-{A,B,C}-Example
+
+
$ echo Number_{1..5}
+
+
$ echo {Z..A}
+
+

Brace expansions can also be nested:

+
$ echo a{A{1,2},B{3,4}}b
+
+
+
+

We can use brace expansion to help make multiple directories using mkdir.

+
$ mkdir dir-{1..3}
+
+

This command makes 3 directories named dir-1, dir-2 and dir-3

+
+
+

Quoting / Backslashing

+
+
+

Quoting suppresses unwanted expansions. We can use double quotes, single quotes or backslashes:

+
    +
  • Double quotes make special characters lose their meaning so that they are treated as ordinary characters, except for
    $ \ `
  • +
  • Single quotes suppress all expansion
  • +
  • Backslashes are used to escape single characters
  • +
+
+
+

Many times there will be file names or directories that are named with spaces. In this case, we'll need to use double quotes so that the shell can read it.

+

Using touch we can create a text file whose name is made of two words separated by a space:

+
$ touch "two words.txt"
+
+

We can then see the details of the file we just created:

+
$ ls -l "two words.txt"
+
+
+
+

If we want to rename the file, we would do as follows:

+
$ mv "two words.txt" two_words.txt
+
+
+
+

Let's see what these three examples do in shell:

+
$ echo '2 * 3 > 5 is an equation'
+
+
$ echo '2 * 3 > 5' is an equation
+
+
$ echo 2 \* 3 \> 5 is an equation
+
+
+
+

Questions?

+
+
+

Command Line Editing

+
+
+

Getting familiar with command line editing can save you time. Bash uses a library called Readline to provide command line editing.

+

There are many shortcuts and you don’t have to memorize them all, just use the ones that you feel are best. There are even more shortcuts that you can read about in the textbooks!

+
+
+
Character Commands
Command     Description
CTRL-B      Move one character backwards
CTRL-F      Move one character forwards
DEL         Delete one character backwards
CTRL-D      Delete one character at cursor location
+
+
+
Word Commands
Command     Description
ESC-B       Move one word backwards
ESC-F       Move one word forwards
ESC-DEL     Delete one word backwards
ESC-D       Delete one word forwards
CTRL-Y      Retrieve (yank) the last item deleted
+
+
+
Line Commands
Command     Description
CTRL-A      Move to beginning of the line
CTRL-E      Move to end of the line
CTRL-K      Delete text from the cursor to end of line
CTRL-U      Delete text from the cursor to the beginning of the line
+
+
+
History Line Commands
Command     Description
CTRL-P      Move to the previous line in your history of commands
CTRL-N      Move to the next line in your history of commands
!!          Repeat the last command
!number     Repeat history list item number
!string     Repeat last history item starting with string
!?string    Repeat last history item containing string
+
+
+

Questions?

+
+
+

Completion Command

+
+
+

Tab completion autocompletes your command when you hit tab, provided a match exists. If no match exists, the command will not be completed.

+

If multiple exist, the command will also not be able to complete because it will not know which one to choose.

+

For example, let's say we have two files called file1 and file2. We would not be able to use autocomplete because the shell will not know which to choose until the last character.

+
+
+

Suppose we have two files, one called foot.txt and one called file.txt. This command would not be able to autocomplete:

+
$ ls f
+
+

But this one will:

+
$ ls fil
+
+
+
+

Questions?

+
+
+

Shell Scripts

+
+
+
Shell Scripts
+

Shell scripts allow us to combine several commands into one file, rather than one by one on the command line.

+

The shell will read the script just as if you were to write the command on the command line.

+

Most things that can be done in the shell script can be done on the command line and vice versa.

+
+
+
Writing Shell Scripts
+

There are three important steps when writing a shell script:

+
    +
  1. Write a script: scripts are ordinary text files. You can use a text editor that will provide syntax highlighting (color coding elements of the script). It can help find errors but writing in TextEdit is possible.
  2. +
  3. Make a shell script executable: set the script permissions to allow it to be executed
  4. +
  5. Put the shell script somewhere the shell can find it: the shell automatically searches certain directories for executable files when no explicit pathname is specified.
  6. +
+
+
+
Set Up
+

Open either TextEdit or your text editor of choice. Some popular programs are Sublime Text, Vim, Atom and Notepad++.

+

If you want to see the syntax highlighting, you might have to save your script as a .sh file. Without doing this, your file will just look like a regular .txt file.

+

Once you open your text editor and save it, we can begin our first script!

+
+
+
Script File Format
+

We must first tell the shell the name of the interpreter that should be used to execute the script. This is marked by using a shebang: #!

+

Throughout the script, you can and should use # to make comments. Comments make your code more readable and can help you understand your code when you come back to it.

+
+
+
#!/bin/bash
+
+# this is our first comment
+
+echo "This is our first script!"
+
+

Here we can see we've told the shell to use /bin/bash using the shebang #!
We've also added a comment using #
And finally, something quite familiar, we have our first line of script using echo

+
+
+
A Note on Commenting
+

Commenting is important not just so you can understand your own work, but also so others can understand your work in collaborative projects. It also helps make your code reproducible.

+

Comments can be inline:

+
echo "Hello World" #this is an inline comment
+
+

or as comment blocks:

+
#this is a comment block
+echo "Hello World"
+
+
+
+

Questions?

+
+
+
Executable File Permission
+

In order to execute our file, we have to add file permissions:

+

chmod helps make our script executable
+775 is used to make scripts that everyone can execute
+700 is used to make scripts that only the owner can execute

+
+
+

Here, chmod is combined with 775 so that everyone can execute the script:

+
$ ls -l first_script.sh
+
+
$ chmod 775 first_script.sh
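Each digit in a mode such as 775 or 700 is the sum of 4 (read), 2 (write) and 1 (execute), given in the order owner, group, everyone else. The sketch below applies both modes to a throwaway file (created with mktemp, not first_script.sh) and reads the result back from ls -l; the variable names are made up for illustration.

```shell
# Throwaway file to experiment on
demo_script=$(mktemp)

chmod 700 "$demo_script"                          # owner only: read, write, execute
perms_700=$(ls -l "$demo_script" | cut -c1-10)    # first 10 characters are the mode

chmod 775 "$demo_script"                          # group also gets rwx, others get r-x
perms_775=$(ls -l "$demo_script" | cut -c1-10)

echo "700 -> $perms_700"
echo "775 -> $perms_775"
rm "$demo_script"
```

Note that 775 also lets members of the file's group modify the script; 755 is a common alternative when only the owner should be able to edit it.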
+
+
+
+
Script File Location
+

In order to run our script, we have to call it using ./ in front of the script filename (./script).

+

File location is important to run your script. If just script was written, the shell would not be able to find the script and would try to read it as a command, outputting command not found.

+

Running echo $PATH helps us see what directories are being searched for the script.

+
+
+

If we want to run our script without ./, we can create a bin directory for our script, move our script into it and then run it. It's important to note that we have to make this bin in our home directory (~/bin), and that this directory has to be listed in $PATH. If we made it on our Desktop, the script would still not be found.

+
$ mkdir bin
+$ mv first_script.sh bin
+$ first_script.sh
+
+

In this block of code, we're making the bin folder using mkdir, moving the script into the bin with mv and then running the script without ./.
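Running the script by name only works if the bin directory in our home directory is listed in $PATH, which is not the case on every system. A minimal sketch, assuming scripts are kept in ~/bin, that adds the directory for the current session if it is missing:

```shell
# Add ~/bin to the front of PATH unless it is already listed.
# The surrounding colons make sure we match a whole PATH entry.
if [[ ":$PATH:" != *":$HOME/bin:"* ]]; then
    export PATH="$HOME/bin:$PATH"
fi
echo "$PATH"
```

This change lasts only until the terminal is closed. To keep it, the same lines would go in a shell startup file; which file that is depends on the system and shell setup.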

+
+
+
Good Locations for Scripts
+

For personal use, a good place to put your script is ~/bin.

+

For everyone's access, it's better to put scripts in /usr/local/bin.

+
+
+

Questions?

+
+
+

Shell Functions

+
+
+
Functions
+

Functions are a good way to break down code into smaller, more manageable chunks. Each chunk can represent a task.

+

For example, let's say your entire process is to make pasta. It can be broken down into:

+
    +
  1. Prepare vegetables
  2. +
  3. Make sauce
  4. +
  5. Cook pasta
  6. +
  7. Serve
  8. +
+
+
+

Each of these steps can be expanded further into sub processes. Cook pasta can be:

+
    +
  1. Fill pot with water
  2. +
  3. Boil water
  4. +
  5. Measure pasta
  6. +
  7. Add pasta to boiling water
  8. +
  9. Cook for 8-12 minutes
  10. +
  11. Strain
  12. +
+
+
+

Functions have two syntactic forms:

+
function name {
+    commands
+    return
+}
+
+
name () {
+    commands
+    return
+}
+
+

name is the name of the function
+commands are the commands contained in the function

+
+
+

Let's write our first function:

+
#!/bin/bash
+
+function funct {
+    echo "Step 2"
+    return
+}
+
+#program starts here
+
+echo "Step 1"
+funct
+echo "Step 3"
+
+

What do you think this function will output?

+
+
+

Let's save and run this function in our terminal to see what happens.

+

Here's a good time to recap how to save, grant permissions and run the script.
chmod - permissions command
775 - grant execute permission to everyone
700 - grant permissions only to yourself
~/bin - where to save the script

+
+
+

Questions?

+
+
+

Variables

+
+
+
Global Variables
+

Let's make our script more complex with some variables. We can first define variables directly through the terminal.

+
$ foo="something cool"
+$ echo $foo
+
+

Notice how in order to call the variable we need to add $ before the variable. The quotes are not necessary when the value of the variable doesn't include spaces. Here the value does include a space, so if we did not include the quotes we would receive an error.

+
+
+

Now let's add some global variables to our script:

+
#!/bin/bash
+
+step="Step 2"
+
+function funct {
+    echo $step
+    return
+}
+
+#program starts here
+
+echo "Step 1"
+funct
+echo "Step 3"
+
+

What do we think will be the output in this example?

+
+
+
Local Variables
+

Local variables are variables that are contained within the function. Because they're contained, they can have names that already exist in the shell globally or within other shell functions.

+
+
+
#!/bin/bash
+
+foo=0 # global variable foo
+funct_1 () {
+    local foo # variable foo local to funct_1
+    foo=1
+    echo "funct_1: foo = $foo"
+}
+
+funct_2 () {
+    local foo # variable foo local to funct_2
+    foo=2
+    echo "funct_2: foo = $foo"
+}
+
+echo "global:  foo = $foo"
+funct_1
+echo "global:  foo = $foo"
+funct_2
+echo "global:  foo = $foo"
+
+
+
+

What would happen if we removed local?

+
#!/bin/bash
+
+foo=0 # global variable foo
+funct_1 () {
+    foo=1
+    echo "funct_1: foo = $foo"
+}
+
+funct_2 () {
+    foo=2
+    echo "funct_2: foo = $foo"
+}
+
+echo "global:  foo = $foo"
+funct_1
+echo "global:  foo = $foo"
+funct_2
+echo "global:  foo = $foo"
+
+
+
+

Questions?

+
+
+

Parameters

+
+
+
Positional Parameters
+

Positional parameters are built-in parameters that allow our programs to get access to the contents of the command line. This is extremely valuable when we are creating scripts and then want to pass a parameter through the script from the command line.

+

If our code has more than 9 positional parameters, we need to enclose the parameter number in curly brackets, for example ${10}.
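The effect of the curly brackets is easiest to see side by side. In this sketch, `show_tenth` is a hypothetical helper function written only for illustration:

```shell
# Hypothetical helper: prints its tenth argument two different ways
show_tenth () {
    echo "without braces: $10"     # parsed as $1 followed by a literal 0
    echo "with braces:    ${10}"   # the tenth positional parameter
}

show_tenth a b c d e f g h i j
```

Without the braces the shell expands $1 (here a) and then appends the character 0, so the first line ends in a0 while the second ends in j.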

+

Let's create a script to see how this works:

+
+
+
#!/bin/bash
+
+echo "
+Number of arguments: $#
+\$0 = $0
+\$1 = $1
+\$2 = $2
+\$3 = $3
+\$4 = $4
+\$5 = $5
+\$6 = $6
+\$7 = $7
+\$8 = $8
+\$9 = $9
+"
+
+
+
+

In the example, you may notice that we haven't given $0 any specific value.
Let's try running the script a couple of ways through the command line to see what this means:

+
    +
  1. Run the script with arguments a b c d.
  2. +
  3. Run the script with any arguments of your choice.
  4. +
+

What do we notice?

+
+
+
$* and $@
+

$* → Expands into the list of positional parameters, starting with 1. When surrounded by double quotes, it expands into a double quoted string containing all of the positional parameters, each separated by the first character of the IFS shell variable (by default a space character).
+$@ → Expands into the list of positional parameters, starting with 1. When surrounded by double quotes, it expands each positional parameter into a separate word surrounded by double quotes.

+
+
+

Let's take a look at this code piece by piece:

+
print_params () {
+    echo "\$1 = $1"
+    echo "\$2 = $2"
+    echo "\$3 = $3"
+    echo "\$4 = $4"
+}
+
+pass_params () {
+    echo -e "\n" '$* :'; print_params $*
+    echo -e "\n" '"$*" :'; print_params "$*"
+    echo -e "\n" '$@ :';   print_params $@
+    echo -e "\n" '"$@" :'; print_params "$@"
+}
+
+pass_params "word" "words with spaces"
+
+
+
+
    +
  1. Here we have two functions: print_params () and pass_params (). pass_params () calls on the function print_params () within its function.
  2. +
  3. In the first function, echo is printing the line inside the double quotes. The \ in front of $1 escapes the $, thus losing its meaning, as we learned earlier.
  4. +
+
print_params () {
+    echo "\$1 = $1"
+    echo "\$2 = $2"
+    echo "\$3 = $3"
+    echo "\$4 = $4"
+}
+
+
+
+
    +
  1. In the second function, echo again is printing the text inside the single quotes. With the -e option, "\n" prints a newline before the label for readability. After the semicolon, the function calls the first function (print_params ()) with the argument $*. The second line does the same but with $* in double quotes. This is repeated for $@
  2. +
+
pass_params () {
+    echo -e "\n" '$* :'; print_params $*
+    echo -e "\n" '"$*" :'; print_params "$*"
+    echo -e "\n" '$@ :';   print_params $@
+    echo -e "\n" '"$@" :'; print_params "$@"
+}
+
+
+
+
    +
  1. In the final part of the code, we're calling on the pass_params () function and passing two arguments: "word" and "words with spaces".
  2. +
+
pass_params "word" "words with spaces"
+
+
+
+

Let's see what happens when we run the script in terminal. Remember, we don't have to pass any arguments in the command line because we have done so in our script.

+
+
+

Questions?

+
+
+

Let's take a look at another example. In this example we'll get a greater understanding of variables and positional parameters:

+
function afunc {
+  echo in function: $0 $1 $2
+  var1="in function"
+  echo var1: $var1
+}
+
+var1="outside function"
+
+echo var1: $var1
+echo $0: $1 $2
+afunc funcarg1 funcarg2
+echo var1: $var1
+echo $0: $1 $2
+
+
+
+

Let's break it down again:

+
    +
  1. In our first function called afunc, using echo we will print in function: followed by three positional parameters ($0, $1 and $2). We will then define the variable var1, give it the value "in function" and print it using echo again.
  2. +
+
function afunc {
+  echo in function: $0 $1 $2
+  var1="in function"
+  echo var1: $var1
+}
+
+
    +
  1. Outside of the function, we'll create another variable also named var1 and give it the value of "outside function"
  2. +
+
var1="outside function"
+
+
+
+
    +
  1. We'll then add the program.
    a) Using echo, print var1
    b) Print 3 positional parameters
    c) Call the function with two arguments
    d) Print var1 again
    e) Print 3 positional parameters again
  2. +
+
echo var1: $var1
+echo $0: $1 $2
+afunc funcarg1 funcarg2
+echo var1: $var1
+echo $0: $1 $2
+
+
+
+

Let's run it in our terminal without any additional arguments and see what the output is.

+
    +
  • Why did echo $0: $1 $2 only output one argument?
  • +
  • Why did var1 change the third time to inside function rather than outside function?
  • +
+
+
+

Now let's change and add a few things to see what happens:

+
    +
  • In our terminal, what happens if we pass two arguments by entering ascript.sh arg1 arg2 with ascript.sh being the name of our script and arg1 arg2 being two random arguments?
  • +
  • What happens if we add local to our function?
  • +
+
+
+

Questions?

+
+
+
Parameter Expansion
+

Let's discuss the difference between $a and ${a}

+

$a on its own is fine, but when placed next to another string, it can confuse the shell. For example:

+
    +
  • +

    $a_file the shell will try to expand a variable named a_file rather than a

    +
  • +
  • +

    ${a}_file the shell will now try to expand the variable a

    +
  • +
+

This can help us be more flexible when navigating and manipulating files and directories.
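A short sketch of the difference, using two made-up variables (`a` and `a_file`) so that both expansions produce visible output:

```shell
a="report"          # hypothetical variable for illustration
a_file="oops"       # a different variable that happens to be named a_file

echo "$a_file"      # expands the variable a_file
echo "${a}_file"    # expands a, then appends the literal text _file
```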

+
+
+

Let's look at the code below to see how this helps us:

+
$ filename="myfile"
+$ touch $filename
+$ mv $filename ${filename}1
+
+

This block of code creates a file based on our defined variable and then renames it with the same variable but with an additional component.

+
+
+

Parameter expansion also helps us if our variables are unset (i.e. do not exist) or are empty. Let's take a look at a couple of examples in the next few slides.

+
+
+
    +
  1. ${parameter:-x} If parameter is unset or empty, expansion results in the value of x. If it's not empty, it results in the value of the parameter
  2. +
+
$ foo=
+$ echo ${foo:-"something else"}
+$ echo $foo
+$ foo=bar
+$ echo ${foo:-"something else"}
+$ echo $foo
+
+

Through this sequence of commands we can see that when $foo is empty, :- substitutes "something else" in the output but leaves foo itself empty. Once we define the variable, :- results in the value of our variable.

+
+
+
    +
  1. ${parameter:=x} If parameter is unset or empty, expansion results in the value of x and the value of x is assigned to the parameter. If it's not empty, it results in the value of the parameter
  2. +
+
$ foo=
+$ echo ${foo:="something else"}
+$ echo $foo
+$ foo=bar
+$ echo ${foo:="something else"}
+$ echo $foo
+
+

We can see that when $foo is empty, := assigns "something else" to the variable, so echo $foo now prints something else. If we define the variable again, := results in the newly defined value.

+
+
+
    +
  1. ${parameter:?x} If parameter is unset or empty, this expansion causes the script to exit with an error, and the contents of x are sent to standard error. If parameter is not empty, the expansion results in the value of parameter.
  2. +
+
$ foo=
+$ echo ${foo:?"something else"}
+$ echo $?
+$ foo=bar
+$ echo ${foo:?"something else"}
+$ echo $?
+
+

We can see that when $foo is empty, :? gives us an error, which we can confirm because echo $? outputs 1. If we define the variable again, :? results in the value of our variable.

+
+
+
    +
  1. ${parameter:+x} If parameter is unset or empty, the expansion results in nothing. If parameter is not empty, the value of x is substituted for parameter; however, the value of parameter is not changed.
  2. +
+
$ foo=
+$ echo ${foo:+"something else"}
+$ echo $foo
+$ foo=bar
+$ echo ${foo:+"something else"}
+$ echo $foo
+
+

Here, :+ resulted in an empty output and the value of $foo remains empty. If we define the variable, :+ outputs something else in place of its value, but it does not reassign the variable: echo $foo still prints bar.
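A common use of :- is giving a function or script a default value for an argument that may be missing. The function `greet` below is a hypothetical example, not part of the course scripts:

```shell
# Hypothetical function: greets the name given as $1, or "World" if none is given
greet () {
    local name="${1:-World}"
    echo "Hello, $name!"
}

greet            # no argument, so the default is used
greet Toronto    # argument given, so the default is ignored
```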

+
+
+
String Operators
+

String operators are extremely valuable for operations on pathnames. They can help extract parts of pathnames, especially if they follow a pattern. Many pathnames follow patterns: for example, extensions are preceded by a dot (.).

+

Some string expansions are:

+
    +
  1. ${#parameter}
  2. +
  3. ${parameter:offset}
  4. +
  5. ${parameter:offset:length}
  6. +
+
+
+
    +
  1. ${#parameter} expands into the length of the string contained by the parameter.
  2. +
+
$ foo="Toronto needs more trees"
+$ echo "'$foo' is ${#foo} characters long."
+
+
+
+

With the following expansions, we can extract a portion of the string contained by the parameter.

+
    +
  1. ${parameter:offset} will extract characters from offset to the end of the string. For example, counting from 0 at the beginning of the string, the n of needs is at offset 8. Because we did not specify a length, echo will print from needs onwards.
  2. +
+
$ foo="Toronto needs more trees"
+$ echo ${foo:8}
+
+
+
+
    +
  1. ${parameter:offset:length} will specify the length that we want to extract. This length is counted not from the beginning of the string, but from the offset of the string.
  2. +
+
$ foo="Toronto needs more trees"
+$ echo ${foo:8:5}
+
+

We can see that, counting from 0 at the beginning of the string, n is at offset 8, and the s of needs is the 5th character starting from n. Therefore, our output will be needs.
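Two more variations on the same string, as a sketch (the variable names `first_word` and `last_word` are made up for illustration). Bash also accepts a negative offset, which counts from the end of the string; the space before the minus sign is required so that it is not read as the :- expansion.

```shell
foo="Toronto needs more trees"

first_word=${foo:0:7}      # offsets start at 0, so this takes the first 7 characters
last_word=${foo: -5}       # negative offset: the last 5 characters
echo "$first_word / $last_word / ${#foo}"
```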

+
+
+

Questions?

+
+
+

Let's now see how to use patterns in our parameter expansions. There are several ways we can achieve this:

+
    +
  1. ${parameter#pattern}
  2. +
  3. ${parameter##pattern}
  4. +
  5. ${parameter%pattern}
  6. +
  7. ${parameter%%pattern}
  8. +
+
+
+
    +
  1. ${parameter#pattern} removes the shortest leading portion of the string contained in parameter defined by the pattern.
  2. +
+
$ foo=/User/name/Desktop/file.txt.zip
+$ echo ${foo#/*/}
+
+

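In this example, we've defined foo as the path to a file with an extension. The pattern /*/ matches a slash, any characters (*) and another slash. The expansion removes the shortest such match from the beginning of the string (/User/) and returns the rest: name/Desktop/file.txt.zip.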

+
+
+
    +
  1. ${parameter##pattern} is very similar to the previous expansion except it removes the longest leading portion of the string.
  2. +
+
$ foo=/User/name/Desktop/file.txt.zip
+$ echo ${foo##/*/}
+
+

Very similar to the previous example, but here the expansion removes the longest leading match of /*/ (/User/name/Desktop/) and returns the rest: file.txt.zip.

+
+
+
    +
  1. ${parameter%pattern} removes the shortest ending portion of the string rather than the beginning.
  2. +
+
$ foo=/User/name/Desktop/file.txt.zip
+$ echo ${foo%.*}
+
+
    +
  1. ${parameter%%pattern} removes the longest ending portion of the string.
  2. +
+
$ foo=/User/name/Desktop/file.txt.zip
+$ echo ${foo%%.*}
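Put together, the four operators can split a path into its parts. This is a sketch using the same value of foo; the variable names `directory`, `filename`, `extension` and `stem` are made up for illustration.

```shell
foo=/User/name/Desktop/file.txt.zip

directory=${foo%/*}        # shortest match of /* from the end: drops /file.txt.zip
filename=${foo##*/}        # longest match of */ from the start: drops the directories
extension=${foo##*.}       # longest match of *. from the start: keeps only the last extension
stem=${filename%%.*}       # longest match of .* from the end: drops every extension

echo "$directory | $filename | $extension | $stem"
```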
+
+
+
+

What happens if we change our pattern to #*_?

+

Let's pretend we have a file named "rachaels_file" and we want to know its extension. How would we do that?

+

What if our file was named "rachaels file"?

+
+
+

We can also use expansions to replace the contents of the parameter with a string based on the pattern.

+
    +
  1. ${parameter/pattern/string} replaces only the first occurrence of pattern.
  2. +
  3. ${parameter//pattern/string} replaces all occurrences.
  4. +
  5. ${parameter/#pattern/string} requires the match to occur at the beginning of the string to replace it.
  6. +
  7. ${parameter/%pattern/string} requires the match to occur at the end of the string to replace it.
  8. +
+
+
+

Let's see how this would work:

+
$ foo="MP3.MP3"
+
+
$ echo ${foo/MP3/mp3}
+
+
$ echo ${foo//MP3/mp3}
+
+
$ echo ${foo/#MP3/mp3}
+
+
$ echo ${foo/%MP3/mp3}
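The difference between the single and double slash forms shows up whenever the pattern occurs more than once. A small sketch with a made-up date string:

```shell
date_string="2022-05-17"                  # hypothetical value for illustration

first_only=${date_string/-/_}             # replaces only the first hyphen
all_of_them=${date_string//-/_}           # replaces every hyphen
echo "$first_only $all_of_them"
```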
+
+
+
+

Can you think of when this might be helpful?

+

Let's say I have a file named "rachaels cool file". I want to rename it because spaces cause problems in filenames. How would I do this?

+
+
+

Questions?

+
+
+
Arithmetic Assignment
+

We have seen assignment before with examples such as foo=5. This is a simple assignment but we can also add complexity to this assignment with other operators.

+
    +
  • $((parameter += x)) assigns the parameter to itself + x
  • +
  • $((parameter -= x)) assigns the parameter to itself - x
  • +
  • $((parameter *= x)) assigns the parameter to itself * x
  • +
  • $((parameter /= x)) assigns the parameter to itself / x
  • +
+
+
+

We can also increase or decrease our parameters by one.

+
    +
  • $((parameter++)) increases parameter by one after the parameter is returned
  • +
  • $((parameter--)) decreases the parameter by one after the parameter is returned
  • +
  • $((++parameter)) increases parameter by one before the parameter is returned
  • +
  • $((--parameter)) decreases parameter by one before the parameter is returned.
  • +
+
+
+

These are very subtle changes so let's see what we mean after and before a parameter is returned:

+
$ foo=1
+$ echo $((foo++))
+$ echo $foo
+
+
$ foo=1
+$ echo $((++foo))
+$ echo $foo
+
+
+
+

Questions

+
+
+
Command Substitution
+

So far we've learned how to get values into variables by using assignment statements (x=5) and positional parameters (x=$1). Another way is command substitution which allows you to use the standard output of the command as if it were a variable.

+
+
+

Let's say we want to assign a variable to the output of a command so that we can apply another command to that output. In this particular case, we want to make a variable equal all files beginning with t. We then want to apply a sort command on that variable:

+
$ x=$(find t*)
+$ echo "$x" | sort
+
+

Although this seems quite simple now, we'll see how this can be extremely powerful when we move into flow control.
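Two things are worth trying with command substitution: capturing the output of a whole pipeline, and seeing what quoting does to the captured text. This sketch uses printf to generate a few made-up items rather than real files.

```shell
# Store the output of a pipeline in a variable
first_sorted=$(printf '%s\n' pear apple fig | sort | head -n 1)
echo "First item alphabetically: $first_sorted"

# Quoting the expansion keeps the line breaks that the command produced
listing=$(printf '%s\n' pear apple fig)
echo $listing      # unquoted: word splitting puts everything on one line
echo "$listing"    # quoted: one item per line
```

This is why piping an unquoted variable into sort has no visible effect: sort receives a single line.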

+
+
+

Flow Control

+
+
+

Flow control allows programs to "change directions" based on the results from a given input.

+

Bash supports several constructs:

+
    +
  • if/else
  • +
  • while / until
  • +
  • case
  • +
  • for
  • +
+
+
+

if / else

+
+
+

if/else is a conditional statement that chooses whether or not to do something based on a true or false statement.

+
if condition; then
+    commands
+
+[elif condition; then
+    commands...]
+
+[else
+    commands]
+
+fi
+
+
+
+

Here, we've assigned x the value 5. We've then written an if/else statement that says: if x is equal to 5, then tell us that x equals 5. Otherwise (else), tell us that x does not equal 5.

+
x=5
+
+if [ $x = 5 ]; then
+    echo "x equals 5."
+
+else
+    echo "x does not equal 5."
+
+fi
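When there are more than two outcomes, an elif branch can be added between if and else. A minimal sketch, where `compare_to_five` is a hypothetical function and -lt and -eq are the numeric "less than" and "equal" tests:

```shell
# Hypothetical function: reports how a number compares with 5
compare_to_five () {
    if [ "$1" -lt 5 ]; then
        echo "$1 is less than 5."
    elif [ "$1" -eq 5 ]; then
        echo "$1 equals 5."
    else
        echo "$1 is greater than 5."
    fi
}

compare_to_five 3
compare_to_five 5
compare_to_five 8
```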
+
+
+
+

Let's take a look at a more practical example: we want to know if there are any files in our directory that contain spaces.

+
#!/bin/bash
+
+cd ~/Desktop/dir1
+
+if [[ -n $(find t* | grep " ") ]]; then
+	echo "A file contains a space"
+else
+	echo "No files contain a space"
+fi
+
+
+
+

First we've changed our working directory to dir1:

+
cd ~/Desktop/dir1
+
+

We then utilized the command substitution that we've just learned to capture the names of files that contain a space. The -n option checks if the length of a string is nonzero:

+
-n $(find t* | grep " ")
+
+
+
+

By wrapping our output in an if statement, we're stating:

+
    +
  1. if the value of $(find t* | grep " ") is nonzero, then print (echo) "A file contains a space"
  2. +
  3. Otherwise (else), print (echo) "No files contain a space"
  4. +
+
+
+

Questions

+
+
+
Control Operators
+

Control operators (&& and ||) allow you to test more than one thing at a time. Their syntax is:

+
if command1 && command2; then
+    ...
+fi
+
+
if command1 || command2;  then
+    ...
+fi
+
+
+
+

With the && operator, command1 is executed and command2 is executed only if command1 is successful

+

With the || operator, command1 is executed and command2 is executed only if command1 is unsuccessful
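The two operators also work outside of if statements. The sketch below uses the built-in commands true (always succeeds) and false (always fails) to show which right-hand sides actually run; the variable `result` is made up for illustration.

```shell
result=""
true  && result="${result}A"    # runs: true succeeded
false && result="${result}B"    # skipped: false failed
false || result="${result}C"    # runs: false failed
true  || result="${result}D"    # skipped: true succeeded
echo "$result"
```

Only the letters A and C are added, because B and D sit behind a condition that was not met.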

+
+
+

Example of &&

+
filename=$1
+word1=$2
+word2=$3
+
+if grep $word1 $filename && grep $word2 $filename; then
+    echo "$word1 and $word2 are both in $filename."
+fi
+
+
+
+

Using positional parameters that we learned earlier, what do you think will happen if we run the previous code?

+
    +
  • What happens if both words exist?
  • +
  • What happens if only one word exists?
  • +
  • What happens if no words exist?
  • +
+
+
+

Example of ||

+
filename=$1
+word1=$2
+word2=$3
+
+if grep $word1 $filename || grep $word2 $filename; then
+    echo "$word1 or $word2 is in $filename."
+fi
+
+
+
+

Similarly, what will happen if...

+
    +
  • What happens if both words exist?
  • +
  • What happens if only one word exists?
  • +
  • What happens if no words exist?
  • +
+
+
+

Questions?

+
+
+

While

+
+
+

Using the while command, let's discuss looping. Looping allows portions of a program to repeat as long as the condition is true. The syntax is:

+
while condition; do
+    commands
+done
+
+
+
+

Let's make a basic while script that displays five numbers in sequential order from 1 to 5 and then tells us when it's finished.

+
#!/bin/bash
+
+# script called while-count.sh
+
+count=1
+
+while [ $count -le 5 ]; do
+    echo $count
+    count=$((count +1))
+done
+echo "Finished."
+
+

Why does the loop end?

+
+
+

While loops are extremely helpful to read lines of a file and then perform some command if a line meets a certain condition. Let's explore how to read lines first:

+
file=file1
+
+while read -r line; do 
+	echo $line
+done < "$file"
+
+

In this script, we're creating a variable with our file. We're then reading the file until the last line is read. In this example, we're using an input redirection that we learned earlier (<), which passes the file into the read command. We've also used -r so that backslashes in the file are read literally rather than treated as escape characters.
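A slightly extended sketch of the same loop, numbering each line as it is read. It writes two made-up lines to a temporary file first so it can be run anywhere. Setting IFS= for read is an addition not used above: it stops read from trimming leading and trailing spaces.

```shell
demo_file=$(mktemp)                       # hypothetical throwaway file
printf 'first line\n  indented \\t line\n' > "$demo_file"

line_number=0
while IFS= read -r line; do
    line_number=$((line_number + 1))
    last_seen=$line                       # remember the most recent line read
    printf '%s\n' "$line_number: $line"
done < "$demo_file"

rm "$demo_file"
```

Because of -r, the backslash in the second line is printed exactly as it appears in the file.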

+
+
+

Because line is acting as a variable, we can also nest an if statement inside the loop to check whether $line meets a condition. Let's say we have a file and we want to know every line that has bananas in it.

+

How would we combine the while loop with an if statement?

+
+
+
while read -r line; do
+	if [[ $line == *"bananas"* ]]; then
+		echo $line
+	fi
+done < "$file"
+
+

Here we're reading the file line by line using the while loop. We're then saying if our variable $line contains "bananas", then print the $line.

+
    +
  1. Why have we added the wildcard *?
  2. +
  3. What would happen if we didn't include *?
  4. +
+
+
+

Questions?

+
+
+

Until

+
+
+

Until loops are similar to while loops, except that where a while loop runs as long as the condition is true, an until loop runs as long as the condition is false.

+
until condition; do
+    commands
+done
+
+
+
+

Let's create a script similar to the while example: a basic until script that displays five numbers in sequential order from 1 to 5 and then tells us when it's finished.

+
count=1
+
+until [ $count -gt 5 ]; do
+    echo $count
+    count=$((count +1))
+done
+echo "Finished."
+
+

How is this script different to the while loop?

+
+
+

How might this be useful? Let's say we want to create 3 directories labeled dir1, dir2 and dir3:

+
x=1
+until [[ $x == 4 ]]; do
+	echo "Creating dir$x..."
+	mkdir dir$x
+	((x++))
+done
+
+

Here we've created a variable x=1 because we want our first directory to be dir1. We're then saying that until x equals 4, use mkdir to make a directory called dir followed by the value of x. We add 1 to x on each iteration using an arithmetic expression. The echo part is just to give us some feedback on what is happening behind the scenes.

+
+
+

Questions?

+
+
+

for

+
+
+

For our final flow control, we're going to learn a powerful loop called for. The syntax is:

+
for variable [in words]; do
+    commands
+done
+
+

What we might notice is that this loop uses a variable that takes on each value in words, one at a time, as the loop executes.

+
+
+

How would we use for if we wanted to list all files and directories in a folder?

+
for i in $(find *); do
+    echo $i
+done
+
+

The variable i takes on, one at a time, each value produced by the command substitution $(find *). For each value of i, we then print it. Note that a name containing spaces is split into separate words.

+

Although this seems quite basic and there are simpler ways to list all files and directories (ls), this enables us to do many things with the looped variable i by nesting other commands and loops.

+
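To show what "doing things with the looped variable" might look like, here is a small sketch that nests an if statement inside a for loop to count files and directories. It loops over a glob ("$dir"/*) rather than $(find *), which is an assumption made here so that names with spaces stay intact; unlike find, it only looks at the top level of the folder. The function name count_entries is hypothetical.

```shell
# count_entries DIR (hypothetical helper name)
count_entries() {
    local dir=$1 files=0 dirs=0 entry
    for entry in "$dir"/*; do
        if [ -d "$entry" ]; then
            dirs=$((dirs + 1))      # entry is a directory
        elif [ -f "$entry" ]; then
            files=$((files + 1))    # entry is a regular file
        fi
    done
    echo "files=$files dirs=$dirs"
}

# Count the entries in the current directory.
count_entries .
```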
+
+

What other ways can we use for loops?
+What other ways can we use for loops within files?

+
+
+

Questions?

+
    +
  • Why do we use i?
  • +
+
+
+
+
Next Week: Git and Github
+
    +
  • Please make sure to come with a GitHub account
  • +
+
+
+

Additional Material

+
+
+
Exit Status
+

Commands issue a value to the system when they terminate: an integer in the range 0 to 255 that indicates the success or failure of the command's execution.

+

Conventionally, zero indicates success and any other value indicates failure.

+
+
+

Let's list a directory that we know exists on our system:

+
$ ls -d /usr/bin
+$ echo $?
+
+

-d is an option that lists the directory itself rather than its contents.
+$? holds the exit status of the last executed command: zero for success or any other number for failure.

+
+
+

If we then list a directory that we know does not exist on our system and check the value of $?, what do we expect to happen?

+
$ ls -d /bin/usr
+$ echo $?
+
+
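The two experiments above can be wrapped into one small sketch. The function name describe_path is made up for this example, and the exact non-zero number that ls reports for a missing path differs between systems, so the sketch prints whatever $? holds rather than assuming a value.

```shell
# describe_path PATH (hypothetical helper name)
describe_path() {
    if ls -d "$1" > /dev/null 2>&1; then
        echo "$1: exit status 0 (success)"
    else
        # In the else branch, $? still holds the exit status of ls.
        echo "$1: exit status $? (failure)"
    fi
}

describe_path /usr/bin
describe_path /bin/usr
```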
+
+
Exit Command
+

In a script, the exit command takes the place of return: it accepts a single, optional argument, which becomes the script's exit status.

+

When no argument is passed, the exit status defaults to that of the last command executed.

+

This enables our scripts to indicate an error.

+

If the script is a function in a larger program, we can use return instead of exit with a single, optional argument, allowing our function to indicate an error.

+
+
+
#!/bin/bash
+
+# test-file: Evaluate the status of a file
+
+FILE=~/.bashrc
+
+if [ -e "$FILE" ]; then
+    if [ -f "$FILE" ]; then
+        echo "$FILE is a regular file."
+    fi
+    if [ -d "$FILE" ]; then
+        echo "$FILE is a directory."
+    fi
+else
+    echo "$FILE does not exist"
+    exit 1 
+fi
+
+exit
+
+
+
+
test_file () {
+    # test-file: Evaluate the status of a file
+
+    FILE=~/.bashrc
+
+    if [ -e "$FILE" ]; then
+        if [ -f "$FILE" ]; then
+            echo "$FILE is a regular file."
+        fi
+        if [ -d "$FILE" ]; then
+            echo "$FILE is a directory."
+        fi
+    else
+        echo "$FILE does not exist"
+        return 1
+    fi
+}
+
+
+
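Here is a sketch of how a caller might react to the value a function returns. It is a variation on test_file that takes the path as an argument; the name file_kind is hypothetical.

```shell
# file_kind PATH (hypothetical variation on test_file)
file_kind() {
    local path=$1
    if [ ! -e "$path" ]; then
        echo "$path does not exist"
        return 1                      # signal an error to the caller
    fi
    if [ -d "$path" ]; then
        echo "$path is a directory."
    elif [ -f "$path" ]; then
        echo "$path is a regular file."
    fi
    return 0
}

# The if statement branches on the function's return value.
if file_kind "$HOME"; then
    echo "check passed"
else
    echo "check failed with status $?"
fi
```

Because return only leaves the function, the rest of the script keeps running and can decide what to do with the error.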
+

if / else statements are most frequently used with test

+

test performs a variety of checks and comparisons

+

Its syntax is:

+

test expression

+

or

+

[ expression ]

+
+
+

There are many expressions that are used to evaluate the status of files. Some important File Expressions include:

Expression    Is True If:
-e file       file exists
-d file       file exists and is a directory
-f file       file exists and is a regular file
-r file       file exists and is readable (has readable permissions for the effective user)
-s file       file exists and has a length greater than zero
+
+
+

String Expressions

Expression            Is True If:
string                string is not null
-n string             the length of string is greater than zero
-z string             the length of string is zero
string1 == string2    string1 equals string2
string1 != string2    string1 and string2 are not equal
+
+
+

Integer Expressions

Expression               Is True If:
integer1 -eq integer2    integer1 is equal to integer2
integer1 -ne integer2    integer1 is not equal to integer2
integer1 -le integer2    integer1 is less than or equal to integer2
integer1 -lt integer2    integer1 is less than integer2
integer1 -ge integer2    integer1 is greater than or equal to integer2
integer1 -gt integer2    integer1 is greater than integer2
+
+
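As a small illustration of the string and integer expressions, the sketch below combines -z, -lt and -eq in one if / elif chain. The function name classify is made up, and the sketch assumes its argument is either empty or a whole number.

```shell
# classify VALUE (hypothetical helper; VALUE is assumed to be empty or an integer)
classify() {
    local value=$1
    if [ -z "$value" ]; then          # string expression: length is zero
        echo "empty"
    elif [ "$value" -lt 0 ]; then     # integer expression: less than
        echo "negative"
    elif [ "$value" -eq 0 ]; then     # integer expression: equal
        echo "zero"
    else
        echo "positive"
    fi
}

classify ""
classify -3
classify 0
classify 7
```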
+
Breaking Out Of A Loop
+

Bash has two built-in commands that can be used to control program flow inside loops.

+
    +
  • break command immediately terminates a loop and resumes with the next statement following the loop
  • +
  • continue command skips the remainder of the current iteration (e.g. once a condition has been met) and resumes with the next iteration of the loop. Skipping work that is not needed allows for more efficient execution
  • +
+
+
+
while condition; do
+    if condition; then
+        commands
+        continue
+    fi
+    if condition; then
+        commands
+        continue
+    fi
+    commands
+done
+
+

If the first if condition is met, continue skips the rest of the loop body, including the second if, and the loop resumes with its next iteration.

+
+
+
while condition; do
+    if condition; then
+        commands
+        continue
+    fi
+    if condition; then
+        commands
+        break
+    fi
+    commands
+done
+
+

If the second if condition is met, break immediately terminates the loop and the script resumes with the next statement after the loop.

+
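To see continue and break working together in a loop that actually runs, here is a minimal sketch. The function name sum_until_negative and the rule it implements (skip zeros, stop at the first negative number) are invented for this example.

```shell
# sum_until_negative NUMBERS... (hypothetical helper name)
sum_until_negative() {
    local total=0 n
    for n in "$@"; do
        if [ "$n" -eq 0 ]; then
            continue                  # nothing to add, go to the next number
        fi
        if [ "$n" -lt 0 ]; then
            break                     # stop looping at the first negative number
        fi
        total=$((total + n))
    done
    echo "$total"
}

sum_until_negative 3 0 4 -1 10        # prints 7: the 10 is never reached
```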
+
\ No newline at end of file diff --git a/lessons/week_1/unix_slides.pdf b/lessons/week_1/unix_slides.pdf new file mode 100644 index 00000000..0a0c3c95 Binary files /dev/null and b/lessons/week_1/unix_slides.pdf differ diff --git a/lessons/week_2/git_slides.html b/lessons/week_2/git_slides.html new file mode 100644 index 00000000..de1eec7e --- /dev/null +++ b/lessons/week_2/git_slides.html @@ -0,0 +1,4432 @@ +
+

Version Control and GitHub

+
$ echo "Data Sciences Institute"
+$ echo "Rachael Lam"
+
+
+
+

Prerequisites:

+
    +
  • GitHub account
  • +
+
+
+

Key Texts:

+ +
+
+

References

+
    +
  • Chacon and Straub: Chapter 1
  • +
  • Timbers: Chapter 12.3 - 12.4, 13.3.1
  • +
+
+
+

Version Control

+
+
+
What is Version Control?
+

Version control is a system that records changes to a file or a set of files over time so that we can recall a specific version later. We may already do this by copying files to another directory to save past versions. While this approach is simple, it is error-prone and lacks flexibility.

+
+
+

Version Control Systems (VCS) can do a number of things and can be applied on nearly any type of file on our computers:

+
    +
  • revert files to a previous state
  • +
  • revert entire project to a previous state
  • +
  • compare changes over time
  • +
  • see who modified something last
  • +
  • see who introduced an issue and when
  • +
  • recover lost files
  • +
+
+
+
Local Version Control Systems
+

Local VCSs were developed to keep track of changes to our files by putting them in a version database.

+
+
+
Centralized Version Control Systems
+

Centralized VCSs (CVCS) were developed to enable collaboration with developers on other systems. CVCSs have a single server that contains all the versioned files.

+
+
+

CVCSs allow some level of transparency to others' work and give Administrators a level of control over what developers can and can't do.

+

Unfortunately, a single server means that if it ever goes down, all collaboration halts for however long that lasts for. Additionally, if backups haven't been kept, work could easily be lost.

+
+
+
Distributed Version Control Systems
+

To handle the limitations of local and centralized VCSs, Distributed VCSs (DVCS) were created. These include Git, Mercurial and Bazaar.

+

Collaborators mirror the entire repository, so if a server dies, any one of the collaborators' repositories can be copied back to the server to restore it.

+
+
+


+
+
+

Questions?

+
+
+

Git

+
+
+
Git Basics
+

Git thinks of data in a very different way than other VCSs. Instead of storing a set of files and the changes over time, Git thinks of its data more like a set of snapshots of a mini file system.

+

If a file has not changed, Git does not store it again; it links to the previous identical file already stored.

+
+
+


+
+
+
Local Operations
+

Most operations on Git only need local files and resources to operate. Git also keeps the entire history of our projects on our local disks meaning we can see changes made months ago without a remote server.

+

We also don't need to be connected to the server to get work done; we only need to be connected when we want to upload our work.

+
+
+
Benefits
+

Git uses a check-summing mechanism called a SHA-1 hash, which is calculated based on the contents of a file or directory structure in Git. It looks something like this:

+
24b9da6552252987aa493b52f8696cd6d3b00393
+
+

This checksum means it's impossible to change the contents of any file or directory without Git knowing about it.

+

Git generally only adds data, making it fairly difficult to lose data once we've committed, which we'll learn about later.
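We can see the checksum idea for ourselves with git hash-object, which prints the hash Git would give to some content. The sketch below assumes Git is installed and uses its default SHA-1 object format; the function name blob_hash is made up.

```shell
# blob_hash: print the hash Git would assign to whatever arrives on standard input.
# (hypothetical helper name; assumes Git's default SHA-1 object format)
blob_hash() {
    git hash-object --stdin
}

# An empty file always gets the same hash...
blob_hash < /dev/null
# ...and any change to the content produces a different one.
printf 'hello\n' | blob_hash
```

With SHA-1, the empty content hashes to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which is why the abbreviated e69de29 tends to show up in diffs of newly created empty files.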

+
+
+
The Three States
+

There are three main states that our files can reside in:

+
    +
  • Committed: +
      +
    • data is safely stored on local database
    • +
    +
  • +
  • Modified: +
      +
    • file has been changed but not yet committed
    • +
    +
  • +
  • Staged: +
      +
    • modified file has been marked to go into the next commit
    • +
    +
  • +
+
+
+
The Three Main Sections
+

There are three main sections to a Git project:

+
    +
  • The Git directory
  • +
  • The working directory
  • +
  • The staging area
  • +
+
+
+
The Git Directory
+

The Git directory is where Git stores the metadata and object database for our projects. It is what is copied when we clone a repository from another computer.

+
+
+
The Working Directory
+

The working directory is a single checkout of one version of our projects. These files are pulled out of the compressed database in the Git directory and placed on the disk for us to modify.

+
+
+
The Staging Area
+

The staging area is a simple file that stores information about what will go into our next commit.

+
+
+
Workflow
+

A basic workflow will look something like this:

+
    +
  1. Modify files in our working directory
  2. +
  3. Stage the files in the staging area
  4. +
  5. Commit the changes, which takes the files as they are in the staging area and stores that snapshot in the Git directory.
  6. +
+
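The three steps above can be sketched as a short function that runs them in a throwaway repository. The name basic_workflow, the demo identity passed with -c, and the file contents are all assumptions made for this example.

```shell
# basic_workflow DIR (hypothetical helper): modify, stage, commit inside DIR
basic_workflow() {
    local repo=$1
    git init -q "$repo" || return 1
    echo "# demo" > "$repo/README.md"            # 1. modify a file in the working directory
    git -C "$repo" add README.md || return 1     # 2. stage it
    # 3. commit the staged snapshot (a demo identity is passed with -c
    #    so the sketch does not depend on our global configuration)
    git -C "$repo" -c user.name="Demo User" -c user.email="demo@example.com" \
        commit -q -m "Add README"
}

demo_repo=$(mktemp -d)
basic_workflow "$demo_repo" && git -C "$demo_repo" log --oneline
```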
+
+

Questions?

+
+
+

Installing Git

+
+
+

Typically, Git is already installed on our system but we can check for that using the git command:

+
$ git --version
+
+

Does anyone not see a version?

+
+
+
Installing on Linux
+

If you're on Ubuntu:

+
$ sudo apt install git
+
+
+

If you're on Fedora, RHEL or CentOS:

+
$ sudo dnf install git
+
+
$ sudo yum install git 
+
+
+
+
Installing on Mac
+

You can install Git via Homebrew, if you have Homebrew installed (https://brew.sh/).

+
$ brew install git
+
+

Alternatively, you can download a Git installer for macOS from this link: https://sourceforge.net/projects/git-osx-installer/

+
+
+
Installing on Windows
+

The download will start automatically through this link: https://git-scm.com/download/win

+
+
+

Questions?

+
+
+

Git Setup

+
+
+

The first thing to do now that we have Git installed on our system is to customize it. These changes will remain despite any upgrades to Git that we install.

+

Using the command git config, we can set configuration variables that control all aspects of how Git looks and operates.

+
+
+
Checking Configurations
+

Before we change any of our global configurations, we can check what they are:

+
$ git config --list
+
+

If we haven't configured Git, we can do that now!

+
+
+
Identity
+

First, we'll set our username and email address. Git uses this information every time we commit.

+
$ git config --global user.name "Rachael Lam"
+$ git config --global user.email "rachael.a.lam@gmail.com"
+
+

The option --global means that we only have to do this once: Git will use this information for every repository on our system.

+
+
+
Editor
+

Next, we'll configure the default text editor that Git opens when it needs us to type in a message. Git uses our system's default editor (usually Vi or Vim) but we can change it if we prefer. If we want to change the editor to emacs, we would do so below:

+
$ git config --global core.editor emacs
+
+
+
+
Diff Tool
+

We can also set the default merge tool, which is used to resolve merge conflicts:

+
$ git config --global merge.tool vimdiff
+
+
+
+
Checking the Setting
+

We can use the git config --list command to see all Git settings. To see the value of a specific setting:

+
$ git config user.name
+
+
+
+
Help
+

If we ever need help, even offline, we can access the manual page three ways:

+
    +
  1. $ git help <verb>
  2. +
  3. $ git <verb> --help
  4. +
  5. $ man git-<verb>
  6. +
+

For example, we can get help for the config command:

+
$ git help config
+
+
+
+

Questions?

+
+
+

Git Basics

+
+
+

References

+
    +
  • Chacon and Straub: Chapter 2
  • +
+
+
+

$ git init / $ git clone

+
+
+
Repositories in an Existing Directory
+

We're quickly getting into how to start our first Git repository, commonly known as a repo. First we'll learn how to turn an existing directory into a Git repo:

+
$ git init
+
+
$ git init -b main
+
+

Here we're creating a new subdirectory named .git that will contain all our necessary repo files. The option -b (available in Git 2.28 and later) sets the name of the initial branch, here main.

+
+
+
Cloning an Existing Repository
+

If we want to collaborate on an existing repo, we need to clone the repo from GitHub. If we don't have a project set up yet, we'll need to do that first.

+
+
+
    +
  1. Create a new project
    +
  2. +
+


+
+
+
    +
  1. Add name and optional description
    +
  2. +
+


+
+
+
    +
  1. Choose public or private and choose how to initialize the repo
    +
  2. +
+


+
+
+

There are a number of automatically generated files such as log files that we might not want Git to add or show as untracked. We can create a file called .gitignore to ignore the automatically generated files.

+

The .gitignore is dependent on the type of coding language you are using but can also be modified to fit specific purposes.
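As a rough sketch of how this works in practice, we can write a couple of patterns into a .gitignore and ask Git whether a given path would be ignored using git check-ignore. The patterns (.DS_Store and *.log), the file names and the function name is_ignored are chosen only for illustration.

```shell
# is_ignored REPO PATH (hypothetical helper)
# git check-ignore -q exits with 0 when PATH matches an ignore rule.
is_ignored() {
    git -C "$1" check-ignore -q "$2"
}

ignore_repo=$(mktemp -d)
git init -q "$ignore_repo"
# Ignore macOS folder metadata and any file ending in .log
printf '.DS_Store\n*.log\n' > "$ignore_repo/.gitignore"

for path in debug.log .DS_Store README.md; do
    if is_ignored "$ignore_repo" "$path"; then
        echo "$path: ignored"
    else
        echo "$path: not ignored"
    fi
done
```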

+
+
+

If we created a repo on GitHub, we can choose a .gitignore template. We can select a template specific to the coding language we are using.

+


+
+
+

Once we have our repo, we can clone it:

+
$ git clone https://github.com/rachaellam/git-module.git
+
+

Using this code, we've created a directory called git-module (named after the last part of the link), initialized a .git directory inside it, pulled down all the data for that repository, and checked out a working copy of the latest version.

+
+
+

The url used in the previous code block is copied directly from GitHub by clicking code and copying the HTTPS:

+


+
+
+

If we want to give the local directory a different name, we can specify it as the next command-line argument:

+
$ git clone https://github.com/rachaellam/git-module.git mymodule
+
+
+
+

Questions?

+
+
+

Git Commands

+
+
+

References

+
    +
  • Chacon and Straub: Chapter 2
  • +
  • Timbers: Chapter 12.5
  • +
+
+
+

$ git status

+
+
+
Tracked and Untracked Files
+

Files in our working directory can either be tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified or staged. Untracked files are files that aren't in our last snapshot or staging area.

+

When we modify a file, Git keeps track of the modifications even before we've decided to commit. We can then stage the modifications and then commit.

+
+
+


+
+
+
File Status
+

To better understand what state our files are in, we can check the status:

+
$ git status
+
+

If we've just cloned our repo, we should see something similar to:

+
# On branch main
+# Your branch is up to date with 'origin/main'.
+
+# nothing to commit, working tree clean
+
+
+
+

Let's now add a README.md file, because every good repo has a good README.

+
$ touch README.md
+
+

And see the status:

+
$ git status
+
+
+
+
On branch main
+
+No commits yet
+
+Untracked files:
+  (use "git add <file>..." to include in what will be committed)
+	README.md
+
+

Here we can see that we still haven't committed anything and that we have an untracked README.md file. Git also gives us a bit of information including how to add a file to track.

+
+
+

$ git add

+
+
+
Tracking New Files
+

To track new files, or stage new files, we can use git add along with the file that we want to track:

+
$ git add README.md
+
+

We can run git status again to see the results of git add.

+
+
+
On branch main
+
+No commits yet
+
+Changes to be committed:
+  (use "git rm --cached <file>..." to unstage)
+	new file:   README.md
+
+

Now we can see that our README.md file is staged to be committed.

+
+
+

Let's say we add some more info to our README.md file, which is now tracked. If we run git status, we'll see:

+
On branch main
+
+No commits yet
+
+Changes to be committed:
+  (use "git rm --cached <file>..." to unstage)
+	new file:   README.md
+
+Changes not staged for commit:
+  (use "git add <file>..." to update what will be committed)
+  (use "git restore <file>..." to discard changes in working directory)
+	modified:   README.md
+
+
+
+
+

We can stage our additional changes and check the status:

+
$ git add README.md
+$ git status
+
+
On branch main
+
+No commits yet
+
+Changes to be committed:
+  (use "git rm --cached <file>..." to unstage)
+	new file:   README.md
+
+
+
+
+

Let's try adding another file into our directory. It can be something that you've been working on independently, or we can add our project from the previous Unix module.

+
+
+

If we modify many things at once, we can use the option -A to add all files, rather than adding them one by one:

+
$ git add -A
+
+

A little note about this: it's best to upload your work in small chunks for readability and for collaboration. So if you have a bunch of files, it's recommended to split them into smaller chunks.

+
+
+

Questions?

+
+
+

$ git diff

+
+
+

If we want to see more details of the changes that we've made, we can use the command git diff.

+

git diff compares what is in our working directory to what is in our staging area. If we've made changes to our files without running git add, we'll see the comparison. If there are no differences, nothing will be shown.

+
+
+
diff --git a/README.md b/README.md
+index e69de29..4711fce 100644
+--- a/README.md
++++ b/README.md
+@@ -0,0 +1 @@
++# git-r
+\ No newline at end of file
+
+
+
+
diff --git a/README.md b/README.md
+
+

This is telling us what we're comparing. In this case, it's the difference between the staged version of the README file (a) and the current version in our working directory (b).

+
+
+
index e69de29..4711fce 100644
+
+

Here is some metadata: the abbreviated hashes of the old and new versions of the file, followed by the file mode. We likely won't need it.

+
+
+
--- a/README.md
++++ b/README.md
+
+

This is acting as a legend. Lines that come from a/README.md (the old version) are marked with - and lines that come from b/README.md (the new version) are marked with +.

+
+
+
@@ -0,0 +1 @@
++# git-r
+
+

Here we're being told the lines that have changed and what on those lines changed. Because there was nothing removed, this is a bit of a simplistic representation.

+
+
+

We might see something more like...

+
@@ -21,5 +77,12 @@
+
+

This is telling us that the section shown covers 5 lines starting at line 21 in the old version of the file and 12 lines starting at line 77 in the new version.

+
+
+
--staged
+

If we want to see the details of what will go into the next commit, we can use git diff with the option --staged. This compares our staged changes to our last commit.

+
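To make the difference between git diff and git diff --staged concrete, the sketch below reports which of the two has something to show at each point. It relies on the --quiet option, which makes git diff exit with 1 when there are differences. The function name diff_states and the demo identity are assumptions.

```shell
# diff_states REPO (hypothetical helper): which comparisons currently show changes?
diff_states() {
    local repo=$1 unstaged=no staged=no
    git -C "$repo" diff --quiet || unstaged=yes            # working directory vs staging area
    git -C "$repo" diff --staged --quiet || staged=yes     # staging area vs last commit
    echo "unstaged=$unstaged staged=$staged"
}

diff_repo=$(mktemp -d)
git init -q "$diff_repo"
echo "first line" > "$diff_repo/README.md"
git -C "$diff_repo" add README.md
git -C "$diff_repo" -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q -m "first commit"

echo "second line" >> "$diff_repo/README.md"
diff_states "$diff_repo"      # edited but not yet staged
git -C "$diff_repo" add README.md
diff_states "$diff_repo"      # staged, nothing left unstaged
```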
+
+

$ git commit

+
+
+

Once we've staged our selected files, it's time to commit the changes. Anything that wasn't staged (any modifications since git add) will not be included in the commit.

+

git commit is most easily run with the option -m, which adds a message to our commit:

+
$ git commit -m "adding a message here"
+
+
+
+
-m
+

Messages should be clear. They can also be extremely detailed if needed. If we leave out the option -m, Git opens our editor with the latest output of git status commented out for reference. If you want even more information, you can add the option -v, which also includes the diff of the change.

+
+
+

Messages are extremely important for our own records and also when collaborating with others. They can act as a reminder for what our commit includes, and also tell our collaborators what we did last.

+

It's important to commit often as well so that merge conflicts are easier to locate and fix.

+

It's also helpful if you want to go back to an earlier version. You have more options to choose from.

+
+
+

Practices around messages can vary but if we want to add a longer message we can remove the -m option.

+
$ git commit
+
+

Git will open our editor. If it is Vi or Vim, hit i to add a message. You'll see -- INSERT -- at the bottom and you can begin typing your message.

+

When finished, press esc then :wq or :x.

+

w means write and q means quit. x is shorthand for wq

+
+
+
Short (50 chars or less) summary of changes
+
+More detailed explanatory text, if necessary. Wrap it to about
+72 characters or so. In some contexts, the first line is treated 
+as the subject of an email and the rest of the text as the body, 
+the blank line separating the summary from the body is critical 
+(unless you omit the body entirely).
+
+Further paragraphs come after blank lines.
+
+- Bullet points are okay, too
+
+- Typically a hyphen or asterisk is used for the bullet, preceded
+  by a single space with blank lines in between, but conventions
+  vary here
+
+
+
+
-a
+

If we want to commit every change to files that are already tracked without putting them in the staging area first, we can use the option -a. This lets us skip git add and condense our workflow. Untracked files are not included.

+
$ git commit -a -m "skip staging add message"
+
+

Here we've used two options, -a and -m to skip the staging and add a message.

+
+
+

Questions?

+
+
+

$ git rm

+
+
+

If we delete a tracked file from our working directory using rm without git, the deletion will show up under "Changes not staged for commit" in git status. We can then use git rm to stage the file's removal.

+

Let's follow the code below to understand this better:

+
$ touch test.sh
+$ git status
+$ rm test.sh
+$ git status
+
+

Because we haven't tracked the test.sh file, we can remove it without needing to tell Git to remove it as well.

+
+
+

What happens if we add a file to our staging area but then want to delete it? Let's try the two sets of commands below:

+
$ touch test.sh
+$ git add test.sh
+$ git rm test.sh
+
+
$ touch test.sh
+$ git add test.sh
+$ rm test.sh
+$ git rm test.sh
+
+
+
+
-f
+

If we've modified and staged a file, we have to force the removal with the option -f. This is a safety feature so that we don't accidentally delete something.

+
$ touch testfile
+$ git add testfile
+$ git rm -f testfile
+
+
+
+
--cached
+

The option --cached allows us to remove a file from our staging area without permanently deleting it from our local drive.

+
$ git rm --cached testfile
+
+
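A quick way to convince ourselves that --cached leaves the file on disk is to try it in a throwaway repository. This is only a sketch: the function name untrack_but_keep is made up and the repository is created just for the demonstration.

```shell
# untrack_but_keep REPO FILE (hypothetical helper)
# Removes FILE from the staging area but leaves it in the working directory.
untrack_but_keep() {
    git -C "$1" rm -q --cached "$2"
}

rm_repo=$(mktemp -d)
git init -q "$rm_repo"
touch "$rm_repo/testfile"
git -C "$rm_repo" add testfile

untrack_but_keep "$rm_repo" testfile
ls "$rm_repo"                          # testfile is still on disk
git -C "$rm_repo" status --short       # and shows up as untracked (??)
```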

We can use wildcards to remove files from our staging area in bulk, although we have to add a backslash in front of * because Git does its own filename expansion.

+
$ git rm -f \*.txt
+
+
+
+

We can also delete files in a folder of our working directory:

+
$ git rm -f dir1/\*.sh
+
+
+
+

$ git mv

+
+
+

Using git mv, we can rename files conveniently and succinctly:

+
$ git mv test.txt test.sh
+
+
+
+

Questions?

+
+
+

$ git log

+
+
+

Sometimes we might want to see a history of our commits or we want to see previous commits after cloning an existing repository. We can do this using the git log command.

+
$ git log
+
+

There are a number of options that help us see even more, or sometimes less, information about each commit.

+
+
+

If we attempt to run a log before any commits have been made, we will get an error:

+
fatal: your current branch 'main' does not have any commits yet
+
+
+
+
-p
+

Adding the option -p will show the diff introduced in each commit. We can also pass a number option that will limit the number of entries shown:

+
$ git log -p -2
+
+

-<n> can be any number of entries. Git shows the output one page at a time.

+
+
+
--stat
+

The --stat option shows abbreviated stats for each commit:

+
$ git log --stat
+
+
+
+
commit 6c91df668d1899317a643153bd169d37fe05d9f1 (HEAD -> main)
+Author: Rachael Lam <rachael.a.lam@gmail.com>
+Date:   Fri Feb 18 14:56:27 2022 -0500
+
+    first commit
+
+ .gitignore |  4 ++++
+ README.md  |  1 +
+ test.Rproj | 13 +++++++++++++
+ testfile.r |  0
+ 4 files changed, 18 insertions(+)
+
+

+ or - (if there were any) show the number of insertions or deletions. We can also see the date of the commit, who committed and the message.

+
+
+
--pretty
+

The --pretty= option is an interesting feature that enables us to specify the log output when we combine it with format:, creating an extremely useful data extraction feature:

+
$ git log --pretty=format:"%h - %an, %ar : %s"
+
+
+
+
Formatting Options
Option    Description
%H        Commit hash
%h        Abbreviated commit hash
%t        Abbreviated tree hash
%p        Abbreviated parent hashes
+
+
Option    Description
%an       Author name
%ae       Author email
%ad       Author date (ex. Thu Dec 2 14:14:55 2021 -0500)
%ar       Author date relative (ex. 26 hours ago)
%cn       Committer name
%s        Subject (-m)
+
+
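To try a few of these placeholders without touching a real project, the sketch below makes one commit in a throwaway repository and prints it with a custom format. The function name author_subject_log and the demo identity are assumptions for this example.

```shell
# author_subject_log REPO (hypothetical helper)
# Prints "author name <author email>: subject" for every commit.
author_subject_log() {
    git -C "$1" log --pretty=format:"%an <%ae>: %s"
}

log_repo=$(mktemp -d)
git init -q "$log_repo"
echo "notes" > "$log_repo/notes.txt"
git -C "$log_repo" add notes.txt
git -C "$log_repo" -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q -m "first commit"

author_subject_log "$log_repo"
```

Swapping in %h or %ar works the same way, although their output changes from commit to commit and over time.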
+
--since / --until
+

The options --since= and --until= are usually more useful than -<n>. They limit the log to commits made after (--since) or before (--until) a certain date. You can specify an exact date or a relative date:

+
$ git log --since=2.weeks
+
+
$ git log --since="2 days 3 minutes ago"
+
+
$ git log --until="2021-11-20"
+
+
+
+

We can also combine log options to generate specific outputs:

+
$ git log --pretty=format:"%h: %s" --author=Rachael
+
+
$ git log --since="2020-11-01" --until="2020-11-30"
+
+
+
+

Finally, and a favourite for quick glances:

+
$ git log --oneline
+
+
+
+

Questions?

+
+
+

undo undo undo

+
+
+
Changing Commit
+

If we already committed a few files but forgot to add one, or made modifications since our commit that we want to include, we can use the option --amend.

+
$ git commit -m "initial commit"
+$ git add missed_file
+$ git commit --amend -m "initial commit with missed_file"
+
+

We can still add the -m option to replace the commit message.

+
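One thing worth checking for ourselves is that --amend replaces the last commit rather than adding a second one. Here is a sketch in a throwaway repository; the helper names demo_git and amend_demo, the file names and the demo identity are all made up.

```shell
# demo_git REPO ARGS... (hypothetical helper): run git in REPO with a demo identity
demo_git() {
    local repo=$1
    shift
    git -C "$repo" -c user.name="Demo User" -c user.email="demo@example.com" "$@"
}

# amend_demo REPO (hypothetical helper): commit, then fold a forgotten file into that commit
amend_demo() {
    local repo=$1
    git init -q "$repo" || return 1
    echo "one" > "$repo/first_file"
    echo "two" > "$repo/missed_file"
    demo_git "$repo" add first_file
    demo_git "$repo" commit -q -m "initial commit"
    demo_git "$repo" add missed_file
    demo_git "$repo" commit -q --amend -m "initial commit with missed_file"
}

amend_repo=$(mktemp -d)
amend_demo "$amend_repo"
git -C "$amend_repo" log --oneline          # still a single commit
git -C "$amend_repo" ls-tree --name-only HEAD
```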
+
+
Unstaging
+

When we want to remove a file from our staging area because we accidentally added one too many files, we can use the code below:

+
$ git reset HEAD README.md
+
+

If we ever forget how to do this, running git status will remind us.

+
+
+
Unmodify
+

We can also revert our files back to the version from our previous commit using git checkout --. It's important to realize that this command essentially rewrites the file so any changes that were made will not be able to be recovered.

+

As well, any commit can usually be recovered but anything that was never committed will most likely be lost forever.

+
$ git checkout -- README.md
+
+
+
+
Select Previous Commit
+

To select a previous commit to revert to, we need the hash of the commit:

+
$ git log
+$ git checkout <HASH> file1
+
+

This can be used forwards or backwards, i.e. you can also "revert" to a commit that is later than the version you currently have.

+

You can also revert several files at the same time:

+
$ git checkout <HASH> file1 file2
+
+
+
+

Questions?

+
+
+

Remote Repositories

+
+
+

References

+
    +
  • Chacon and Straub: Chapter 2
  • +
  • Timbers: Chapter 12.5-12.6
  • +
+
+
+

$ git remote

+
+
+

Remote repos are versions of our projects that are hosted on the internet or some network. This allows us to collaborate with others outside of our local repo.

+

We can see the remote servers we've configured using git remote. If we add the option -v, we can see the URL:

+
$ git remote -v
+
+

When we clone a repo, the remote is named origin by default.

+
+
+
Remote Setup
+

Before we connect our local repo to a remote repo, we need to set up authentication. This is so we can send and retrieve work to and from our remote repositories. There are two ways to do this:

+
    +
  1. +

    Access Tokens

    +
  2. +
  3. +

    SSH

    +
  4. +
+
+
+
Access Tokens
+


+
+
+


+
+
+
SSH
+
$ ls -al ~/.ssh
+
+

If SSH has not been set up on your computer, you should see something like:

+
ls: cannot access '/c/Users/rachaellam/.ssh': No such file 
+or directory
+
+

Otherwise you'll see the filenames id_ed25519 and id_ed25519.pub OR id_rsa and id_rsa.pub, which are your private and public key files, respectively.

+
+
+
$ ssh-keygen -t ed25519 -C "rachael.lam@mail.utoronto.ca"
+
+

Use the code above but with your email. This will output:

+
Generating public/private ed25519 key pair.
+Enter file in which to save the key (/c/Users/rachaellam/.ssh/
+id_ed25519):
+
+

Press enter to use the default file.

+
+
+

You will then be prompted to add a passphrase. You cannot reset this passphrase, so be sure to remember it or write it down somewhere safe:

+
Created directory '/c/Users/rachaellam/.ssh'.
+Enter passphrase (empty for no passphrase):
+
+

It will then ask you to reenter the passphrase:

+
Enter same passphrase again:
+
+
+
+

You will then get a confirmation with a random piece of art at the end. It will show where the private key (identification) has been saved, which you should never share, where the public key has been saved, and the key fingerprint, which is a short hash that identifies the public key.

+
Your identification has been saved in /c/Users/rachaellam/.ssh/
+id_ed25519
+Your public key has been saved in /c/Users/rachaellam/.ssh/
+id_ed25519.pub
+The key fingerprint is:
+SHA256:SMSPIStNyA00KPxuYu94KpZgRAYjgt9g4BA4kFy3g1o
+rachael.lam@mail.utoronto.ca
+
+
+
+

Now we can check that we have the public and private key files:

+
$ ls -al ~/.ssh
+
+
+
+

It's time to give GitHub our public key so let's read the public key file and copy it:

+
$ cat ~/.ssh/id_ed25519.pub
+
+

Output:

+
ssh-ed25519 AAAAC3NzaC1lZDI1NPN7AAAAIDmRA3d51X0uu9wXek559gfn6UFNF
+69yZjChyBIU2qKI rachael.lam@mail.utoronto.ca
+
+

Copy the long public key to add to GitHub.

+
+
+
Settings --> SSH and GPG keys --> New SSH key
+

Add a title like rachael's key and paste the public key then click Add SSH key.

+

Finally, we can check that it's been authenticated:

+
$ ssh -T git@github.com
+
+
+
+
remote add
+

To add a remote repo, we can use git remote add followed by the name and URL. Now we can connect our local repo to a remote repo:

+
$ git remote add origin https://github.com/rachaellam/git-r.git
+$ git remote -v
+
+

After checking we'll see:

+
origin  https://github.com/rachaellam/git-r.git (fetch)
+origin  https://github.com/rachaellam/git-r.git (push)
+
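Adding a remote only records a name and a URL in the local repo's configuration, so we can experiment with it offline. The sketch below does this in a throwaway repository; the function name add_origin is hypothetical and the URL is simply the one from the example above.

```shell
# add_origin REPO URL (hypothetical helper): register URL as the remote named origin
add_origin() {
    git -C "$1" remote add origin "$2"
}

remote_repo=$(mktemp -d)
git init -q "$remote_repo"
add_origin "$remote_repo" https://github.com/rachaellam/git-r.git

git -C "$remote_repo" remote -v                           # fetch and push URLs
git -C "$remote_repo" config --get remote.origin.url      # where the URL is stored
```

Nothing is contacted over the network until we fetch, pull or push.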
+
+
+

If we want to see more information about a remote repo, we can use the command:

+
$ git remote show origin
+
+

Here we can see the URLs that we're fetching from and pushing to, our remote branches, and the configurations for git pull and git push (to the main branch or another).

+
+
+

To send and retrieve work between our local and remote repositories, we have to authenticate with a personal access token:

+


+
+
+


+
+
+

Questions?

+
+
+

$ git fetch / $ git pull

+
+
+

When collaborating with others, changes might be made that are important to copy to your local directory. git fetch will get any new changes but it won't merge them into our work or modify our work.

+
$ git fetch origin
+
+
+
+

git pull will automatically fetch and merge a remote branch to our current branch (more on branching later). It's a good practice to pull before every work session, especially when working with others. Otherwise, a collaborator might have made changes, and you won't be able to push your changes to GitHub.

+
$ git pull
+
+
+
+

If we've created our local repository using git init and git remote add, we need to specify the remote that we want to pull from and the branch we want to pull.

+
$ git pull origin main
+
+

origin being the name of the remote repo we created earlier and main being the main branch on our GitHub repo.
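The difference between fetching and pulling can be sketched as follows. This is an illustration under assumptions, not part of the course material: a bare repository on disk stands in for GitHub, and the collaborator names and file names are hypothetical.

```shell
work=$(mktemp -d)

# A bare repository on disk stands in for the GitHub remote.
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

# First collaborator: create a commit and publish it.
git init -q "$work/alice"
git -C "$work/alice" symbolic-ref HEAD refs/heads/main
git -C "$work/alice" config user.name "Alice"
git -C "$work/alice" config user.email "alice@example.com"
git -C "$work/alice" remote add origin "$work/remote.git"
echo "first line" > "$work/alice/notes.txt"
git -C "$work/alice" add notes.txt
git -C "$work/alice" commit -q -m "Add notes"
git -C "$work/alice" push -q origin main

# Second collaborator: clone, then fall behind while Alice keeps working.
git clone -q "$work/remote.git" "$work/bob"
echo "second line" >> "$work/alice/notes.txt"
git -C "$work/alice" commit -q -a -m "Extend notes"
git -C "$work/alice" push -q origin main

before=$(git -C "$work/bob" rev-parse main)

# fetch updates origin/main only; the local main branch stays where it was.
git -C "$work/bob" fetch -q origin
after_fetch=$(git -C "$work/bob" rev-parse main)
remote_tip=$(git -C "$work/bob" rev-parse origin/main)

# pull = fetch + merge; --ff-only refuses anything but a simple fast-forward.
git -C "$work/bob" pull -q --ff-only origin main
after_pull=$(git -C "$work/bob" rev-parse main)

echo "before=$before after_fetch=$after_fetch after_pull=$after_pull"
```

After the fetch, the new commit is visible as origin/main but the local main has not moved; only the pull brings main up to date.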

+
+
+

Questions?

+
+
+

$ git push

+
+
+

When we're ready to share our modifications, we have to push our project and files upstream using git push

+
$ git push origin main
+
+

Here we're pushing our main branch to the remote named origin. The main branch is sometimes called the master branch.

+

This command only works if we have write access and if no collaborator has pushed upstream since we last pulled. If someone has, we have to pull and merge their work before pushing our own.
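A rejected push and its fix can be sketched like this. The setup is assumed for illustration: a local bare repository plays the role of GitHub, and the two collaborators (Alice and Bob) and their files are hypothetical.

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

# Alice publishes the first commit.
git init -q "$work/alice"
git -C "$work/alice" symbolic-ref HEAD refs/heads/main
git -C "$work/alice" config user.name "Alice"
git -C "$work/alice" config user.email "alice@example.com"
git -C "$work/alice" remote add origin "$work/remote.git"
echo "shared" > "$work/alice/shared.txt"
git -C "$work/alice" add shared.txt
git -C "$work/alice" commit -q -m "Add shared file"
git -C "$work/alice" push -q origin main

# Bob clones, commits, and pushes before Alice does.
git clone -q "$work/remote.git" "$work/bob"
git -C "$work/bob" config user.name "Bob"
git -C "$work/bob" config user.email "bob@example.com"
echo "from bob" > "$work/bob/bob.txt"
git -C "$work/bob" add bob.txt
git -C "$work/bob" commit -q -m "Add Bob's file"
git -C "$work/bob" push -q origin main

# Alice commits without pulling first, so her push is rejected.
echo "from alice" > "$work/alice/alice.txt"
git -C "$work/alice" add alice.txt
git -C "$work/alice" commit -q -m "Add Alice's file"
if git -C "$work/alice" push -q origin main 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi

# Pull (fetch + merge) Bob's work, then push again.
git -C "$work/alice" pull -q --no-rebase --no-edit origin main
if git -C "$work/alice" push -q origin main 2>/dev/null; then
    second_push=accepted
else
    second_push=rejected
fi
echo "first push: $first_push, second push: $second_push"
```

The two collaborators edited different files here, so the merge needs no manual work; editing the same lines would lead to a merge conflict instead.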

+
+
+

Questions?

+
+
+

Git Branching

+
+
+

References

+
    +
  • Chacon and Straub: Chapter 3
  • +
  • Timbers: Chapter 12.8
  • +
+
+
+

Branching allows us to diverge from the main line to do work without accidentally messing with the main line. This helps with testing without making any accidental changes to the working branch.

+

To understand how branching works, let's go back and understand how Git saves files.

+
    +
  • blob
  • +
  • tree
  • +
  • pointer
  • +
+
+
+


+
+
+

A branch is a lightweight, movable pointer to a specific commit. In Git, the default branch is named master or main. When we first start making commits, we start on the default branch, which automatically moves forward to point to the last commit made.

+


+
+
+

$ git branch

+
+
+

We can make a new branch which creates a new pointer for us to move around. We can do this by using the command git branch:

+
$ git branch testing
+
+
+
+

Here, we've created a branch called testing, which means we've created a new pointer to the commit we're currently on.

+


+
+
+

$ git checkout

+
+
+

Git tracks what branch we're on using a pointer called HEAD. If we run git checkout main while HEAD already points to main, we'll see:

+
Already on 'main'
+
+

To move HEAD to point to the testing branch that we just created, we use git checkout:

+
$ git checkout testing
+
+

and we should see:

+
Switched to branch 'testing'
+
+
+
+


+
+
+

If we make some changes on our testing branch and commit, the testing branch, and HEAD with it, will move forward to the new commit.

+


+
+
+

If we want to go back to the main line of our project and make changes, we can use git checkout again to point HEAD back to our main branch:

+
$ git checkout main
+
+

Using this command will move the HEAD pointer back to our main branch and revert the files in our working directory to the snapshot that the main branch points to.
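The pointer behaviour described above can be checked in a throwaway repository. This is a minimal sketch; the file name and user identity are assumptions made for illustration.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
echo "v1" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "First commit"

# A new branch is a second pointer to the commit we are on.
git -C "$repo" branch testing
main_before=$(git -C "$repo" rev-parse main)
testing_before=$(git -C "$repo" rev-parse testing)

# checkout moves HEAD to the testing branch.
git -C "$repo" checkout -q testing
current=$(git -C "$repo" symbolic-ref --short HEAD)

# A commit on testing advances testing only; main stays behind.
echo "v2" > "$repo/file.txt"
git -C "$repo" commit -q -a -m "Change on testing"
main_after=$(git -C "$repo" rev-parse main)
testing_after=$(git -C "$repo" rev-parse testing)

# Back on main, the working directory shows main's snapshot again.
git -C "$repo" checkout -q main
content=$(cat "$repo/file.txt")
echo "HEAD was on $current; file.txt on main contains: $content"
```

Both branches start at the same commit, and only the checked-out branch moves when we commit.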

+
+
+

Questions?

+
+
+

Branching and Merging

+
+
+

Let's take a look at a workflow that you might encounter:

+
$ git commit -m "commits to master branch"
+
+


+
+
+
$ git checkout -b iss53
+
+


+
+
+
$ git commit -a -m "commits to iss53"
+
+


+
+
+
$ git checkout master
+$ git checkout -b 'hotfix'
+$ git commit -m "commits to hotfix"
+
+


+
+
+
$ git checkout master
+$ git merge hotfix
+
+


+
+
+

$ git merge

+
+
+

In the last step we saw a command called git merge. Once we've committed changes and are ready to deploy, we can use git merge to merge our working branch back into our master branch.

+
$ git merge testing 
+
+
+
+


+
+
+

We can then delete the branch that we've created, as the master branch points to the same place.

+

Adding the option -d will delete the branch that had been merged with the main, as we no longer need it.

+
$ git branch -d testing
+
+
+
+

Remember that changes to our master branch have not been added to our iss53 branch. We either need to merge master into iss53 or wait to integrate them when we merge iss53 into the master branch.

+


+
+
+

If the main branch has changed since our branch diverged from it, merging isn't as simple for Git, because it can no longer just move the pointer forward.

+

Git will create a new snapshot of the merge and automatically create a new commit that points to it, called a merge commit.
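The two kinds of merge in this workflow can be sketched as follows. The branch names mirror the slides (master, iss53, hotfix); the helper function commit_file and the file names are hypothetical and exist only for this illustration.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

# Helper (hypothetical name): write a file and commit it on the current branch.
commit_file() {
    echo "$2" > "$repo/$1"
    git -C "$repo" add "$1"
    git -C "$repo" commit -q -m "$3"
}

commit_file base.txt "base" "commits to master branch"

git -C "$repo" checkout -q -b iss53
commit_file iss53.txt "work in progress" "commits to iss53"

git -C "$repo" checkout -q master
git -C "$repo" checkout -q -b hotfix
commit_file hotfix.txt "urgent fix" "commits to hotfix"

# master has not moved since hotfix branched off, so this merge is a
# fast-forward: master simply jumps to the hotfix commit.
git -C "$repo" checkout -q master
git -C "$repo" merge -q hotfix
ff_tip=$(git -C "$repo" rev-parse master)
hotfix_tip=$(git -C "$repo" rev-parse hotfix)

# master and iss53 have now diverged, so Git records a merge commit
# with two parents.
git -C "$repo" merge -q --no-edit iss53
parents=$(( $(git -C "$repo" rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of the merge commit: $parents"
```

A fast-forward adds no new commit, while a merge of diverged branches always does.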

+


+
+
+

We saw git branch earlier with the option -d to delete a branch, but to get a list of our current branches, we can run git branch without any arguments.

+
$ git branch
+
+

The * indicates the branch we currently have checked out (git checkout).

+
+
+

If we run git branch with the option -v, we can see the last commit on each branch. This is another reason why good commit messages are so important: they can be extremely useful when looking back at our work and seeing what we've done.

+
+
+

We can also add the options --merged or --no-merged to git branch. --merged allows us to see which branches have been merged into the branch we're currently on. Branches on this list without the * are generally safe to delete because their work is already part of the current branch.

+
$ git branch --merged
+
+
+
+

On the other hand, --no-merged allows us to see all the branches that haven't been merged.

+
$ git branch --no-merged
+
+

If we try to delete one of these branches, we will receive an error. We can force delete using the option -D.
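The listing and deleting options above can be tried in a scratch repository. This is a sketch under assumptions: the branch names merged-work and unmerged-work and the file names are hypothetical.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
echo "base" > "$repo/base.txt"
git -C "$repo" add base.txt
git -C "$repo" commit -q -m "Base commit"

# One branch is merged back into main ...
git -C "$repo" checkout -q -b merged-work
echo "finished" > "$repo/finished.txt"
git -C "$repo" add finished.txt
git -C "$repo" commit -q -m "Finished work"
git -C "$repo" checkout -q main
git -C "$repo" merge -q merged-work

# ... and one keeps a commit that main does not have.
git -C "$repo" checkout -q -b unmerged-work
echo "draft" > "$repo/draft.txt"
git -C "$repo" add draft.txt
git -C "$repo" commit -q -m "Draft work"
git -C "$repo" checkout -q main

git -C "$repo" branch --merged
git -C "$repo" branch --no-merged

# -d refuses to delete a branch whose commits are not merged yet.
if git -C "$repo" branch -d unmerged-work 2>/dev/null; then
    safe_delete=deleted
else
    safe_delete=refused
fi

# -D deletes it anyway, discarding the pointer to that work.
git -C "$repo" branch -q -D unmerged-work
git -C "$repo" branch -q -d merged-work
remaining=$(git -C "$repo" branch --format='%(refname:short)')
echo "remaining branches: $remaining"
```

Because -D throws away the only pointer to unmerged commits, it is worth checking git branch --no-merged before forcing a deletion.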

+
+
+

Merge Conflicts

+
+
+

Oftentimes, merging our work with other topic branches or the main branch creates conflicts.

+

For example, if we've changed the same part of the same file differently in the two branches we're merging, we will encounter a conflict.

+

Luckily, Git shows us where the conflict is so we can correct it.

+
+
+
+

Git shows us the beginning of the merge conflict with
+<<<<<<< HEAD and the end with >>>>>>>.
+

+

======= separates the differences.
+

+

To fix the conflict, you can keep one side's changes, keep the other's, or rewrite the section entirely. You also have to remove all of the conflict markers (<<<<<<<, ======= and >>>>>>>) before committing.
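A conflict and its resolution can be reproduced on purpose. This is a minimal sketch; the file settings.txt, its contents and the branch name feature are assumptions chosen for the illustration.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
echo "colour = blue" > "$repo/settings.txt"
git -C "$repo" add settings.txt
git -C "$repo" commit -q -m "Add settings"

# Change the same line differently on two branches.
git -C "$repo" checkout -q -b feature
echo "colour = green" > "$repo/settings.txt"
git -C "$repo" commit -q -a -m "Use green"
git -C "$repo" checkout -q main
echo "colour = red" > "$repo/settings.txt"
git -C "$repo" commit -q -a -m "Use red"

# The merge stops and leaves conflict markers in the file.
if git -C "$repo" merge -q feature >/dev/null 2>&1; then
    merge_result=clean
else
    merge_result=conflict
fi
cat "$repo/settings.txt"
markers=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' "$repo/settings.txt")

# Resolve by writing the final content (no markers), staging, and committing.
echo "colour = green" > "$repo/settings.txt"
git -C "$repo" add settings.txt
git -C "$repo" commit -q --no-edit
leftover=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' "$repo/settings.txt")
echo "merge: $merge_result, markers before: $markers, after: $leftover"
```

In practice the file is edited by hand in an editor rather than overwritten with echo; the staging and committing steps are the same.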

+
+
+

Questions?

+
+
+

Branching Workflow

+
+
+
Long-Running Branches
+

Multiple long running branches are helpful when tackling large and complex projects.

+

Typically, developers will keep the master branch as the stable branch or code that has been or will be released. They will then have parallel branches that are used for development and testing.

+

Branches can also have various levels of stability; work graduates (is merged) to a more stable branch once it's fully tested.

+
+
+


+
+
+
Topic Branches
+

Topic branches are short-lived branches that are created for a particular feature or related work. They allow us to quickly switch between topics and keep changes there for as long or as little as needed, regardless of the created or modified order, before merging.

+
+
+


+
+
+

Questions?

+
+
+

Remote Branches

+
+
+

Remote branches are pointers to the state of branches on our remote repositories. Our remote repositories can have multiple remote branches, just as we can have multiple branches on our local repositories.

+

The format is (remote)/(branch) or (remote) (branch)

+

If branches already exist on your GitHub repo, you will have access to these branches. If we're working with a branch that does not exist yet, we can push it to our remote repo.

+
+
+
Pushing
+

When we're ready to share our work, we'll use git push. If the remote branch already exists, we can push directly to that branch:

+
$ git checkout testing
+$ git add -A
+$ git commit -m "testing branch commit"
+$ git push origin testing
+
+

This will push our changes to the existing testing branch on GitHub.

+
+
+

If we were working with a branch that only exists locally, we can push it to GitHub with a slight tweak:

+
$ git checkout new-branch
+$ git add -A
+$ git commit -m "new branch commit"
+$ git push -u origin new-branch
+
+

This will create a new branch on GitHub called new-branch. From here, if we want to continue updating this branch, we can just run git push origin new-branch.
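Publishing a local-only branch can be sketched with a local bare repository standing in for GitHub. The sketch assumes the -u (--set-upstream) option, which pushes the branch under its own name and records the remote branch as its upstream; the directory and file names are hypothetical.

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
git -C "$work/local" symbolic-ref HEAD refs/heads/main
git -C "$work/local" config user.name "Example User"
git -C "$work/local" config user.email "user@example.com"
git -C "$work/local" remote add origin "$work/remote.git"
echo "base" > "$work/local/base.txt"
git -C "$work/local" add base.txt
git -C "$work/local" commit -q -m "Base commit"
git -C "$work/local" push -q origin main

# Create a branch that exists only locally and commit to it.
git -C "$work/local" checkout -q -b new-branch
echo "new work" > "$work/local/new.txt"
git -C "$work/local" add -A
git -C "$work/local" commit -q -m "new branch commit"

# -u creates new-branch on the remote and records it as the upstream.
git -C "$work/local" push -q -u origin new-branch
upstream=$(git -C "$work/local" rev-parse --abbrev-ref 'new-branch@{upstream}')

# With the upstream recorded, later pushes need no arguments.
echo "more work" >> "$work/local/new.txt"
git -C "$work/local" commit -q -a -m "second commit on new branch"
git -C "$work/local" push -q
local_tip=$(git -C "$work/local" rev-parse new-branch)
remote_tip=$(git -C "$work/remote.git" rev-parse new-branch)
echo "upstream: $upstream"
```

Note that a refspec of the form local:remote pushes whatever the local side names, so the left-hand side should be the branch whose commits we actually want to publish.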

+
+
+
Fetching
+

When we fetch or pull from our remote repos, we don't automatically get local, editable copies of the remote branches.

+

We can create them in several steps. First we're going to fetch the remote branches:

+
$ git fetch
+
+
+
+

We can then see what branches exist remotely:

+
$ git branch -v -a
+
+

And we'll see something like this:

+
* main                        3d850f2 a commit
+  remotes/origin/HEAD         -> origin/main
+  remotes/origin/main         3d850f2 another commit
+  remotes/origin/testing      3d850f2 another commit
+
+
+
+

Then we'll create a branch that exists on our local drive:

+
$ git checkout -b testing origin/testing
+
+

Here we're creating a new local branch (-b) called testing that starts from origin/testing, and pointing HEAD to it.

+
+
+
Tracking Branches
+

Tracking branches are branches that have a direct relationship with a remote branch. We can push and pull to and from these branches, as Git automatically knows which server and branch we're working with.

+

For a plain git push to work with Git's default settings, the name of your local branch must be the same as the remote branch.

+
+
+

If the branches are named differently, we must run a different command for the push to be successful:

+
$ git push origin HEAD:remote-branch
+
+
+
+
Deleting Branches
+

If we've merged all our changes into our main branch, we can delete the remote branch with the following code:

+
$ git push origin :testing
+
+
+
+
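The fetching, tracking and deleting steps above can be sketched end to end. This is an illustration under assumptions: a local bare repository stands in for GitHub, the seed repository exists only to populate it, and --delete is used as the long form of the colon syntax shown on the slide.

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

# Seed the remote with a main branch and a testing branch.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/main
git -C "$work/seed" config user.name "Example User"
git -C "$work/seed" config user.email "user@example.com"
git -C "$work/seed" remote add origin "$work/remote.git"
echo "base" > "$work/seed/base.txt"
git -C "$work/seed" add base.txt
git -C "$work/seed" commit -q -m "Base commit"
git -C "$work/seed" branch testing
git -C "$work/seed" push -q origin main testing

# A fresh clone only gets a local main; testing exists as origin/testing.
git clone -q "$work/remote.git" "$work/clone"
git -C "$work/clone" branch -v -a

# Create a local, editable branch that tracks the remote one.
git -C "$work/clone" checkout -q -b testing origin/testing
upstream=$(git -C "$work/clone" rev-parse --abbrev-ref 'testing@{upstream}')

# Delete the branch on the remote (same effect as: git push origin :testing).
git -C "$work/clone" push -q origin --delete testing
if git -C "$work/remote.git" rev-parse -q --verify refs/heads/testing >/dev/null; then
    remote_testing=present
else
    remote_testing=deleted
fi
echo "upstream: $upstream, remote testing branch: $remote_testing"
```

Deleting the remote branch does not delete the local testing branch; that still needs git branch -d.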

Questions?

+
+
+

Collaborating

+
+
+

References

+
    +
  • Chacon and Straub: Chapter 3 + 5
  • +
  • Timbers: Chapter 12.8
  • +
+
+
+

Much of the work that we do will involve working with others. It's important that we learn how best to do this so we can successfully collaborate and avoid conflicts where possible. If conflicts arise, good collaboration practices help us resolve them with ease.

+

So far we've learned several practices and commands that help us collaborate with others, including remote repositories and branches, git pull, git push and git merge, but we'll learn more practices that make collaboration straightforward.

+
+
+

There are many different factors that influence what workflow you might follow and how you might contribute to a project including:

+

1. Active contributor size
+Teams can vary from a few collaborators to thousands, varying the number of commits per day.

+

2. Chosen workflow
+Each project could have a different process to check patches including an integration manager or peer reviews.

+

3. Commit access
+Policies regarding how to contribute work can differ between projects, even by how much work or how often.

+
+
+

Let's take a look at a couple possible workflows:

+


+
+
+


+
+
+

GitHub

+
+
+
Adding Collaborators
+

To collaborate with others on our GitHub repo, we can add collaborators so they have direct access to the repo:

+


+
+
+


+
+
+


+
+
+

Access does not have to be permanent. We can remove collaborators at any time and add additional ones when needed.

+

Granting access to our repo this way enables collaborators to make changes and push them to the repo without asking for our permission each time. If we do not grant push access, collaborators have to fork the repo and create a pull request.

+
+
+
Forking Projects
+

Forking allows us to collaborate on projects without push access. We can fork a public project on GitHub and then clone it to our local machine to begin making changes.

+


+
+
+

Once a project has been forked, we can find the repo in our GitHub repositories. We can then clone the repo (git clone), make changes and push our changes without altering the original repo.

+

Alternatively, we can clone the original repo, make our changes, fork the original repo and then merge our branch to the master branch of the forked repo.

+

If we're collaborating with someone and we want our changes to be merged to the original repo, we can create a pull request.

+
+
+
Pull Request
+

After making a few changes, we now want to create a pull request to merge our changes with the original repo. We can do this directly in GitHub:

+


+
+
+

In the pull request, we can see what branches and repos we're attempting to merge:

+


+
+
+

We can also see the changes that were made:

+


+
+
+

GitHub will also check to make sure that there are no conflicts with the base branch:

+


+
+
+

Pull requests with no merge conflicts are easy to merge into the base branch, but it gets more complicated if there are merge conflicts:

+


+
+
+

You can still create a pull request with merge conflicts:

+


+
+
+


+
+
+

Resolving conflicts on GitHub is very similar to resolving merge conflicts through the terminal:

+


+

Although conflicts can be resolved on GitHub, it's a good practice to resolve conflicts before creating a pull request.

+
+
+

Questions?

+
+
+

Conflicts

+
+
+

References

+
    +
  • Chacon and Straub: Chapter 3 + 6
  • +
  • Timbers: Chapter 12.5
  • +
+
+
+

Conflicts are going to arise at some point, especially when working with others. It's important that we learn how to handle these conflicts for easier and more successful collaboration.

+
+
+

GitHub Issues

+
+
+

GitHub issues are an extremely useful tool for communicating decisions, ideas and problems that are project specific.

+

They are an alternative to email or Slack that keep communication isolated to a particular project.

+

Issues can be opened on GitHub and even when they're closed, they remain available. They're also accessible to all collaborators for transparency.

+
+
+

To open an issue, navigate to the project page and click Issues:

+


+
+
+

Then open a new issue:

+


+
+
+

From here, we can add a title and description of the issue, and add any specific collaborators, labels, etc.

+


+
+
+
Information
+

Title: should be descriptive and quickly convey what the issue is about

+

Description: explain the purpose of the issue and how to potentially resolve it. If it's a bug fix, include a reprex, what you wanted to happen and what actually happened. You can also include steps already taken to solve the issue.

+
+
+
Reprex
+
    +
  • +

    A reprex is a REPRoducible EXample.

    +
  • +
  • +

    It contains just enough of the code to reproduce the error, i.e., it is self-contained.

    +
  • +
  • +

    We might have to create a smaller version of the code in order to create the reprex. Don't include anything that isn't related to the problem.

    +
  • +
  • +

    Sometimes, this process will help us solve our issue.

    +
  • +
+
+
+
Inclusions
+

A minimal dataset to demonstrate the problem. This could be a regularly used one such as mtcars

+
install.packages("dplyr")
+library(dplyr)
+head(mtcars)
+
+

or one easily built yourself.

+
df <- data.frame (col1  = c(1, 2),
+                  col2 = c(3, 4))
+df
+
+
+
+
    +
  • +

    Make sure to include classes that are necessary to your reprex (ex. dates, factors, etc.)

    +
  • +
  • +

    If you're using randomly sampled data, set the seed so the same data is produced each time.

    +
  • +
+
set.seed(853)
+
+
+
+

Include all packages that you need.
+

+
    +
  • Make sure they are placed at the top of the script so it's quick and easy to see what is necessary for the reprex.
  • +
+
+
+
Other Inclusions
+
    +
  • +

    Details about the issues you are facing.

    +
  • +
  • +

    Comments that will add clarification to your error.

    +
  • +
  • +

    Add what fixes have been attempted. This could include links to Stack Overflow posts that you've viewed.

    +
  • +
  • +

    Communicate clearly what your desired outcome is.

    +
  • +
+
+
+
Task Lists
+

If an issue is quite large, it's possible to add task lists to break the issue into smaller pieces.

+
    +
  • +

    Use square brackets - [ ]

    +
  • +
  • +

    To mark it complete, use - [x]

    +
  • +
  • +

    Issues can be linked to previous issues using

    +
      +
    • the number - [x] #11
    • +
    • a URL - [x] https://github.com/rachaellam/git-r/issues/11
    • +
    +
  • +
+
+
+

Once an issue has been opened, we can respond and comment.

+

When we decide it has been resolved, we can close the issue. The history of the issues can still be seen, even if it has been closed.

+
+
+

Questions?

+
+
+

Debugging

+
+
+
File Annotation
+

File annotation can help us resolve issues in our code if we know where the issue is. We can see when each line was last changed and by whom, line by line, using the aptly named git blame.

+
$ git blame -L 1,3 script.sh
+^8e9b89da (Rachael Lam  2021-12-02 15:01:02 -0500  1) #line 1
+8e9b89da (Rachael Lam   2021-12-02 15:01:02 -0500  2) #line 2
+8e9b89da (Rachael Lam   2021-12-02 15:01:02 -0500  3) #line 3 
+
+
+
+

git blame is combined with the filename we want to inspect. We can also use the option -L followed by a start and end line number to limit the output to that range of lines.

+

We can then see the partial SHA-1 of the commit that last modified the line, the author name and date of the commit, and the content of the file by line.

+

When the SHA-1 is preceded by a ^, it indicates that the line comes from the commit in which the file was first added to the project and has not changed since.
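A small repository makes the annotation easy to read. This is a sketch under assumptions: the two author names and the contents of script.sh are hypothetical, and the --porcelain option is used only to extract the author field in a script-friendly way.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"
printf 'echo "step 1"\necho "step 2"\n' > "$repo/script.sh"
git -C "$repo" add script.sh
git -C "$repo" commit -q -m "Add script"

# A second author (hypothetical) changes only line 2.
printf 'echo "step 1"\necho "step two"\n' > "$repo/script.sh"
git -C "$repo" commit -q -a --author="Second Author <second@example.com>" -m "Reword step 2"

# Annotate lines 1-2; each line shows the last commit that touched it.
git -C "$repo" blame -L 1,2 script.sh

# Machine-readable form: pull out the author recorded for each line.
line1_author=$(git -C "$repo" blame -L 1,1 --porcelain script.sh | sed -n 's/^author //p')
line2_author=$(git -C "$repo" blame -L 2,2 --porcelain script.sh | sed -n 's/^author //p')
echo "line 1: $line1_author, line 2: $line2_author"
```

Line 1 is attributed to the commit that first added the file, while line 2 is attributed to the later change.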

+
+
+
Binary Search
+

If we don't know where the issue is, we can use git bisect to identify the commit that introduced an issue.

+
$ git bisect start
+$ git bisect bad
+$ git bisect good [good_commit]
+
+

First, we've started the bisect session. We then told Git that the current commit is broken using git bisect bad, followed by the last known good commit using git bisect good [good_commit]. We can find candidate commits by running git log, which we learned earlier.

+
+
+

Git reports the number of commits that lie between the good and the bad commit and then checks out the middle one.

+

From here, we can run our test to see if the issue still exists. If it does, it means the issue was introduced at or before this middle commit and we can run git bisect bad to tell Git that there is still an issue.

+

If it does not, then the issue was introduced after and we can run git bisect good.

+
+
+

We can keep running this loop until we find the commit that introduced an issue and make our corrections.

+

When we're finished, we can run git bisect reset to reset our HEAD to where we were before we started.
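The bisect loop can also be automated. The sketch below is an illustration rather than the course's method: it builds a hypothetical eight-commit history in which commit 5 adds a line containing BUG, and it uses git bisect run, a subcommand not covered above, to repeat the good/bad marking with a test command.

```shell
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" config user.name "Example User"
git -C "$repo" config user.email "user@example.com"

# Build a history of 8 commits; commit 5 introduces the "bug".
i=1
while [ "$i" -le 8 ]; do
    if [ "$i" -eq 5 ]; then
        echo "BUG" >> "$repo/app.txt"
    else
        echo "line $i" >> "$repo/app.txt"
    fi
    git -C "$repo" add app.txt
    git -C "$repo" commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then good_commit=$(git -C "$repo" rev-parse HEAD); fi
    if [ "$i" -eq 5 ]; then expected=$(git -C "$repo" rev-parse HEAD); fi
    i=$((i + 1))
done

# Mark HEAD as bad and the first commit as good in one step.
git -C "$repo" bisect start HEAD "$good_commit" >/dev/null

# The test command must exit 0 on a good commit and non-zero on a bad one.
git -C "$repo" bisect run sh -c '! grep -q BUG app.txt' >/dev/null 2>&1

# While the session is open, refs/bisect/bad names the first bad commit.
first_bad=$(git -C "$repo" rev-parse refs/bisect/bad)
git -C "$repo" log -1 --format='%h %s' "$first_bad"

# Return HEAD to where it was before bisecting.
git -C "$repo" bisect reset >/dev/null 2>&1
```

Because each step halves the remaining range, even a long history needs only a handful of tests.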

+
+
+

Best Practices

+
+
+
    +
  • Topic branches should be used to try out new code before integrating. They enable us to play around, or to set the work aside for the time being if it's not working.
  • +
  • Commit often rather than submitting a massive commit. This makes it easier to review and merge changes, or revert if necessary.
  • +
+
+
+
    +
  • Create quality commit messages so that your collaborators can easily understand what has been done. For example:
  • +
+
Short (50 chars or less) summary of changes
+
+More detailed explanatory text, if necessary. Wrap it to about
+72 characters or so. In some contexts, the first line is treated 
+as the subject of an email and the rest of the text as the body, 
+the blank line separating the summary from the body is critical
+(unless you omit the body entirely).
+
+Further paragraphs come after blank lines.
+
+- Bullet points are okay, too
+
+- Typically a hyphen or asterisk is used for the bullet, preceded
+  by a single space with blank lines in between, but conventions
+  vary here
+
+
+
+

Questions?

+
+
+

What is reproducibility?

+
+
+
    +
  • +

    Reproducibility is the ability of independent researchers to obtain the same or similar results when repeating an experiment or test.

    +
  • +
  • +

    This concept has been widely used in natural sciences, but is not yet as popular in data science.

    +
  • +
  • +

    Remember, data science is a science. We question, hypothesize, test, and therefore, we should also have the same rigour of confirmation.

    +
  • +
+
+
+
    +
  • +

    Results should always be open to skepticism and able to be independently verified. We should be able to defend our results and decisions.

    +
  • +
  • +

    Who would believe your results otherwise? More importantly, you should not believe results if they cannot be verified.

    +
  • +
+
+
+

Why is reproducibility important?

+
+
+
  1. New Insights
  2. Reduce Error Risks
  3. Validate Results
  4. Transparency
+
+
+

How can we make our work reproducible?

+
+
+

There are a number of practices that can help make our work reproducible including:

+
    +
  • Commenting Code
  • +
  • Technical Documentation
  • +
  • Folder Structure
  • +
+
+
+

Commenting Code

+
+
+

How does commenting code help in reproducibility?

+
+
+

Commenting code is an important practice that benefits both ourselves and collaborators.

+

Not only can we understand what we did to fix our own errors or improve our work, but others can better understand our code to reproduce it.

+
+
+

Ellen Spertus outlines 9 rules to follow:
+

+
  1. Comments should not duplicate the code
  2. Good comments do not excuse unclear code
  3. If you can’t write a clear comment, there may be a problem with the code
  4. Comments should dispel confusion, not cause it
+
+
+
  5. Explain unidiomatic code in comments
  6. Provide links to the original source of copied code
  7. Include links to external references where they will be most helpful
  8. Add comments when fixing bugs
  9. Use comments to mark incomplete implementations
+
+
+
1. Comments should not duplicate the code
+
    +
  • Comments should add value to whoever is reading your code.
  • +
  • Comments that duplicate the code add unnecessary bulk and can actually make the code more difficult to understand.
    +
  • +
+

Can you think of a bad example?

+
+
+

Here is an example of what you should not do:

+
x=5
+
+if [ $x = 5 ]; then
+    echo "x equals 5." # if x = 5 then ouput x equals 5
+
+else
+    echo "x does not equal 5." # otherwise output x does not equal 5
+
+fi
+
+
+
+
2. Good comments do not excuse unclear code
+
    +
  • Our aim should always be having clear code, rather than relying on our comments to add clarity.
  • +
  • Remember, we should not be adding more bulk to the code that makes it more difficult to understand.
  • +
+
+
+
3. If you can’t write a clear comment, there may be a problem with the code
+
+

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to
+debug it.

+
+

- Kernighan's Law

+
+
+
4. Comments should dispel confusion, not cause it
+
    +
  • If our comments are adding further confusion, we should either rewrite the comment or remove it entirely.
  • +
  • A good comment should always be written with the intent to help better understand what is being done.
  • +
+
+
+
5. Explain unidiomatic code in comments
+
    +
  • If we've purposefully written code that others may find unnecessary, we need to comment our reasoning.
  • +
  • Others may try to simplify our code if we don't explain our reasoning.
    +
  • +
+

Can you think of an example?

+
+
+
6. Provide links to the original source of copied code
+
    +
  • Oftentimes, we'll use code that others have written. It's important to give credit to the original source, and the link also reminds us where we got the code if we need to reference it later.
  • +
  • Referencing the source can also provide other information such as what the problem was, why the solution was recommended and how it can be improved. It also means we don't have to comment all of these details again in our own code.
  • +
+
+
+

An example:

+
# I got these 9 rules from Ellen Spertus' blog post on
+# StackOverflow: https://stackoverflow.blog/2021/12/23/
+# best-practices-for-writing-code-comments/
+
+
    +
  • It's best to include the URL so others don't have to search for the exact location.
  • +
  • Remember: never copy code that you don't personally understand.
  • +
  • Code from StackOverflow is licensed under Creative Commons licenses that require attribution, so a reference comment is needed.
  • +
+
+
+
7. Include links to external references where they will be most helpful
+
    +
  • References don't just have to be used for copied code. They can also provide information on decisions made or changes in practices
  • +
+
+
+
8. Add comments when fixing bugs
+
    +
  • Comments can help others understand what we modified, if the modification is still needed, and how to test our modifications
  • +
  • Although git blame can be used to find the commit that modified the code, a good, brief comment can help locate the change more quickly.
  • +
+
+
+
9. Use comments to mark incomplete implementations
+
    +
  • Sometimes we have limitations in our knowledge or time. Adding comments documenting these limitations can allow us to better address and fix the issues.
  • +
+
+
+
Some other good practices:
+
    +
  • Comments should be clear and efficient. Don't add more information than necessary, but don't be too vague
  • +
  • Remember to update your comments if you update your code. Old comments can add more confusion.
  • +
  • Inline comments can add noise as they're mixed with our code. Spacing can be helpful here:
  • +
+
colors = [[213/255,94/255,0],         # vermillion
+          [86/255,180/255,233/255],   # sky blue
+          [230/255,159/255,0],        # orange
+          [204/255,121/255,167/255]]  # reddish purple
+
+
+
+
+

Code tells you how, comments tell you why.

+
+

- Jeff Atwood, Co-founder of StackOverflow

+
+
+

Technical Documentation

+

Writing

+
+
+

What is technical documentation writing?

+
+
+

Why is it important to write good technical documentation?

+
+
+

Technical documents are necessary for reproducibility as they relay important information about your project to others. Writing technical documents is not easy but should not be overlooked.

+

A well-written technical document will communicate the goals of a project and, in doing so, can generate interest in the project.

+
+
+

GitHub outlines several pieces of information to include:

+
  1. What the project does
  2. Why the project is useful
  3. How users can get started with the project
  4. Where users can get help with the project
  5. Who maintains and contributes to the project
+

This is just part of the story and we'll add more to this in the coming slides.

+
+
+
README
+
    +
  • Technical documentation writing is typically found in a README.md file.
  • +
  • If the README.md file is placed in our repo's root, docs folder, or hidden .github directory, GitHub will place the contents of the README.md on the main repo page.
  • +
  • The README.md file will be the first thing visitors see when they come to the project page so it's important to make it as appealing as possible.
  • +
+
+
+
Examples
+

Let's walk through some good examples of README.md files:

+ +
+
+

What did you like about these README files?

+

What similarities can you see?

+
+
+
What should be included?
+
  1. Name of the project
  2. What the project does
  3. The project's usages
  4. How to get started
  5. Where to find help
  6. Who contributes
+
+
+
1. Name of the Project
+
    +
  • The name of your project should be unambiguous.
  • +
+
+
+
2. What the project does
+
    +
  • This should be a description of the project.
  • +
  • Provide context to the project and any reference links.
  • +
  • Include features or background information
  • +
  • Can be titled "Description"
  • +
+
+
+
3. The project's usages
+
    +
  • This should include how the project can be used.
  • +
  • Provide examples using the code along with the expected output of said code.
  • +
  • It should be a smaller example. Longer examples can be linked to.
  • +
  • Can be titled "Usages"
  • +
+
+
+
4. How to get started
+
    +
  • This is the installation guide.
  • +
  • Think of your particular audience and how much detail you might need to include.
  • +
  • Add a requirements section if there are specific dependencies or needs to run in a particular programming language.
  • +
  • Can be titled "Installation"
  • +
+
+
+
5. Where to find help
+
    +
  • Direct people on where to find help if they need.
  • +
  • This could be the issues page on GitHub, a forum, or an email address.
  • +
  • Can be titled "Support"
  • +
+
+
+
6. Who contributes
+
    +
  • This should outline how others can contribute to your project and what your requirements are for accepting contributions.
  • +
  • Can be titled "Contributing"
  • +
+
+
+
Additional Additions
+
    +
  • Visuals: Visuals can grab people's attention, but they can also be helpful for showcasing what the code does. Include screenshots or GIFs that demonstrate your project.
  • +
  • Badges: Badges provide metadata such as issue tracking, test results and downloads. Shields.io provides this service and you can also look at their GitHub for more information.
  • +
  • Acknowledgements: Include the authors or anyone that helped with the project.
  • +
+
+
+
Markdown
+
    +
  • As noted by the extension, README.md files are usually written in markdown, thus using markdown syntax for styling.
  • +
  • GitHub provides a good reference on how to write your README in markdown.
  • +
+
+
+
Headings
+
# Largest Heading
+## Second Largest Heading
+### Third Largest Heading
+
+


+
+
+
Text Styling
+
**bold**
+*italic*
+~~strikethrough~~
+**this is a *nested* example**
+***bold and italic***
+
+


+
+
+
Quoting
+
> Block quote some text
+
+


+
+
+
Unordered Lists
+
- this is an unordered list
+- second item
+    - nested
+        - second nest
+
+


+
+
+
Ordered Lists
+
1. This is an ordered list
+2. This is the second item
+    - with some additional information
+3. This is the third
+
+


+
+
+
Codeblock
+

Wrap your code in ``` to create a codeblock.

+


+
+
+
Links
+
[Rachael's GitHub](https://github.com/rachaellam)
+
+


+
+
+
Images
+
![w:1000 center](pics/picture.png)
+
+

+As we see, images can also be GIFs. We can also play around with the size and alignment.

+
+
+

Folder Structure

+
+
+

What is folder structure and why is it important?

+
+
+

A good folder structure is important for reproducibility because it allows others to easily navigate and run our projects. If the project is self-contained and files are referenced with paths relative to the project folder, others won't have to change the file paths to gain access.

+

For example, what is the difference between these two paths:

+
  1. "/Users/rachaellam/Documents/all-projects/this-project/data/"
  2. "this-project/data/"
+
+
+

Folder structure can vary based on the project but a basic one to follow is...

+
    +
  • /inputs +
      +
    • Everything that will not be edited including raw data and references
    • +
    +
  • +
  • /outputs +
      +
    • Everything that was created during the project and your results
    • +
    +
  • +
  • /scripts +
      +
    • All code that was written for the project
    • +
    +
  • +
+
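The basic structure above can be created from the terminal. This is a minimal sketch, assuming a hypothetical project called this-project, a made-up raw.csv, and a small script named summarise.sh; the point is that the script uses only paths relative to the project root.

```shell
# Hypothetical project root in a temporary location.
project=$(mktemp -d)/this-project
mkdir -p "$project/inputs" "$project/outputs" "$project/scripts"

# Raw data lives under inputs/ and is never edited by hand.
printf 'id,value\n1,10\n2,20\n' > "$project/inputs/raw.csv"

# A script that refers to files relative to the project root, so it runs
# unchanged wherever the project folder is copied.
cat > "$project/scripts/summarise.sh" <<'EOF'
#!/bin/sh
# Count data rows (everything after the header) and save the result.
tail -n +2 inputs/raw.csv | wc -l | tr -d ' ' > outputs/row_count.txt
EOF

# Run from the project root.
(cd "$project" && sh scripts/summarise.sh)
rows=$(cat "$project/outputs/row_count.txt")
echo "rows: $rows"
```

Because nothing in the script mentions a user's home directory, a collaborator can clone the project anywhere and run it the same way.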
+
+

Wilson et al. also outline a file structure that is similar...

+
  • /doc
    • All text documents including documentation or references
  • /data
    • All raw data and metadata
  • /results
    • Files generated during the analysis including generated data or cleaned data
    • Results can be further divided into subdirectories that contain intermediate files and finished files
  • /src
    • All code that was written for the project
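The layout above, including subdirectories under results, could be created as in the sketch below. The subdirectory names `intermediate` and `final` are hypothetical choices for illustration; the slides only say that results can be divided into intermediate and finished files.

```bash
# Scratch location for the sketch; replace with a real project path as needed.
project="$(mktemp -d)/this-project"

# Top-level folders from the list above, plus two assumed subfolders of results.
mkdir -p "$project/doc" "$project/data" "$project/src" \
         "$project/results/intermediate" "$project/results/final"

# Show every directory that was created, in sorted order.
find "$project" -type d | sort
```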
+
+
+

References

+

Reproducibility:

+ +
+
+

Commenting:

+ +
+
+

Technical Documentation Writing:

+ +
+
+

Folder Structure:

+ +
+
\ No newline at end of file diff --git a/lessons/week_2/git_slides.pdf b/lessons/week_2/git_slides.pdf new file mode 100644 index 00000000..0e23425e Binary files /dev/null and b/lessons/week_2/git_slides.pdf differ diff --git a/post-course/Exit Survey - DS Foundations.docx b/post-course/Exit Survey - DS Foundations.docx new file mode 100644 index 00000000..015aec41 Binary files /dev/null and b/post-course/Exit Survey - DS Foundations.docx differ diff --git a/post-course/Exit Survey - DS Foundations.md b/post-course/Exit Survey - DS Foundations.md new file mode 100644 index 00000000..0a2cfd5c --- /dev/null +++ b/post-course/Exit Survey - DS Foundations.md @@ -0,0 +1,49 @@ +# DSI Upskilling Pilot Course Exit Survey +## Course Name: Data Science Foudations +### _Course Instructor: Rachael Lam_ +### _Ta: Delaram Pouyabahar_ +\ +Thank you for joining the DSI upskilling pilot courses! + +We would like to get your thoughts on your experiences with this course and how we can design our full course offerings. We would appreciate it if you could take the time to fill out and submit this short survey. + +### Scale questions: +- 1 - Not at all +- 2 - Somewhat +- 3 - Moderately +- 4 - Mostly +- 5- A great deal + +## 1. About the Curriculum. +- I found the course intellectually stimulating. +- The course provided me with a deeper understanding of Unix shell, version control, and GitHub. +- The course is set up to fully onboard someone without prior technical experience. +- The course design, including live coding and examples, provided an opportunity for me to demonstrate an understanding of data science skills. +- The course inspired me to think further about the subject matter outside of class. +- The course material is helpful for me to enhance my data science skills for my career. +- I would recommend this course to other students. +- Overall, the quality of my learning experience in this course was good. + +## 2. About the Instructor. 
+- The course instructor (Rachael Lam) explained concepts clearly. +- The course instructor (Rachael Lam) encourages learners to ask questions about the course material. + +## 3. About the TA. +- The TA (Delaram Pouyabahar) was readily available during the class. +- The TA (Delaram Pouyabahar) was helpful when I had difficulties or questions. + +### Short Answer Questions. + +#### 4. How would you rate the pilot course sequence and flow? + +##### 4.1 How do you feel about the course materials? Was there too much material? Would you prefer less material but more in-depth? Or did you enjoy how high-level the material was? + +#### 5. Please comment on the in-class support model. + +#### 6. What were the top 2 things you liked about this pilot course. + +#### 7. What were 2 things you do NOT like about this pilot course. + +#### 8. Please tell us about other data science topics that would of interest and helpful in your career. + +### Thank you so much for your feedback! diff --git a/slides-resources/ethics_slides.md b/slides-resources/ethics_slides.md new file mode 100644 index 00000000..122bdf6b --- /dev/null +++ b/slides-resources/ethics_slides.md @@ -0,0 +1,166 @@ +--- +marp: yes +theme: uncover +_class: invert +paginate: yes +style: | + img[alt~="center"] { + display: block; + margin: 0 auto; +output: pdf_document +--- + + + + +# **Ethics** +```bash +$ echo "Data Sciences Institute" +$ echo "Rachael Lam" +``` + +--- + + +##### Why should we care about ethics in Data Science? 
+ +--- + + +## `Who Counts in our` +## `Datasets?` + +--- + +![w:950 center](pics/torontocrime.png) + +--- + +![w:950 center](pics/covidcases.png) + +--- + +![w:2300 center](pics/drivingtrends.png) +- [Kieran Healy](https://kieranhealy.org/blog/archives/2020/05/21/the-kitchen-counter-observatory/) + +--- + +![w:750 center](pics/census.png) + +--- + +##### **Os Keyes: Counting the Countless** +>Trans lives are ultimately (to a certain degree) about autonomy: about the freedom to set one’s own path. Society isn’t a tremendous fan of this + +--- +> ...“administrative violence” to refer to the way that administrative systems such as the law — run by the state, that white supremacist capitalist patriarchy — “create narrow categories of gender and force people into them in order to get their basic needs met.” + +--- +> “data violence” refers to the perpetuation of violence through datalogical systems: everything from YouTube’s recommender algorithm to facial recognition to online advertising. + +--- + + + +How do data science practices exclude people or communities? + +--- + +- Normalizing and Standardizing + - Open text boxes create data that is difficult to clean + + - A quantitative approach forces us to make buckets and decide the definitions of gender and constrains people's decisions to those buckets +- Consistency + - Big data encourages us to collect as much data as possible in a standardized way + + - Developing a data history means that variation from the previously determined standardized approach complicates data collection thus is discouraged + +--- + + + +Insurance companies are making decisions based on health data (ex. food you buy, exercise, etc.) +
+ +How might this practice exclude people or communities? + +Is this practice even ethically responsible? + +--- + +> "The inhumane reduction of humanity down to what can be counted." + +--- + +##### **Kieran Healy: The Kitchen Counter Observatory** +> Numbers and measures are crude; they pick up the wrong things; they strip out the meaning of what’s happening to real people; they make it easy to ignore what can’t be counted. There’s something to those complaints. But it’s mostly a lazy critique. In practice, I find that far from distancing you from questions of meaning, quantitative data forces you to confront them. + +--- + + + +Do you think the reductionist approach of data science makes it easy to ignore realities, or does it force you to confront them as Healy states? + +--- + +##### **Reforming Data Science** +>With administrative violence, Spade notes how “reform” often benefits only the least marginalized while legitimizing the system and giving cover for it to continue its violence. + +We can see this in facial recognition where people of colour are not well recognized by the program, yet creating a better algorithm only benefits systems of control. + +--- + + + +Can you think of another product/algorithm/program that further marginalizes communities? + +Do you believe we can reform data science? + +--- +##### **References** +- Healy, 2020, ‘The Kitchen Counter Observatory’, https://kieranhealy.org/blog/archives/2020/05/21/the-kitchen-counter-observatory/. 
+- Jasmine Mithani and Alex Samuels: https://fivethirtyeight.com/features/who-the-census-misses/ +- Keyes, 2019, ‘Counting the Countless’, https://reallifemag.com/counting-the-countless/ +- NYTimes: https://www.nytimes.com/interactive/2021/us/new-york-covid-cases.html +- Rachael: https://github.com/rachaellam/Toronto-Crime-Rates/blob/main/outputs/paper/toronto_crime_analysis.pdf diff --git a/slides-resources/foundation_slides.md b/slides-resources/foundation_slides.md new file mode 100644 index 00000000..98a0ea14 --- /dev/null +++ b/slides-resources/foundation_slides.md @@ -0,0 +1,306 @@ +--- +marp: true +theme: uncover +_class: invert +paginate: true + +style: | + img[alt~="center"] { + display: block; + margin: 0 auto; + } + +--- + + + + +# **Foundations Overview** +```bash +$ echo "Data Sciences Institute" +$ echo "Rachael Lam" +``` + +--- +##### **Why take this course?** + +- Unix shells - more specifically bash - is a powerful tool for quickly and easily navigating and manipulating files, scaling automated tasks, accessing Git and processing data. + +- Git is extremely important for reproducibility of your personal work and collaborating with others on group projects. + +- Git is incredible at keeping a historical reference of the changes you make to your work and debugging your code. + +- Github has an amazing community with educational resources, open-sourced projects, and events. + +--- +##### **Learning Outcomes** +1. Become comfortable with Unix basics and more complicated functions + +2. Learn how to use Git and Github in solo and group projects + +3. Navigate how to solve problems that you encounter + +4. Understand why reproducibility is important and how to make your code reproducible + +--- +5. Grasp the ethical considerations of who is and isn’t in our datasets + +6. Recognize past abuses of power and their continued influence + +7. 
Learn professional skills and how to work within a team + +--- +##### **Prerequisites** +Please come prepared with a Github account. + +--- +##### **Assessments** +- A number of formative assessments that continuously put in practice what we learned in class + +- Attitudinal assessments to help understand how students feel about the material and any areas for additional review + +- One summative assessment that compiles everything we have learned + +- Written reflections + +--- +##### **Course Reading Material** +- Chacon and Straub, 2014, Pro Git, 2nd Edition. + +- Newham and Rosenblatt, 2005, Learning the bash shell: Unix shell programming, O'Reilly. + +- Timbers, Campbell, Lee, 2021, Data Science: A First Introduction, https://ubc-dsci.github.io/introduction-to-datascience/. + +- William E. Shotts, Jr., 2009, The Linux Command Line + +- Wilson, 2021, Building Software Together, https://buildtogether.tech/. + +--- + + +## `Unix` + +--- +##### **Intro to Unix and Linux** +**Readings:** Newham et al. chapter 1 & Scotts chapter 1 + +Unix encompasses many features. In this section, we will look at what Unix/Unix shells are and the differences between Unix and Linux, introduce Bash and understand why it’s important to learn. We will also get our environment set up so that we can try a few initial commands. + +--- +##### **Navigate Files and Directories** +**Readings:** Newham et al. chapter 1.6 & Scotts chapters 3-4 + +To begin, we’ll start with a bit of theory to understand directories and the differences between the types of paths and files. We’ll then look at some tools that will help us navigate our files and directories using different options and arguments. We’ll also learn a few commands that will help us quickly and easily navigate our system. + +--- +```bash +$ ls +``` +```bash +$ cd +``` +```bash +$ pwd +``` + +--- +##### **Working with Files and Directories** +**Readings:** Newham et al. 
chapter 1.7 & Scotts chapters 5, 7, 11 + +In this section we’ll start manipulating files and directories. This includes creating, copying, moving and more. We’ll then introduce inputs and outputs and how combine them into command pipes. + +--- +```bash +$ cp +``` +```bash +$ mv +``` +```bash +$ mkdir +``` +```bash +$ rm +``` +```bash +$ ln +``` + +--- +##### **Pipes and Filters** +**Readings:** Newham et al. chapters 1.9-2 & Scotts chapters 8-9 + 13 + +Continuing from last lesson, we’ll expand on pipes and introduce some filter commands that will help us gain more shell experience. We’ll also cover some important expansions and command line editing tips. + +--- +```bash +$ cat +``` +```bash +$ sort +``` +```bash +$ uniq +``` +```bash +$ grep +``` +```bash +$ find +``` + +--- +##### **Shell Scripts** +**Readings:** Scotts chapter 25 + +Now we’ll learn how to group together commands and compile them into shell scripts. This avoids writing commands one by one on the command line. We’ll build our first script and in the process discover how to write, run and store shell scripts. + +--- +##### **Shell Functions** +**Readings:** Newham et al. chapter 4 & Scotts chapters 26, 33, 35 + +A good practice in programming is to create functions which separate larger tasks into smaller tasks. We’ll learn the basic structure of functions, how to use self contained variables and parameters and further expand upon expansions. + +--- +```bash +function name { + commands + returns +} +``` +```bash +name () { + commands + returns +} +``` + +--- +##### **Flow Control** +**Readings:** Newham et al. chapter 5 & Scotts chapters 28, 30 + +In the final lesson of Unix Shell, we’ll introduce more advanced topics: if statements and loops. In doing so, we’ll be able to write scripts that make decisions based on true/false statements and allow portions of our program to repeat. + +--- +```bash +x=5 +if [ $x = 5 ]; then + echo "x equals 5." +else + echo "x does not equal 5." 
+fi +``` + +--- + + +## `Git and GitHub` + +--- +##### **Intro to Git and Github** +**Readings:** Wilson chapter 1 & Timbers chapter 12.3-12.4, 13.3.1 + +Git and Github are extremely important for code revisions, reproducibility and collaboration. This introduction will discuss local, centralized and distributed version control as well as how to get started with Git using some of our knowledge from the Unix lessons. + +--- +##### **Git Basics** +**Readings:** Wilson chapter 2 + +At this point we’ll create our first repository in an existing directory or by cloning a directory. We’ll also introduce how to ignore files and why we might want to do that. + +--- +##### **Git Commands** +**Readings:** Wilson chapter 2 & Timbers chapter 12.5-12.6 + +This lesson will contain the most important Git commands. We’ll learn how to pull any changes, check the status of our work, commit changes, push them to our repository and even more commands and options. We’ll also discuss why adding messages to our commits is important and should be added to our practice. + +--- +##### **Remote Repositories** +**Readings:** Chacon and Straub chapter 2 & Timbers chapter 12.5-12.6 + +Here we will be focusing on remote repositories and the workflow you should adopt to appropriately collaborate with teammates. We'll learn commands such as `git remote`, `git pull` and `git push` during this section. + +--- +##### **Branching and Pull Requests** +**Readings:** Wilson chapter 3 + +Git is extremely useful for separating work that you want to develop and test from the main line to avoid damage. It does this through a feature called branching. We’ll be learning how to create these branches and merge them when our work is sufficient. + +--- +##### **Collaborating** +**Readings:** Wilson chapter 3 & Timbers chapter 12.8 + +Another amazing feature of Git is the ability to collaborate with others. We’ll discuss how to grant access to our repositories and how branching can help us collaborate. 
We’ll also go through some of the best practices when collaborating with others. + +--- +##### **Dealing with Conflicts** +**Readings:** Wilson chapter 6 + +Collaborating with others can produce several conflicts. We’ll explore some practices to deal with merge conflicts when multiple individuals are working on a project, as well as Github Issues as another tool in collaboration. We’ll end with some debugging using annotations and binary searches. + +--- + + +## `Important Considerations` + +--- +##### **Problem Solving** +Problem solving is a necessary skill when writing code. In this section we’ll learn how to identify the problem and effectively search for our solution using Google and Stack Overflow. We’ll also begin a discussion how how reproducibility makes a difference when asking for help. + +--- +##### **Reproducibility** +Reproducibility is extremely important in reducing and solving errors and increasing trust and transparency. We’ll have thorough discussions on the significance of reproducibility and how to practice it through code commenting, documentation writing and proper folder structures. + +--- +##### **Ethics** +When looking at open-source projects on Github or other libraries, it’s important to not take the information and results we see at face-value. We’ll be discussing what to look for, and who might be inappropriately excluded from our data. We’ll be examining at several datasets to analyze the ethics of the project and what might be missing. + +--- +##### **Inequality** +When ethics are not taken into consideration, massive inequality can take place. We’ll further our understanding of what happens when ethics are dismissed and the past abuses of power that have occured under this massively harmful failure. + +--- +##### **Professional Skills** +We’ll end this module with a lesson on pertinent and tech based professional skills. This includes healthy work habits, time management and best practices in meetings. 
We’ll also discuss some team collaboration skills such as code reviews and sprint methodology. + + +--- + + +**Discussion/Questions** diff --git a/slides-resources/git_slides.md b/slides-resources/git_slides.md new file mode 100644 index 00000000..fde8e0f8 --- /dev/null +++ b/slides-resources/git_slides.md @@ -0,0 +1,2129 @@ +--- +marp: true +theme: uncover +_class: invert +paginate: true + +style: | + img[alt~="center"] { + display: block; + margin: 0 auto; + } + +--- + + + + +# **Version Control and GitHub** +```console +$ echo "Data Sciences Institute" +$ echo "Rachael Lam" +``` + +--- +**Prerequisites:** +- GitHub account + +--- +**Key Texts:** +- Chacon and Straub, 2014, Pro Git, 2nd Edition. +- Timbers, Campbell, Lee, 2021, Data Science: A First Introduction, https://ubc-dsci.github.io/introduction-to-datascience/ + +--- +**References** +- Chacon and Straub: Chapter 1 +- Timbers: Chapter 12.3 - 12.4, 13.3.1 + +--- + + +## `Version Control` + +--- +##### **What is Version Control?** +Version control is a system that records changes to a file or a set of files over time so that we can recall a specific version later. We may already do this by copying files to another directory to save past versions.While it is simple, it lacks flexibility and complexity. + +--- +Version Control Systems (VCS) can do a number of things and can be applied on nearly any type of file on our computers: +- revert files to a previous state +- revert entire project to a previous state +- compare changes over time +- see who modified something last +- who introduced an issue and when +- recover lost files + +--- +##### **Local Version Control Systems** +Local VCSs were developed to keep track of changes to our files by putting them in a version database. + +![bg right contain](pics/LVC.png) + +--- +##### **Centralized Version Control Systems** +Centralized VCSs (CVCS) were developed to enable collaboration with developers on other systems. 
CVCSs have a single server that contains all the versioned files. + +![bg left contain](pics/CVCS.png) + +--- +CVCSs allow some level of transparency to others' work and give Administrators a level of control over what developers can and can't do. + +Unfortunately, a single server means that if it ever goes down, all collaboration halts for however long that lasts for. Additionally, if backups haven't been kept, work could easily be lost. + +--- +##### **Distributed Version Control Systems** +To handle the limitations of LVCSs and CVCSs, Distributed VCSs were created. This includes Git, Mercurial and Bazaar. + +Collaborators mirror the entire repsoitory, therefore if a server dies, any one of the collaborators' repositories can be copied back to the server to restore it. + +--- +![w:560 center](pics/DVCS.png) + +--- + + +Questions? + +--- + + +## `Git` + +--- +##### **Git Basics** +Git thinks of data in a very different way than other VCSs. Instead of storing a set of files and the changes over time, Git thinks of its data more like a set of snapshots of a mini file system. + +If files have not changed, Git does not store the file again, it links to the previous identical file already stored. + +--- +![w:1100 center](pics/git_data.png) + +--- +##### **Local Operations** +Most operations on Git only need local files and resources to operate. Git also keeps the entire history of our projects on our local disks meaning we can see changes made months ago without a remote server. + +We also don't need to be connected to the server to get work done, rather we only need to be connected when we want to upload our work. + +--- +##### **Benefits** +Git uses a check-summing mechanism called *SHA-1 hash* which is calculated based on the contents of a file or directory structure in Git. 
It looks something like this: +``` +24b9da6552252987aa493b52f8696cd6d3b00393 +``` +This checksum means it's impossible to change the contents of any file or directory without Git knowing about it. + +Git generally only adds data, making it fairly difficult to lose data once we've committed, which we'll learn about later. + +--- +##### **The Three States** +There are three main states that our files can reside in: +- Committed: + - data is safely stored on local database +- Modified: + - file has been changed but not yet committed +- Staged: + - modified file has been marked to go into the next commit + +--- +##### **The Three Main Sections** +There are three main sections to a Git project: +- The Git directory +- The working directory +- The staging area + +--- +##### **The Git Directory** +The Git directory is where Git stores the metadata and object database for our projects. It is what is copied when we clone a repository from another computer. + +--- +##### **The Working Directory** +The working directory is a single checkout of one version of our projects. These files are pulled out of the compressed database in the Git directory and placed on the disk for us to modify. + +--- +##### **The Staging Area** +The staging area is a simple file that stores information about what will go into our next commit. + +--- +##### **Workflow** +A basic workflow will look something like this: +1. Modify files in our working directory +2. Stage the files in the staging area +3. Commit the changes, which takes the files from the staging area and stores them in the Git directory. + +--- + + +Questions? + +--- + + +## `Installing Git` + +--- +Typically, Git is already installed on our system but we can check for that using the `git` command: +```console +$ git --version +``` +**Does anyone not see a version?** + +--- +##### **Installing on Linux** +If you're on Ubuntu: +```console +$ sudo apt install git +``` +
+ +If you're on Fedora, RHEL or CentOS: +```console +$ sudo dnf install git +``` +```console +$ sudo yum install git +``` + +--- +##### **Installing on Mac** + +You can install Git via Homebrew, if you have Homebrew installed (https://brew.sh/). +```console +$ brew install git +``` + +Finally, you can install Git from source at this link: https://sourceforge.net/projects/git-osx-installer/ + +--- +##### **Installing on Windows** +The download will start automatically through this link: https://git-scm.com/download/win + +--- + + +Questions? + +--- + + +## `Git Setup` + +--- +The first thing to do now that we have Git installed on our system is to customize it. These changes will remain despite any upgrades to Git that we install. + +Using the command `git config`, we can set configuration variables that control all aspects of how Git looks and operates. + +--- +##### **Checking Configurations** +Before we change any of our global configurations, we can check what they are: +```console +$ git config --list +``` +If we haven't configured Git, we can do that now! + +--- +##### **Identity** +First, we'll set our username and email address. Git uses this information everytime we commit. +```console +$ git config --global user.name "Rachael Lam" +$ git config --global user.email "rachael.a.lam@gmail.com" +``` +The option `--global` means that we only have to pass this through once. + +--- +##### **Editor** +Next, we'll configure our the default text editor for when Git needs to type in a message. Git uses our system's default editor (usually Vi or Vim) but we can change it if we prefer. 
If we want to change the editor to emacs, we would do so below: +```console +$ git config --global core.editor emacs +``` + +--- +##### **Diff Tool** +We can also set the default tool used to resolve merge conflicts: +```console +$ git config --global merge.tool vimdiff +``` + +--- +##### **Checking the Setting** +We can use the `git config --list` command to see all Git settings. To see the value of a specific setting: +```console +$ git config user.name +``` + +--- +##### **Help** +If we ever need help, even offline, we can access the manual page three ways: +1. `$ git help ` +2. `$ git --help` +3. `$ man git-` + +For example, we can get help for the `config` command: +```console +$ git help config +``` + +--- + + +Questions? + +--- + + +# **Git Basics** + +--- +**References** +- Chacon and Straub: Chapter 2 + +--- + + +## `$ git init` / `$ git clone` + +--- +##### **Repositories in an Existing Directory** +We're quickly getting into how to start our first Git repository, commonly known as a repo. First we'll learn how to import an existing directory into Git: +```console +$ git init +``` +```console +$ git init -b main +``` +Here we're creating a new subdirectory named `.git` that will contain all our necessary repo files. The option `-b main` names the initial branch `main`. + +--- +##### **Cloning an Existing Repository** +If we want to collaborate on an existing repo, we need to clone the repo from GitHub. If we don't have a project set up yet, we'll need to do that first. + +--- +1. Create a new project +
+ +![w:1100 center](pics/creatingrepo1.png) + +--- +2. Add name and optional description +
+ +![w:1100 center](pics/creatingrepo2.png) + +--- +3. Choose public or private and add initialize +
+ +![w:1000 center](pics/creatingrepo3.png) + +--- +There are a number of automatically generated files such as log files that we might not want Git to add or show as untracked. We can create a file called `.gitignore` to ignore the automatically generated files. + +The `.gitignore` is dependent on the type of coding language you are using but can also be modified to fit specific purposes. + +--- +If we created a repo on GitHub, we can choose a `.gitignore` template. We can select a template specific to the coding language we are using. + +![w:900 center](pics/gitignoresmall.png) + +--- +Once we have our repo, we can clone it: +```console +$ git clone https://github.com/rachaellam/git-module.git +``` +Using this code, we've created a repo called `git-module` (by taking the last part of the link) and initialized a `.git` directory and pulled all data for that repository while checking for the latest copy. + +--- +The url used in the previous code block is copied directly from GitHub by clicking code and copying the HTTPS: + +![w:1150 center](pics/github.png) + +--- +If we want to change the name of the repo, we can specify that as the next command line option: +```console +$ git clone https://github.com/rachaellam/git-module.git mymodule +``` + +--- + + +Questions? + +--- + +# **Git Commands** + +--- +**References** +- Chacon and Straub: Chapter 2 +- Timbers: Chapter 12.5 + +--- + + +## `$ git status` + +--- +##### **Tracked and Untracked Files** +Files in our working directory can either be tracked or untracked. Tracked files are files that that were in the last snapshot and can be unmodified, modified or staged. Untracked files are files that aren't in our last snapshot or staging area. + +When we modify a file, Git keeps track of the modifications even before we've decided to commit. We can then stage the modifications and then commit. 
+ +--- +![w:1000 center](pics/workflow.png) + +--- +##### **File Status** +To better understand what state our files are in, we can check the status: +```console +$ git status +``` +If we've just created our repo, we should see (or something similar): +```console +# On branch main +# Your branch is up to date with 'origin/main'. + +# nothing to commit, working tree clean +``` + +--- +Let's now add a README.md file, because every good repo has a good README. + +```console +$ touch README.md +``` + +And see the status: + +```console +$ git status +``` + +--- +```console +On branch main + +No commits yet + +Untracked files: + (use "git add ..." to include in what will be committed) + README.md +``` +Here we can see that we still haven't committed anything and that we have an untracked README.md file. Git also gives us a bit of information including how to add a file to track. + +--- + + +## `$ git add` + +--- +##### **Tracking New Files** +To track new files, or stage new files, we can use `git add` along with the file that we want to track: +```console +$ git add README.md +``` +We can run `git status` again to see the results of `git add`. + +--- +```console +On branch main + +No commits yet + +Changes to be committed: + (use "git rm --cached ..." to unstage) + new file: README.md +``` +Now we can see that our README.md file is staged to be committed. + +--- +Let's say we add some more info to our README.md file, which has now been tracked. If we run `git status`, we can know: +```console +On branch main + +No commits yet + +Changes to be committed: + (use "git rm --cached ..." to unstage) + new file: README.md + +Changes not staged for commit: + (use "git add ..." to update what will be committed) + (use "git restore ..." 
to discard changes in working directory) + modified: README.md + +``` + +--- +We can stage our additional changes and check the status: +```console +$ git add README.md +$ git status +``` +```console +On branch main + +No commits yet + +Changes to be committed: + (use "git rm --cached ..." to unstage) + new file: README.md + +``` + +--- +Let's try adding another file into our directory. It can be something that you've been working on independently, or we can add our project from the previous Unix module. + +--- +If we modify many things at once, we can add the option `-A` to add all files, rather than adding one by one +```console +$ git add -A +``` +A little note about this: it's best to upload your work in small chunks for readability and for collaboration. So if you have a bunch of files, it's recommended to split them into smaller chunks. + +--- + + +Questions? + +--- + + +## `$ git diff` + +--- +If we want to see more details of the changes that we've made, we can use the command `git diff`. + +`git diff` compares what is in our working directory to what is in our staging area. If we've made changes to our files without running `git add`, we'll see the comparison. If there are no differences, nothing will be shown. + +--- +```console +diff --git a/README.md b/README.md +index e69de29..4711fce 100644 +--- a/README.md ++++ b/README.md +@@ -0,0 +1 @@ ++# git-r +\ No newline at end of file +``` + +--- +```console +diff --git a/README.md b/README.md +``` +This is telling us what we're comparing. In this case, it's the difference between a previous version of the README file and the current one + +--- +```console +index e69de29..4711fce 100644 +``` +Here is some meta data, or hash identifier that we likely won't need. + +--- +```console +--- a/README.md ++++ b/README.md +``` +This is acting as a legend. 
Changes from `a/README.md` are marked by `---` and changes from `b/README.md` are marked by `+++` + +--- +```console +@@ -0,0 +1 @@ ++# git-r +``` +Here we're being told the lines that have changed and what on those lines changed. Because there was nothing removed, this is a bit of a simplistic representation. + +--- +We might see something more like... +```console +@@ -21,5 +77, 12 +``` +This is telling us 5 lines were removed starting on line 21 and 12 lines were added starting on line 77. + +--- +##### **--staged** +If we want to see the details of what will go into the next commit, we can use `git diff` with the option `--staged` + +--- + + +## `$ git commit` + +--- +Once we've staged your selected files, it's time to commit the changes. Anything that wasn't staged (any modifications since `git add`) will not be included in the commit. + +`git commit` is most easily run with the option `-m`. This adds a message to your commit + +```console +$ git commit -m "adding a message here" +``` + +--- +##### **-m** +Messages should be clear. They can also be extremely detailed if needed. By not including the option `-m`, Git will provide the latest output of `git status`. If you want even more information, you can use the option `-v`. + +--- +Messages are extremely important for our own records and also when collaborating with others. They can act as a reminder for what our commit includes, and also tell our collaborators what we did last. + +It's important to commit often as well so that merges are easier to locate and fix. + +It's also helpful if you want to go back to an earlier version. You have more options to choose from. + +--- +Practices around messages can vary but if we want to add a longer message we can remove the `-m` option. +```console +$ git commit +``` +Then hit `i` to add a message. You'll see `-- INSERT --` at the bottom and you can begin typing your message. + +When finished, press `esc` then `:wq` or `:x`. + +`w` means write and `q` means quit. 
`x` is shorthand for `wq`. + +--- +``` +Short (50 chars or less) summary of changes + +More detailed explanatory text, if necessary. Wrap it to about +72 characters or so. In some contexts, the first line is treated +as the subject of an email and the rest of the text as the body, +the blank line separating the summary from the body is critical +(unless you omit the body entirely). + +Further paragraphs come after blank lines. + +- Bullet points are okay, too + +- Typically a hyphen or asterisk is used for the bullet, preceded + by a single space with blank lines in between, but conventions + vary here +``` + +--- +##### **-a** +If we want to commit all the tracked files we've modified without putting them in the staging area first, we can use the option `-a`. This lets us skip `git add` for those files and condense our workflow. New, untracked files still need `git add`. +```console +$ git commit -a -m "skip staging add message" +``` +Here we've used two options, `-a` and `-m`, to skip the staging and add a message. + +--- + + +Questions? + +--- + + +## `$ git rm` + +--- +If we delete a file from our working directory after staging it using `rm` without `git`, the deletion will show up under *Changes not staged for commit* in `git status`. We can then use `git rm` to stage the file's removal. + +Let's follow the code below to understand this better: +```console +$ touch test.sh +$ git status +$ rm test.sh +$ git status +``` +Because we never tracked the `test.sh` file, we can remove it without telling Git to remove it as well. + +--- +What happens if we add a file to our staging area but then want to delete it? Let's try the two sets of commands below: +```console +$ touch test.sh +$ git add test.sh +$ git rm test.sh +``` + +```console +$ touch test.sh +$ git add test.sh +$ rm test.sh +$ git rm test.sh +``` + +--- +##### **-f** +If we've modified and staged a file, we have to force the removal with the option `-f`. This is a safety feature so that we don't accidentally delete something. 
+ ```console + $ touch testfile + $ git add testfile + $ git rm -f testfile + ``` + + --- + ##### **--cached** + The option `--cached` allows us to remove a file from our staging area without permanently deleting it from our local drive. +```console +$ git rm --cached testfile +``` +We can use wildcards to remove files from our staging area in bulk, although we have to add a backslash in front of `*` because Git does its own filename expansion. +```console +$ git rm -f \*.txt +``` + +--- +We can also delete files in a folder of our working directory: +```console +$ git rm -f dir1/\*.sh +``` + +--- + + +## `$ git mv` + +--- +Using `git mv`, we can rename files conveniently and succinctly: +```console +$ git mv test.txt test.sh +``` + +--- + + +Questions? + +--- + + +## `$ git log` + +--- +Sometimes we might want to see a history of our commits or we want to see previous commits after cloning an existing repository. We can do this using the `git log` command. + +```console +$ git log +``` +There are a number of options that help us see even more, or sometimes less, information about each commit. + +--- +If we attempt to run a log before any commits have been made, we will get an error: +```console +fatal: your current branch 'main' does not have any commits yet +``` + +--- +##### **-p** +Adding the option `-p` will show the `diff` introduced in each commit. 
We can also pass a number option that will limit the number of entries shown: + +```console +$ git log -p -2 +``` +The limit can be any number of entries (`-(n)`), and the output is still shown one page at a time. + +--- +##### **--stat** +The `--stat` option shows abbreviated stats for each commit: +```console +$ git log --stat +``` + +--- +```console +commit 6c91df668d1899317a643153bd169d37fe05d9f1 (HEAD -> main) +Author: Rachael Lam +Date: Fri Feb 18 14:56:27 2022 -0500 + + first commit + + .gitignore | 4 ++++ + README.md | 1 + + test.Rproj | 13 +++++++++++++ + testfile.r | 0 + 4 files changed, 18 insertions(+) +``` +`+` or `-` (if there were any) show the number of insertions or deletions. We can also see the date of the commit, who committed and the message. + +--- +##### **--pretty** +The `--pretty=` option is an interesting feature that enables us to specify the log output when we combine it with `format:`, creating an extremely useful data extraction feature: +```console +$ git log --pretty=format:"%h - %an, %ar : %s" +``` + +--- +##### **Formatting Options** +Option | Description +:-----|:------ +%H | Commit hash +%h | Abbreviated commit hash +%t | Abbreviated tree hash +%p | Abbreviated parent hashes + +--- +Option | Description +:-----|:------ +%an | Author name +%ae | Author email +%ad | Author date (ex. Thu Dec 2 14:14:55 2021 -0500) +%ar | Author date relative (ex. 26 hours ago) +%cn | Committer name +%s | Subject (-m) + +--- +##### **--since / --until** +The options `--since=` and `--until=` are usually more useful than `-(n)`. They produce the logs of any time before (`--until`) or after (`--since`) a certain date. 
You can specify an exact date or relative date: +```console +$ git log --since=2.weeks +``` +```console +$ git log --since="2 days 3 minutes ago" +``` +```console +$ git log --until="2021-11-20" +``` + +--- +We can also combine log options to generate specific outputs: +```console +$ git log --pretty=format:"%h: %s" --author=Rachael +``` +```console +$ git log --after="2020-11-01" --before="2020-11-30" +``` + +--- +Finally, and a favourite for quick glances: + +```console +$ git log --oneline +``` + +--- + + +Questions? + +--- + + +## `undo undo undo` + +--- +##### **Changing Commit** +If we already committed a few files but forgot to add one or made modifications since our commit that we want to add, we can use the option `--amend`: +```console +$ git commit -m "initial commit" +$ git add missed_file +$ git commit --amend -m "initial commit with missed_file" +``` +We can still add the `-m` option to add a new message. + +--- +##### **Unstaging** +When we want to remove a file from our staging area because we accidentally added one too many files, we can use the code below: +```console +$ git reset HEAD README.md +``` +If we ever forget how to do this, running `git status` will remind us. + +--- +##### **Unmodify** +We can also revert our files back to the version from our previous commit using `git checkout --`. It's important to realize that this command overwrites the file, so any uncommitted changes that were made cannot be recovered. + +As well, any commit can usually be recovered but anything that was never committed will most likely be lost forever. +```console +$ git checkout -- README.md +``` + +--- +##### **Select Previous Commit** +To select a previous commit to revert to, we need the hash of the commit: +```console +$ git log +$ git checkout [commit_hash] file1 +``` +This can be used forwards or backwards, i.e. you can also "revert" to a commit that is later than your current version. 
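
As a minimal sketch of restoring one file from an earlier commit, the commands below build a throwaway repository with two versions of `file1` and bring back the first one. The temporary directory, the example user name and email, and the variable `old_hash` are assumptions made purely for illustration.

```shell
# Work in a throwaway repository so no real project is touched
demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Commit two versions of the same file
echo "version 1" > file1
git add file1
git commit -q -m "first version"
echo "version 2" > file1
git commit -q -a -m "second version"

# Look up the hash of the older commit (HEAD~1 is the parent of the latest commit)
old_hash="$(git rev-parse --short HEAD~1)"

# Restore file1 as it was in that commit; the result is staged but not committed
git checkout "$old_hash" -- file1
cat file1
```

The `--` separates the commit from the file names, which avoids ambiguity if a file and a branch happen to share a name.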
+ +You can also revert several files at the same time: +```console +$ git checkout [commit_hash] file1 file2 +``` + +--- + + +Questions? + +--- + +# **Remote Repositories** + +--- +**References** +- Chacon and Straub: Chapter 2 +- Timbers: Chapter 12.5-12.6 + +--- + + +## `$ git remote` + +--- +Remote repos are versions of our projects that are hosted on the internet or some network. This allows us to collaborate with others outside of our local repo. + +We can see the remote servers we've configured using `git remote`. If we add the option `-v`, we can see the URL: +```console +$ git remote -v +``` +Cloned repos will be displayed as origin by default. + +--- +##### **Remote Setup** +Before we connect our local repo to a remote repo, we need to set up our permissions. This is so we can send and retrieve work to and from our remote repositories. There are two ways to do this: + +1. Access Tokens + +2. SSH + +--- +##### **Access Tokens** + +               ![w:350 left](pics/settings.png)           ![w:340 right](pics/developer.png) + +--- +![w:1150 center](pics/personalauth.png) + +--- +##### **SSH** +```console +$ ls -al ~/.ssh +``` +If SSH has not been set up on your computer, you should see something like: + +```console +ls: cannot access '/c/Users/rachaellam/.ssh': No such file +or directory +``` + +Otherwise you'll see filenames `id_ed25519` and `id_ed25519.pub` OR `id_rsa` and `id_rsa.pub` which represent your private and public keys. + +--- +```console +$ ssh-keygen -t ed25519 -C "rachael.lam@mail.utoronto.ca" +``` +Use the code above but with your email. This will output: + +```console +Generating public/private ed25519 key pair. +Enter file in which to save the key (/c/Users/rachaellam/.ssh/ +id_ed25519): +``` +Press `enter` to use the default file. + +--- +You will then be prompted to add a passphrase. You cannot reset this passphrase, so be sure to remember it or write it down somewhere safe: + +```console +Created directory '/c/Users/Vlad Dracula/.ssh'. 
+Enter passphrase (empty for no passphrase): +``` + +It will then ask you to reenter the passphrase: + +```console +Enter same passphrase again: +``` + +--- +You will then get a confirmation with a random piece of art at the end. It will show the private key (*identification*) which you should never share, the *public key* and the *key fingerprint* which is a shorter version of the public key. +```console +Your identification has been saved in /c/Users/rachaellam/.ssh/ +id_ed25519 +Your public key has been saved in /c/Users/rachaellam/.ssh/ +id_ed25519.pub +The key fingerprint is: +SHA256:SMSPIStNyA00KPxuYu94KpZgRAYjgt9g4BA4kFy3g1o +rachael.lam@mail.utoronto.ca +``` + +--- +Now we can check that we have the public and private key files: +```console +$ ls -al ~/.ssh +``` + +--- +It's time to give GitHub our public key so let's read the public key file and copy it: +```console +$ cat ~/.ssh/id_ed25519.pub +``` +Output: +```console +ssh-ed25519 AAAAC3NzaC1lZDI1NPN7AAAAIDmRA3d51X0uu9wXek559gfn6UFNF +69yZjChyBIU2qKI rachael.lam@mail.utoronto.ca +``` +Copy the long public key to add to GitHub. + +--- +##### Settings --> SSH and GPG keys --> New SSH key +Add a title like `rachael's key` and paste the public key then click *Add SSH key*. + +Finally, we can check that it's been authenticated: +```console +$ ssh -T git@github.com +``` + +--- +##### **remote add** +To add a remote repo, we can use `git remote add` followed by the name and URL. 
Now we can connect our local repo to a remote repo: +```console +$ git remote add origin https://github.com/rachaellam/git-r.git +$ git remote -v +``` +After checking we'll see: +```console +origin https://github.com/rachaellam/git-r.git (fetch) +origin https://github.com/rachaellam/git-r.git (push) +``` + +--- +If we want to see more information about a remote repo, we can use the command: +```console +$ git remote show origin +``` +Here we can see the URLs that we're fetching from and pushing to, our remote branches, and configurations for git push (to the main branch or another). + +--- +To send and retrieve work between our local and remote repositories, we have to authenticate a personal access token: + +               ![w:350 left](pics/settings.png)           ![w:340 right](pics/developer.png) + +--- +![w:1150 center](pics/personalauth.png) + +--- + + +Questions? + +--- + + +## `$ git fetch` / `$ git pull` + +--- +When collaborating with others, changes might be made that are important to copy to your local directory. `git fetch` will get any new changes but it won't merge them into our work or modify our work. +```console +$ git fetch origin +``` + +--- +`git pull` will automatically fetch and merge a remote branch to our current branch (more on branching later). It's a good practice to pull before every work session, especially when working with others. Otherwise, a collaborator might have made changes, and you won't be able to push your changes to GitHub. +```console +$ git pull +``` + +--- +If we've set up our repository using `init` and `remote add`, we need to specify the remote that we want to pull from and the branch we want to pull: +```console +$ git pull origin main +``` +`origin` being the name of the remote repo we added earlier and `main` being the main branch on our GitHub repo. + +--- + + +Questions? 
+ +--- + + +## `$ git push` + +--- +When we're ready to share our modifications, we have to push our project and files upstream using `git push`: +```console +$ git push origin main +``` +Here we're pushing our main branch to our origin server. The main branch is sometimes called the master branch. + +This command only works if we have write access and if nobody has pushed in the meantime. If a collaborator has pushed since we last pulled, our push is rejected and we have to pull and merge their work before pushing our own. + +--- + + +Questions? + +--- + +# **Git Branching** + +--- +**References** +- Chacon and Straub: Chapter 3 +- Timbers: Chapter 12.8 + +--- +Branching allows us to diverge from the main line to do work without accidentally messing with the main line. This helps with testing without making any accidental changes to the working branch. + +To understand how branching works, let's go back and understand how Git saves files. +- blob +- tree +- pointer + +--- +![w:1000 center](pics/blobs.png) + +--- +A branch is a lightweight, movable pointer to a specific commit. In Git, the default branch is named *master* or *main*. When we first start making commits, we start on the master branch, which automatically moves to point to the last commit made. + +![w:700 center](pics/master.png) + +--- + + +## `$ git branch` + +--- +We can make a new branch which creates a new pointer for us to move around. We can do this by using the command `git branch`: +```console +$ git branch testing +``` + +--- +Here, we've created a branch called testing, which means we've created a new pointer that points to our current commit. + +![w:800 center](pics/testing.png) + +--- + + +## `$ git checkout` + +--- +Git tracks what branch we're on using a pointer called `HEAD`. If we try to move `HEAD` to the branch *main* while we're already on it, we'll see: +```console +Already on 'main' +``` +To move `HEAD` to point to the testing branch that we just created, we use `git checkout`: +```console +$ git checkout testing +``` +and we should see.. 
+```console +Switched to branch 'testing' +``` + +--- +![w:800 center](pics/testing-head.png) + +--- +If we make some changes to our testing branch and commit, our head will move with the new commit. + +![w:800 center](pics/testing-commit.png) + +--- +If we want to go back to an older version of our project and make changes, we can use `git checkout` again to redirect the head back to our master branch: +```console +$ git checkout main +``` +Using this command will move the `HEAD` pointer back to our master branch and revert our files in our working directory back to the snapshot that the master branch points to. + +--- + + +Questions? + +--- + + +## `Branching and Merging` + +--- +Let's take a look at a workflow that you might encounter: +```console +$ git commit -m "commits to master branch" +``` +![w:800 center](pics/workflow1.png) + +--- +```console +$ git checkout -b iss53 +``` +![w:600 center](pics/workflow2.png) + +--- +```console +$ git commit -a -m "commits to iss53" +``` +![w:700 center](pics/workflow3.png) + +--- +```console +$ git checkout master +$ git checkout -b 'hotfix' +$ git commit -m "commits to hotfix" +``` +![w:600 center](pics/workflow4.png) + +--- +```console +$ git checkout master +$ git merge hotfix +``` +![w:500 center](pics/workflow5.png) + +--- + + +## `$ git merge` + +--- +In the last step we saw a command called `git merge`. Once we've committed changes and are ready to deploy, we can use `git merge` to merge our working branch back into our master branch. + +```console +$ git merge testing +``` + +--- +![w:600 center](pics/delete-branch.png) + +--- +We can then delete the branch that we've created, as the master branch points to the same place. + +Adding the option `-d` will delete the branch that had been merged with the main, as we no longer need it. +```console +$ git branch -d testing +``` + +--- +Remember that changes to our master branch have not been added to our *iss53* branch. 
We either need to `merge` them in or wait to integrate them when we `merge` *iss53* into the master branch. + +![w:600 center](pics/difference.png) + +--- +If we're merging a branch with the main that has been changed since we diverged, merging isn't as simple for Git. + +Git will create a new snapshot of the merge and automatically create a new commit that points to it, called a `merge commit`. + +![w:600 center](pics/merge.png) + +--- +We saw `git branch` earlier with the option `-d` to delete a branch, but to get a list of our current branches, we can run `git branch` without any arguments. +```console +$ git branch +``` +The `*` indicates the branch we are currently on or have checked out (`git checkout`). + +--- +If we run `git branch` with the option `-v`, we can see the last commit on each branch. This is another reason why messages are so important to add to our commits: they can be extremely useful when looking back at our work and seeing what we've done. + +--- +We can also add the options `--merged` or `--no-merged` to `git branch`. `--merged` allows us to see what branches have been merged into the branch we're currently on. Branches without the `*` are generally safe to delete because we've already merged their work into our current branch. +```console +$ git branch --merged +``` + +--- +On the other hand, `--no-merged` allows us to see all the branches that haven't been merged. +```console +$ git branch --no-merged +``` +If we try to delete one of these branches with `-d`, we will receive an error. We can force delete using the option `-D`. + +--- + + +## `Merge Conflicts` + +--- +Often, merging our work with other topic branches or the main branch creates conflicts. + +For example, if we've changed the same part of the same file differently in the two branches we're merging, we will encounter a conflict. + +Luckily, Git helps us see where the conflict is so we can correct it. 
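
To see this happen, here is a minimal sketch that deliberately creates a conflict in a throwaway repository. The temporary directory, the file `greeting.txt`, and the example user name and email are assumptions made for illustration; the `testing` branch name follows the earlier slides.

```shell
# Work in a throwaway repository so no real project is touched
demo="$(mktemp -d)"
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.name "Example User"
git config user.email "user@example.com"

# Starting point shared by both branches
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "add greeting"

# Change the line on the testing branch
git checkout -q -b testing
echo "hello from testing" > greeting.txt
git commit -q -a -m "change greeting on testing"

# Change the same line differently on main
git checkout -q main
echo "hello from main" > greeting.txt
git commit -q -a -m "change greeting on main"

# The merge stops and reports a conflict, so a non-zero exit status is expected here
git merge testing || true

# The file now holds both versions, wrapped in conflict markers
cat greeting.txt
git status --short
```

After editing `greeting.txt` to keep the wanted text and deleting the marker lines, `git add greeting.txt` followed by `git commit` completes the merge.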
+ +--- +![bg contain](pics/mergeconflicts.png) + +--- +Git shows us the beginning of the merge conflict with +`<<<<<<< HEAD` and the end with `>>>>>>>`. +
+ +`=======` separates the differences. +
+ +To fix the merge, you can keep one set of changes, keep the other, or rewrite the section entirely. You also have to remove all of the conflict markers. + +--- + + +Questions? + +--- + + +## `Branching Workflow` + +--- +##### **Long-Running Branches** +Multiple long-running branches are helpful when tackling large and complex projects. + +Typically, developers will keep the master branch as the stable branch or code that has been or will be released. They will then have parallel branches that are used for development and testing. + +Branches can also have various levels of stability, and developers will graduate/merge branches once they're fully tested. + +--- +![w:1000 center](pics/topics.png) + +--- +##### **Topic Branches** +Topic branches are short-lived branches that are created for a particular feature or related work. They allow us to quickly switch between topics and keep changes there for as long or as little as needed, regardless of the created or modified order, before merging. + +--- + +![w:600 left](pics/topics2.png) ![w:435 right](pics/topics3.png) + +--- + + +Questions? + +--- + + +## `Remote Branches` + +--- +Remote branches are pointers to the state of branches on our remote repositories. Our remote repositories can have multiple remote branches, just as we can have multiple branches on our local repositories. + +The format is `(remote)/(branch)` or `(remote) (branch)`. + +If branches already exist on your GitHub repo, you will have access to these branches. If we're working with a branch that does not exist yet, we can push it to our remote repo. + +--- +##### **Pushing** +When we're ready to share our work, we'll use `git push`. If the remote branch already exists, we can push directly to that branch: +```console +$ git checkout testing +$ git add -A +$ git commit -m "testing branch commit" +$ git push origin testing +``` +This will push our changes to the existing testing branch on GitHub. 
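
The same push can be tried offline. The sketch below is only an approximation under stated assumptions: a local bare repository (`remote.git`) stands in for GitHub, and the temporary directory and the example user name and email are made up for illustration.

```shell
# A bare repository plays the role of the remote; a second repository is our local copy
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

# Put a first commit on main and push it
echo "# git-r" > README.md
git add README.md
git commit -q -m "first commit"
git push -q origin main

# Commit on the testing branch and push that branch to the remote
git checkout -q -b testing
echo "testing" >> README.md
git add -A
git commit -q -m "testing branch commit"
git push -q origin testing

# List the branches that now exist on the remote
git ls-remote --heads origin
```

Using a bare repository keeps the example self-contained; against GitHub the only difference would be the URL passed to `git remote add`.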
+ +--- +If we were working with a branch that only exists locally, we can push it to GitHub with a slight tweak: +```console +$ git checkout new-branch +$ git add -A +$ git commit -m "new branch commit" +$ git push origin new-branch:new-branch +``` +The part before the colon is our local branch and the part after it is the name of the branch on GitHub. This will create a new branch on GitHub called `new-branch`. From here, if we want to continue updating this branch, we can just run `git push origin new-branch`. + +--- +##### **Fetching** +When we `fetch` from our remote repos, we don't automatically get local, editable copies of the remote branches. + +We can create one in several steps. First we're going to fetch the remote branches: +```console +$ git fetch +``` + +--- +We can then see what branches exist remotely: +```console +$ git branch -v -a +``` +And we'll see something like this: +```console +* main 3d850f2 a commit + remotes/origin/HEAD -> origin/main + remotes/origin/main 3d850f2 another commit + remotes/origin/testing 3d850f2 another commit +``` + +--- +Then we'll create a branch that exists on our local drive: +```console +$ git checkout -b testing origin/testing +``` +Here we're pointing the `HEAD` to the new branch (`-b`) called `testing`, created from `origin/testing`. + +--- +##### **Tracking Branches** +Tracking branches are branches that have a direct relationship with a remote branch. We can `push` and `pull` to and from these branches, as Git automatically knows which server and branch we're working with. + +For this to work, the name of your local branch must be the same as the remote branch. + +--- +If the branches are named differently, we must run a different command for the push to be successful: +```console +$ git push origin HEAD:remote-branch +``` + +--- +##### **Deleting Branches** +If we've merged all our changes into our main branch, we can delete the remote branch with the following code: +```console +$ git push origin :testing +``` + +--- + + +Questions? 
+ +--- + +# **Collaborating** + +--- +**References** +- Chacon and Straub: Chapter 3 + 5 +- Timbers: Chapter 12.8 + +--- +Much of the work that we do will involve working with others. It's important that we learn how best to do this so we can successfully collaborate and avoid conflicts where possible. If conflicts arise, good collaboration practices help us resolve them with ease. + +So far we've learned several practices and commands that help us collaborate with others, including remote repositories and branches, `git pull`, `git push` and `git merge`, but we'll learn more practices that make collaboration straightforward. + +--- +There are many different factors that influence what workflow you might follow and how you might contribute to a project including: + +**1. Active contributor size** +Teams can vary from a few collaborators to thousands, varying the number of commits per day. + +**2. Chosen workflow** +Each project could have a different process to check patches including an integration manager or peer reviews. + +**3. Commit access** +Policies regarding how to contribute work can differ between projects, even by how much work or how often. + +--- +Let's take a look at a couple of possible workflows: + +![w:500 center](pics/colabworkflow.png) + +--- +![w:600 center](pics/colabworkflow2.png) + +--- + + +## `GitHub` + +--- +##### **Adding Collaborators** +To collaborate with others on our GitHub repo, we can add collaborators so they have direct access to the repo: + +![w:1100 center](pics/gitcollabs1.png) + +--- +![w:1100 center](pics/gitcollabs2.png) + +--- +![w:800 center](pics/gitcollabs3.png) + +--- +Access does not have to be permanent. We can remove collaborators at any time and add additional ones when needed. + +Granting access to our repo this way enables collaborators to make changes and push them to the repo without our constant permission. If we do not grant push access, collaborators have to fork the repo and create a pull request. 
+ +--- +##### **Forking Projects** +Forking allows us to collaborate on projects without push access. We can fork a public project on GitHub and then clone it onto our local machine to begin making changes. + +![w:1150 center](pics/fork1.png) + +--- +Once a project has been forked, we can find the repo in our GitHub repositories. We can then clone the repo (`git clone`), make changes and push our changes without altering the original repo. + +Alternatively, we can clone the original repo, make our changes, fork the original repo and then merge our branch to the master branch of the forked repo. + +If we're collaborating with someone and we want our changes to be merged to the original repo, we can create a pull request. + +--- +##### **Pull Request** +After making a few changes, we now want to create a pull request to merge our changes with the original repo. We can do this directly in GitHub: + +![w:1100 center](pics/pullrequest2.png) + +--- +In the pull request, we can see what branches and repos we're attempting to merge: + +![w:800 center](pics/pullrequest3.png) + +--- +We can also see the changes that were made: + +![w:1100 center](pics/pullrequest4.png) + +--- +GitHub will also check to make sure that there are no conflicts with the base branch: + +![w:900 center](pics/pullrequest5.png) + +--- +Pull requests with no merge conflicts are easy to merge into the branches but it gets more complicated if there are merge conflicts: + +![w:1100 center](pics/pullrequestmergeconflict.png) + +--- +You can still create a pull request with merge conflicts: + +![w:1100 center](pics/pullrequestmergeconflict2.png) + +--- +![w:1000 center](pics/pullrequestmergeconflict3.png) + +--- +Resolving these conflicts is very similar to resolving merge conflicts through the terminal: + +![w:1100 center](pics/pullrequestmergeconflict4.png) + +Because resolving conflicts is done on GitHub, it's a good practice to resolve conflicts before creating a pull request. + +--- + + +Questions? 
+ +--- + +# **Conflicts** + +--- +**References** +- Chacon and Straub: Chapter 3 + 6 +- Timbers: Chapter 12.5 + +--- +Conflicts are going to arise at some point, especially when working with others. It's important that we learn how to handle these conflicts for easier and more successful collaboration. + +--- + + +## `GitHub Issues` + +--- +GitHub issues are an extremely useful tool for communicating decisions, ideas and problems that are project specific. + +They are an alternative to email or Slack that keeps communication isolated to a particular project. + +Issues can be *opened* on GitHub and even when they're *closed*, they remain available. They're also accessible to all collaborators for transparency. + +--- +To open an issue, navigate to the project page and click *Issues*: + +![w:1100 center](pics/issues.png) + +--- +Then open a new issue: + +![w:1100 center](pics/issues2.png) + +--- +From here, we can add a title and description of the issue, and add any specific collaborators, labels, etc. + +![w:1100 center](pics/issues3.png) + +--- +##### **Information** +**Title:** should be descriptive and quickly convey what the issue is about + +**Description:** explain the purpose of the issue and how to potentially resolve it. If it's a bug fix, include a reprex, what you wanted to happen and what actually happened. You can also include steps already taken to solve the issue. + +--- +##### **Reprex** +- A reprex is a **REPR**oducible **EX**ample. + +- It contains just enough of the code to reproduce the error, i.e. it is **self-contained** + +- We might have to create a smaller version of the code in order to create the reprex. Don't include anything that isn't related to the problem. + +- Sometimes, this process will help us solve our issue. + +--- +##### **Inclusions** +A minimal dataset to demonstrate the problem. 
This could be a regularly used one such as *mtcars* +```r +install.packages("dplyr") +library(dplyr) +head(mtcars) +``` + +or one easily built yourself. +```r +df <- data.frame (col1 = c(1, 2), + col2 = c(3, 4)) +df +``` + +--- +- Make sure to include classes that are necessary to your reprex (ex. dates, factors, etc.) + +- If you're using randomly sampled data, set the seed so the same data is produced each time. +```r +set.seed(853) +``` + +--- +Include all packages that you need. +
+ +- Make sure they are placed at the top of the script so it's quick and easy to see what is necessary for the reprex. + +--- +##### **Other Inclusions** +- Details about the issues you are facing. + +- Comments that will add clarification to your error. + +- Add what fixes have been attempted. This could include links to Stack Overflow posts that you've viewed. + +- Communicate clearly what your desired outcome is. + +--- +##### **Task Lists** +If an issue is quite large, it's possible to add task lists to break the issue into smaller pieces. +- Use square brackets `- [ ]` + +- To mark it complete, use `- [x]` + +- Issues can be linked to previous issues using + - the number `- [x] #11` + - a URL `- [x] https://github.com/rachaellam/git-r/issues/11` + +--- +Once an issue has been opened, we can respond and comment. + +When we decide it has been resolved, we can close the issue. The history of the issue can still be seen, even if it has been closed. + +--- + + +Questions? + +--- + + +## `Debugging` + +--- +##### **File Annotation** +File annotation can help us resolve issues in our code if we know where the issue is. We can see when the code was introduced and by whom, line by line, using the aptly named `git blame`. + +```console +$ git blame -L 1,3 script.sh +^8e9b89da (Rachael Lam 2021-12-02 15:01:02 -0500 1) #line 1 +8e9b89da (Rachael Lam 2021-12-02 15:01:02 -0500 2) #line 2 +8e9b89da (Rachael Lam 2021-12-02 15:01:02 -0500 3) #line 3 +``` + +--- +`git blame` is combined with the filename we want to inspect. We can also use the option `-L` followed by a start and end line number to limit the lines shown. + +We can then see the partial SHA-1 of the commit that last modified the line, the author name and date of the commit, and the content of the file by line. + +When the SHA-1 is preceded by a `^`, it indicates that those lines were introduced in the commit where the file was first added to the project and have not changed since. 
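
A minimal sketch of `git blame` on a file with some history, using a throwaway repository. The temporary directory and the example user name and email are assumptions for illustration; `script.sh` and its contents mirror the slide above.

```shell
# Work in a throwaway repository so no real project is touched
demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First commit adds the file with one line
echo "#line 1" > script.sh
git add script.sh
git commit -q -m "add script"

# Second commit adds another line
echo "#line 2" >> script.sh
git commit -q -a -m "add second line"

# Show who last changed lines 1 to 2, and in which commit
git blame -L 1,2 script.sh
```

Each output line starts with the abbreviated hash of the commit that last touched it, so the two lines here point at two different commits.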
+ +--- +##### **Binary Search** +If we don't know where the issue is, we can use `git bisect` to identify the commit that introduced an issue. +```console +$ git bisect start +$ git bisect bad +$ git bisect good [good_commit] +``` +First, we've started the bisect program. We then told the system that the current commit is broken using `bisect bad` followed by the last good commit using `bisect good [good_commit]`. We can see the different commits if we run `git log`, which we learned earlier. + +--- +Git works out how many commits lie between the good and the bad commit and then checks out the middle one. + +From here, we can run our test to see if the issue still exists. If it does, it means the issue was introduced in this middle commit or one before it and we can run `git bisect bad` to tell the system that there is still an issue. + +If it does not, then the issue was introduced after and we can run `git bisect good`. + +--- +We can keep running this loop until we find the commit that introduced an issue and make our corrections. + +When we're finished, we can run `git bisect reset` to reset our `HEAD` to where we were before we started. + +--- + + +## `Best Practices` + +--- +- Topic branches should be used to try out new code before integrating. They enable us to play around, or to leave the work for the time being if it's not working. +- Commit often rather than submitting a massive commit. This makes it easier to review and merge changes, or revert if necessary. + +--- +- Create quality commit messages so that your collaborators can easily understand what has been done. For example: +``` +Short (50 chars or less) summary of changes + +More detailed explanatory text, if necessary. Wrap it to about +72 characters or so. In some contexts, the first line is treated +as the subject of an email and the rest of the text as the body, +the blank line separating the summary from the body is critical +(unless you omit the body entirely). 
+ +Further paragraphs come after blank lines. + +- Bullet points are okay, too + +- Typically a hyphen or asterisk is used for the bullet, preceded + by a single space with blank lines in between, but conventions + vary here +``` + +--- + + +Questions? + +--- + + +What is reproducibility? + +--- +- Reproducibility is the ability of independent researchers to obtain the same or similar results when repeating an experiment or test. + +- This concept has been widely used in natural sciences, but is not yet as popular in data science. + +- Remember, data science is a science. We question, hypothesize, test, and therefore, we should also have the same rigour of confirmation. + +--- +- Skeptical readers should always be able to verify our work independently. We should be able to defend our results and decisions. + +- Who would believe your results otherwise? More importantly, you should not believe results if they cannot be verified. + +--- + + +Why is reproducibility important? + +--- +1. New Insights + +2. Reduce Error Risks + +3. Validate Results + +4. Transparency + +--- + + +How can we make our work reproducible? + +--- +There are a number of practices that can help make our work reproducible including: +- Commenting Code +- Technical Documentation +- Folder Structure + +--- + + +## `Commenting Code` + +--- + + +How does commenting code help in reproducibility? + +--- +Commenting code is an important practice that benefits both ourselves and collaborators. + +Not only can we understand what we did to fix our own errors or improve our work, but others can better understand our code to reproduce it. + +--- +[Ellen Spertus](https://stackoverflow.blog/2021/12/23/best-practices-for-writing-code-comments/) outlines 9 rules to follow: +
+ +1. Comments should not duplicate the code + +2. Good comments do not excuse unclear code +3. If you can’t write a clear comment, there may be a problem with the code +4. Comments should dispel confusion, not cause it + +--- +5. Explain unidiomatic code in comments + +6. Provide links to the original source of copied code +7. Include links to external references where they will be most helpful +8. Add comments when fixing bugs +9. Use comments to mark incomplete implementations + +--- +##### **1. Comments should not duplicate the code** +- Comments should add value to whoever is reading your code. +- Duplicating code adds unnecessary bulk and can actually make it more difficult to understand the code. +
+ +**Can you think of a bad example?** + +--- +Here is an example of what you should **not** do: +```bash +x=5 + +if [ "$x" = 5 ]; then + echo "x equals 5." # if x = 5 then output x equals 5 + +else + echo "x does not equal 5." # otherwise output x does not equal 5 + +fi +``` + +--- +##### **2. Good comments do not excuse unclear code** +- Our aim should always be having clear code, rather than relying on our comments to add clarity. +- Remember, we should not be adding more bulk to the code that makes it more difficult to understand. + +--- +##### **3. If you can’t write a clear comment, there may be a problem with the code** +>Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to +debug it. + +\- Kernighan's Law + +--- +##### **4. Comments should dispel confusion, not cause it** +- If our comments are adding further confusion, we should either rewrite the comment or remove it entirely. +- A good comment should always be written with the intent to help others better understand what is being done. + +--- +##### **5. Explain unidiomatic code in comments** +- If we've purposefully written code that others may find unnecessary, we need to comment our reasoning. +- Others may try to simplify our code if we don't explain our reasoning. +
+ +**Can you think of an example?** + +--- +##### **6. Provide links to the original source of copied code** +- Oftentimes, we'll use code that others have written. It's important to give credit to the original source, and the link also reminds us where the code came from if we need to reference it later. +- Referencing the source can also provide other information such as what the problem was, why the solution was recommended and how it can be improved. It also means we don't have to comment all of these details again in our own code. + +--- +An example: +```bash +# I got these 9 rules from Ellen Spertus' blog post on +# StackOverflow: https://stackoverflow.blog/2021/12/23/ +# best-practices-for-writing-code-comments/ +``` +- It's best to include the URL so others don't have to search for the exact location. +- Remember: **never** copy code that you don't personally understand. +- Code from StackOverflow falls under Creative Commons licenses, so a reference comment is needed. + +--- +##### **7. Include links to external references where they will be most helpful** +- References don't just have to be used for copied code. They can also provide information on decisions made or changes in practices. + +--- +##### **8. Add comments when fixing bugs** +- Comments can help others understand what we modified, if the modification is still needed, and how to test our modifications. +- Although `git blame` can be used to find the commit that modified the code, a good, brief comment can help others locate the change quickly. + +--- +##### **9. Use comments to mark incomplete implementations** +- Sometimes we have limitations in our knowledge or time. Adding comments documenting these limitations can allow us to better address and fix the issues. + +--- +##### **Some other good practices:** +- Comments should be clear and efficient. Don't add more information than necessary, but don't be too vague. +- Remember to update your comments if you update your code. 
Old comments can add more confusion. +- Inline comments can add noise as they're mixed with our code. Spacing can be helpful here: + +```python +colors = [[213/255,94/255,0], # vermillion + [86/255,180/255,233/255], # sky blue + [230/255,159/255,0], # orange + [204/255,121/255,167/255]] # reddish purple +``` + +--- +>Code tells you how, comments tell you why. + +\- Jeff Atwood, Co-founder of StackOverflow + +--- + + +## `Technical Documentation` +## `Writing` + +--- + + +What is technical documentation writing? + +--- + + +Why is it important to write good technical documentation? + +--- +Technical documents are necessary for reproducibility as they relay important information about your project to others. Writing technical documents is not easy but should not be overlooked. + +A well-written technical document will communicate the goals of a project and, in doing so, can generate interest in the project. + +--- +GitHub outlines several pieces of information to include: +1. What the project does +2. Why the project is useful +3. How users can get started with the project +4. Where users can get help with the project +5. Who maintains and contributes to the project +
+ +This is just part of the story and we'll add more to this in the coming slides. + +--- +##### **README** +- Technical documentation writing is typically found in a `README.md` file. +- If the `README.md` file is placed in our repo's root, `docs` folder, or hidden in the `.github` directory, GitHub will place the contents of the `README.md` on the main repo page. +- The `README.md` file will be the first thing visitors see when they come to the project page so it's important to make it as appealing as possible. + +--- +##### **Examples** +Let's walk through some good examples of `README.md` files: +- [Create Go App CLI](https://github.com/create-go-app/cli#readme) +- [Human Activity Recognition](https://github.com/ma-shamshiri/Human-Activity-Recognition#readme) +- [Markdownify](https://github.com/amitmerchant1990/electron-markdownify#readme) +- [More!](https://github.com/matiassingers/awesome-readme) + +--- + + +What did you like about these README files? + +What similarities can you see? + +--- +##### **What should be included?** +1. Name of the project +2. What the project does +3. The project's usages +4. How to get started +5. Where to find help +6. Who contributes + +--- +##### **1. Name of the Project** +- The name of your project should be unambiguous. + +--- +##### **2. What the project does** +- This should be a description of the project. +- Provide context to the project and any reference links. +- Include features or background information. +- *Can be titled "Description"* + +--- +##### **3. The project's usages** +- This should include how the project can be used. +- Provide examples using the code along with the expected output of said code. +- It should be a smaller example. Longer examples can be linked to. +- *Can be titled "Usages"* + +--- +##### **4. How to get started** +- This is the installation guide. +- Think of your particular audience and how much detail you might need to include. 
+- Add a requirements section if the project has specific dependencies or must run in a particular programming language. +- *Can be titled "Installation"* + +--- +##### **5. Where to find help** +- Direct people to where they can find help if they need it. +- This could be the issues page on GitHub, a forum, or an email address. +- *Can be titled "Support"* + +--- +##### **6. Who contributes** +- This should outline how others can contribute to your project and what your requirements are for accepting contributions. +- *Can be titled "Contributing"* + +--- +##### **Other Additions** +- **Visuals:** Visuals can grab people's attention, but they can also be helpful for showcasing what the code does. Include screenshots or GIFs that demonstrate your project. +- **Badges:** Badges provide metadata such as issue tracking, test results and downloads. [Shields.io](https://shields.io/) provides this service and you can also look at their [GitHub](https://github.com/badges/shields) for more information. +- **Acknowledgements:** Include the authors or anyone who helped with the project. + +--- +##### **Markdown** +- As noted by the extension, `README.md` files are usually written in markdown, thus using markdown syntax for styling. +- [GitHub](https://docs.github.com/en/github/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) provides a good reference on how to write your README in markdown. 
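
---
The six pieces listed above can be sketched as a README skeleton written from the shell. This is a minimal sketch under assumptions: the temporary directory, the `Project Name` title, and the placeholder sentences are hypothetical, and only the section titles come from the suggestions in the earlier slides.

```bash
# Minimal sketch: write a README.md skeleton with the suggested section titles.
# The project name and the placeholder text are hypothetical.
set -eu

project_dir="$(mktemp -d)"
readme="$project_dir/README.md"

# A quoted heredoc delimiter ('EOF') keeps the text literal: no variable expansion.
cat > "$readme" <<'EOF'
# Project Name

## Description
What the project does, with context and reference links.

## Usages
A short example and its expected output.

## Installation
Requirements and steps to get started.

## Support
Where to find help (issues page, forum, or email).

## Contributing
How to contribute and what is required for contributions to be accepted.
EOF

# Count and list the headings to confirm the structure.
section_count="$(grep -c '^## ' "$readme")"
grep '^#' "$readme"
echo "Sections written: $section_count"
```

Starting from a skeleton like this makes it harder to forget a section; each placeholder sentence is then replaced with project-specific detail.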
+ +--- +##### **Headings** +```markdown +# Largest Heading +## Second Largest Heading +### Third Largest Heading +``` +![w:1000 center](pics/headings.png) + +--- +##### **Text Styling** +```markdown +**bold** +*italic* +~~strikethrough~~ +**this is a *nested* example** +***bold and italic*** +``` +![w:1000 center](pics/text-styling.png) + +--- +##### **Quoting** +```markdown +> Block quote some text +``` +![w:1000 center](pics/blockquote.png) + +--- +##### **Unordered Lists** +```markdown +- this is an unordered list +- second item + - nested + - second nest +``` +![w:1000 center](pics/unordered.png) + +--- +##### **Ordered Lists** +```markdown +1. This is an ordered list +2. This is the second item + - with some additional information +3. This is the third +``` +![w:1000 center](pics/ordered.png) + +--- +##### **Codeblock** +Wrap your code in ``` to create a codeblock. + +![w:1000 center](pics/codeblock.png) + +--- +##### **Links** +```markdown +[Rachael's GitHub](https://github.com/rachaellam) +``` +![w:1000 center](pics/link.png) + +--- +##### **Images** +```markdown +![w:1000 center](pics/picture.png) +``` +![w:500 center](pics/bobs-burgers-louise.gif) +As we see, images can also be GIFs. We can also play around with the size and alignment. + +--- + + +## `Folder Structure` + +--- + + +What is folder structure and why is it important? + +--- +A good folder structure is important for reproducibility because it allows others to easily navigate and implement our projects. If file paths are written relative to a self-contained project folder, others won't have to change them to run our code. + +For example, what is the difference between these two paths: + +1. `"/Users/rachaellam/Documents/all-projects/this-project/data/"` + +2. `"this-project/data/"` + +--- +Folder structure can vary based on the project but a basic one to follow is... 
+- **/inputs** + - Everything that will not be edited including raw data and references +- **/outputs** + - Everything that was created during the project and your results +- **/scripts** + - All code that was written for the project + +--- +[Wilson et. al](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510#sec009) also outline a file structure that is similar... +- **/doc** + - All text documents including documentation or references +- **/data** + - All raw data and metadata +- **/results** + - Files generated during the analysis including generated data or cleaned data + - Results can be further divided into subdirectories that contain intermediate files and finished files +- **/src** + - All code that was written for the project + +--- +**References** + +Reproducibility: +- [Reproducibility and Research Integrity](https://doi.org/10.1080/08989621.2016.1257387) +- [Reproducibility, Replicability, and Reliability](https://doi.org/10.1162/99608f92.dbfce7f9) + +--- +Commenting: +- [Elena Kosourova](https://towardsdatascience.com/the-art-of-writing-efficient-code-comments-692213ed71b1) +- [Ellen Spertus](https://stackoverflow.blog/2021/12/23/best-practices-for-writing-code-comments/) + +--- +Technical Documentation Writing: +- [GitHub README](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-readmes) +- [GitHub Markdown](https://docs.github.com/en/github/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) +- [KyuWoo Choi](https://www.freecodecamp.org/news/what-i-learned-from-an-old-github-project-that-won-3-000-stars-in-a-week-628349a5ee14/) +- [Make a README](https://www.makeareadme.com/) +- [Matias Singers](https://github.com/matiassingers/awesome-readme) + +--- +Folder Structure: +- [Rohan Alexander](https://www.tellingstorieswithdata.com/reproducible-workflows.html) +- [Wilson et. 
al](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510#sec009) diff --git a/slides-resources/inequity_slides.md b/slides-resources/inequity_slides.md new file mode 100644 index 00000000..3f4dd656 --- /dev/null +++ b/slides-resources/inequity_slides.md @@ -0,0 +1,257 @@ +--- +marp: true +theme: uncover +_class: invert +paginate: true + +style: | + img[alt~="center"] { + display: block; + margin: 0 auto; + } + +--- + + + + +# **Inequity** +```bash +$ echo "Data Sciences Institute" +$ echo "Rachael Lam" +``` + +--- +##### **Inequality vs. Inequity** +**Inequality:** +- Uneven distribution of resources +- Unbalanced conditions +- Usually quantitative in nature +
+ +**Inequity:** +- Avoidable differences arising from social circumstances +- The state of being unfair or unjust +- Typically qualitative in nature + +--- + + +Inequality usually emerges due to inequity. +
+ +**Can you think of any examples?** + +--- +1. Lower salaries for female employees stem from gender inequity + +2. Job opportunities favouring white applicants stem from racial inequity + +3. Higher rate of Indigenous children in the child welfare system stems from a long history of structural inequity + +--- + + +## `Truth and Reconciliation` +###### `Missing Children and Unmarked Burials` + +--- + +##### **The History of Residential Schools** +- The Residential Schools System, which was government-funded and church-led, dates back to 1870 + +- The system's intention was to lead a cultural genocide to assimilate Indigenous children + +- More than 130 schools were established with more than 150,000 Indigenous students attending + +- Children were forcibly removed from their families + +- Families who resisted faced fines and/or jail time + +--- +![bg left cover](pics/Colonization.png) + +- Children did not see their families for years or interact with their family within the schools, unable to speak their language or practice their culture + +--- +![bg right cover](pics/IRSburials.jpeg) +- Children received inadequate food, clothing, facilities, education, staff and medical treatment + +- Children faced severe and constant abuse with mortality rates ranging between 30-60% + +--- + +##### **Oral Histories** +- Using language such as *they* and *them* can create distance between ourselves and those we are speaking or learning about + +- Instead, the second person can help us feel closer to the stories of survivors and their testimonies + +- Oral histories are a significant practice for Indigenous Peoples. It is how knowledge is passed on. 
To respect the practice and values of Indigenous Peoples, we will engage with oral testimonies from survivors of the Residential School System + +--- + +##### [Rita's Story](https://legacyofhope.ca/wherearethechildren/stories/watcheston/) + +![w:900 center](pics/rita.png) + +--- + +![bg right contain](pics/medicinewheel.png) +##### **Workshop** +Medicine wheels are used by some Indigenous peoples to represent elements of a whole person. + +We'll use this tool as we listen to stories of Survivors and discuss all together. + +--- +**Physical:** +>Possible items can include the physical descriptions of the home setting before Residential School, the settings at school, or any descriptions of locations after school that stand out to students. This can include all healthy forms of affection and/or inappropriate and harmful physical contact. Sports and games played, and events could be included here. Acts of violence and abuse would also go here. + +--- +**Intellectual:** +>Possible items to be placed here include thoughts the students had, reflections and understandings about life before school, the school itself, or after their time in school that they share. Students may also note what Survivors learned in school, what they thought about that learning and other mental activities required by the school. Students may also note its absence. + +--- +**Spiritual:** +>Separating children from their family, customs, languages and traditional ways of being was thought to be the only way to force them into the dominant religions of Canada. Experiences students could place here would be spiritual teachings from before Residential School, during, and after. Students may find that they put a lot into this category when Survivors talk about their return to culture, family and language as part of their healing journey + +--- +**Emotional:** +>There are likely to be many emotional moments in the Indigenous Survivors’ Oral Testimony. 
Students may struggle with determining whether to put something in this category or another category. Consider physical abuse – because of the nature of the experience, it may seem like it should go in physical; however, because of a strong response of a Survivor, it may seem to belong in the emotional category. Selecting either or both categories are accurate and demonstrates the multi-faceted impacts on Indigenous children + +--- + +![w:1000 center](pics/legacyofhope.png) +[Legacy of Hope](https://legacyofhope.ca/wherearethechildren/stories/) + +--- + + +Let's discuss the stories that we heard and our medicine wheels +
+- What categorical decisions did we make? + +- What was challenging about these decisions? + +- Did you find yourselves noting things you might not otherwise have if you had not been asked specifically for these categories? + +--- + + + +What are the consequences of this history for Indigenous Peoples today? + +--- +##### **Intergenerational Impacts** +The legacy of Residential Schools has had lasting impacts on Survivors and their families. Some include: +- Alcohol and drug abuse +- Educational blocks +- Higher rates of suicide +- Destruction of social support networks +- Missing and Murdered Indigenous Women and Girls +- Higher rate of children in the child welfare system (an extension of Residential Schools) + +--- +##### **TRC: Calls to Action** +To redress the legacy of colonization and residential schools, the Truth and Reconciliation Commission released 94 calls to action in 2015. Since then, only 14 have been completed. + +Reading these calls to action can give us a good idea of some of the inequities that exist today. + +--- +>**1.ii.** Providing adequate resources to enable Aboriginal communities and child-welfare organizations to keep Aboriginal families together where it is safe to do so, and to keep children in culturally appropriate environments, regardless of where they reside. + +--- +>**6.** We call upon the federal government to develop with Aboriginal groups a joint strategy to eliminate educational and employment gaps between Aboriginal and non-Aboriginal Canadians. + +--- +>**23.** We call upon all levels of government to:
+**i.** Increase the number of Aboriginal professionals working in the health-care field.
+**ii.** Ensure the retention of Aboriginal health-care providers in Aboriginal communities
+**iii.** Provide cultural competency training for all health-care professionals + +--- +>**30.** We call upon the federal, provincial, and territorial governments to commit to eliminating the overrepresentation of Aboriginal people in custody over the next decade, and to issue detailed annual reports that monitor and evaluate progress in doing so. + +--- + + +How does this relate to data science? + +How do we utilize this knowledge as we practice data science? + +--- +![bg left cover](pics/protest1.png) +It's important as we move forward to understand inequity and the inequality it has produced. +
+It is not enough to discuss crime without discussing the overrepresentation of racialized people in the justice system. + +--- +![bg right cover](pics/protest2.png) +It is not enough to discuss healthcare without discussing the difference of treatment between Indigenous and non-Indigenous people. + +--- +![bg left cover](pics/protest3.png) +It's not enough to discuss the child welfare system without discussing the history of Residential Schools and the impact on Indigenous Peoples. + +--- +**Resources** +- [Activity Workshop](https://secureservercdn.net/198.71.233.37/jjk.2f4.myftpupload.com/wp-content/uploads/2020/02/Let-the-Truth-Be-Told-Guide-2018-V1.44-HR-compressed-1.pdf) +- [Reconciliation Dialogue Workshop](https://reconciliationcanada.ca/staging/wp-content/uploads/2020/02/RDW-Workshop-Booklet_v3final.pdf) +- [TRC Calls to Action](https://www.documentcloud.org/documents/2091412-trc-calls-to-action.html) +- [We Are The Children](https://legacyofhope.ca/wherearethechildren/stories/) + +--- +**Potential Resources:** +- [Reconciliation Dialogue Workshop](https://reconciliationcanada.ca/staging/wp-content/uploads/2020/02/RDW-Workshop-Booklet_v3final.pdf) + - History and intergenerational impacts of residential schools +- [TRC Calls to Action](https://www.documentcloud.org/documents/2091412-trc-calls-to-action.html) + - 94 calls to action to address the legacy of residential schools +- [Historica Canada Video](https://www.youtube.com/watch?v=VFgNI1lfe0A&ab_channel=HistoricaCanada) + - Quick video for the timeline of residential schools +- [Activity Workshop](https://secureservercdn.net/198.71.233.37/jjk.2f4.myftpupload.com/wp-content/uploads/2020/02/Let-the-Truth-Be-Told-Guide-2018-V1.44-HR-compressed-1.pdf) + - Understanding the importance of oral histories + - Discussions of agency, language and allyship + +--- +- [We Are The Children](https://legacyofhope.ca/wherearethechildren/stories/) + - First hand accounts of experiences in residential schools diff --git 
a/slides-resources/pics/CVCS.png b/slides-resources/pics/CVCS.png new file mode 100644 index 00000000..78aedb8f Binary files /dev/null and b/slides-resources/pics/CVCS.png differ diff --git a/slides-resources/pics/Colonization.png b/slides-resources/pics/Colonization.png new file mode 100644 index 00000000..be86cf64 Binary files /dev/null and b/slides-resources/pics/Colonization.png differ diff --git a/slides-resources/pics/DVCS.png b/slides-resources/pics/DVCS.png new file mode 100644 index 00000000..02d18fac Binary files /dev/null and b/slides-resources/pics/DVCS.png differ diff --git a/slides-resources/pics/IRSburials.jpeg b/slides-resources/pics/IRSburials.jpeg new file mode 100644 index 00000000..d6e3c3da Binary files /dev/null and b/slides-resources/pics/IRSburials.jpeg differ diff --git a/slides-resources/pics/LVC.png b/slides-resources/pics/LVC.png new file mode 100644 index 00000000..98bff71e Binary files /dev/null and b/slides-resources/pics/LVC.png differ diff --git a/slides-resources/pics/NOAA.png b/slides-resources/pics/NOAA.png new file mode 100644 index 00000000..8821dc73 Binary files /dev/null and b/slides-resources/pics/NOAA.png differ diff --git a/slides-resources/pics/blobs.png b/slides-resources/pics/blobs.png new file mode 100644 index 00000000..39cafdd2 Binary files /dev/null and b/slides-resources/pics/blobs.png differ diff --git a/slides-resources/pics/blockquote.png b/slides-resources/pics/blockquote.png new file mode 100644 index 00000000..97c511ea Binary files /dev/null and b/slides-resources/pics/blockquote.png differ diff --git a/slides-resources/pics/bobs-burgers-louise.gif b/slides-resources/pics/bobs-burgers-louise.gif new file mode 100644 index 00000000..b296ab59 Binary files /dev/null and b/slides-resources/pics/bobs-burgers-louise.gif differ diff --git a/slides-resources/pics/census.png b/slides-resources/pics/census.png new file mode 100644 index 00000000..c2070ba3 Binary files /dev/null and b/slides-resources/pics/census.png 
differ diff --git a/slides-resources/pics/codeblock.png b/slides-resources/pics/codeblock.png new file mode 100644 index 00000000..6c947bd5 Binary files /dev/null and b/slides-resources/pics/codeblock.png differ diff --git a/slides-resources/pics/colabworkflow.png b/slides-resources/pics/colabworkflow.png new file mode 100644 index 00000000..b0fbc16b Binary files /dev/null and b/slides-resources/pics/colabworkflow.png differ diff --git a/slides-resources/pics/colabworkflow2.png b/slides-resources/pics/colabworkflow2.png new file mode 100644 index 00000000..828e0dc5 Binary files /dev/null and b/slides-resources/pics/colabworkflow2.png differ diff --git a/slides-resources/pics/covidcases.png b/slides-resources/pics/covidcases.png new file mode 100644 index 00000000..98a2e206 Binary files /dev/null and b/slides-resources/pics/covidcases.png differ diff --git a/slides-resources/pics/creatingrepo1.png b/slides-resources/pics/creatingrepo1.png new file mode 100644 index 00000000..f0eb8745 Binary files /dev/null and b/slides-resources/pics/creatingrepo1.png differ diff --git a/slides-resources/pics/creatingrepo2.png b/slides-resources/pics/creatingrepo2.png new file mode 100644 index 00000000..f024603c Binary files /dev/null and b/slides-resources/pics/creatingrepo2.png differ diff --git a/slides-resources/pics/creatingrepo3.png b/slides-resources/pics/creatingrepo3.png new file mode 100644 index 00000000..0d4c9cdc Binary files /dev/null and b/slides-resources/pics/creatingrepo3.png differ diff --git a/slides-resources/pics/delete-branch.png b/slides-resources/pics/delete-branch.png new file mode 100644 index 00000000..d854504a Binary files /dev/null and b/slides-resources/pics/delete-branch.png differ diff --git a/slides-resources/pics/developer.png b/slides-resources/pics/developer.png new file mode 100644 index 00000000..1a89e902 Binary files /dev/null and b/slides-resources/pics/developer.png differ diff --git a/slides-resources/pics/difference.png 
b/slides-resources/pics/difference.png new file mode 100644 index 00000000..b647ec58 Binary files /dev/null and b/slides-resources/pics/difference.png differ diff --git a/slides-resources/pics/drivingtrends.png b/slides-resources/pics/drivingtrends.png new file mode 100644 index 00000000..be52a92a Binary files /dev/null and b/slides-resources/pics/drivingtrends.png differ diff --git a/slides-resources/pics/email.jpeg b/slides-resources/pics/email.jpeg new file mode 100644 index 00000000..acfbf9fa Binary files /dev/null and b/slides-resources/pics/email.jpeg differ diff --git a/slides-resources/pics/error.png b/slides-resources/pics/error.png new file mode 100644 index 00000000..2dffa349 Binary files /dev/null and b/slides-resources/pics/error.png differ diff --git a/slides-resources/pics/error2.png b/slides-resources/pics/error2.png new file mode 100644 index 00000000..8767d6b9 Binary files /dev/null and b/slides-resources/pics/error2.png differ diff --git a/slides-resources/pics/error3.png b/slides-resources/pics/error3.png new file mode 100644 index 00000000..8ad2eb36 Binary files /dev/null and b/slides-resources/pics/error3.png differ diff --git a/slides-resources/pics/error5.png b/slides-resources/pics/error5.png new file mode 100644 index 00000000..2a19285b Binary files /dev/null and b/slides-resources/pics/error5.png differ diff --git a/slides-resources/pics/fork1.png b/slides-resources/pics/fork1.png new file mode 100644 index 00000000..9be2b769 Binary files /dev/null and b/slides-resources/pics/fork1.png differ diff --git a/slides-resources/pics/git_data.png b/slides-resources/pics/git_data.png new file mode 100644 index 00000000..716f6bd3 Binary files /dev/null and b/slides-resources/pics/git_data.png differ diff --git a/slides-resources/pics/gitcollabs1.png b/slides-resources/pics/gitcollabs1.png new file mode 100644 index 00000000..642c286d Binary files /dev/null and b/slides-resources/pics/gitcollabs1.png differ diff --git 
a/slides-resources/pics/gitcollabs2.png b/slides-resources/pics/gitcollabs2.png new file mode 100644 index 00000000..321b9178 Binary files /dev/null and b/slides-resources/pics/gitcollabs2.png differ diff --git a/slides-resources/pics/gitcollabs3.png b/slides-resources/pics/gitcollabs3.png new file mode 100644 index 00000000..56829059 Binary files /dev/null and b/slides-resources/pics/gitcollabs3.png differ diff --git a/slides-resources/pics/github.png b/slides-resources/pics/github.png new file mode 100644 index 00000000..f81aa24e Binary files /dev/null and b/slides-resources/pics/github.png differ diff --git a/slides-resources/pics/gitignorelarge.png b/slides-resources/pics/gitignorelarge.png new file mode 100644 index 00000000..99ab14e6 Binary files /dev/null and b/slides-resources/pics/gitignorelarge.png differ diff --git a/slides-resources/pics/gitignoresmall.png b/slides-resources/pics/gitignoresmall.png new file mode 100644 index 00000000..c2a549ff Binary files /dev/null and b/slides-resources/pics/gitignoresmall.png differ diff --git a/slides-resources/pics/headings.png b/slides-resources/pics/headings.png new file mode 100644 index 00000000..b786874c Binary files /dev/null and b/slides-resources/pics/headings.png differ diff --git a/slides-resources/pics/help.png b/slides-resources/pics/help.png new file mode 100644 index 00000000..0f7e1453 Binary files /dev/null and b/slides-resources/pics/help.png differ diff --git a/slides-resources/pics/issues.png b/slides-resources/pics/issues.png new file mode 100644 index 00000000..759e63fd Binary files /dev/null and b/slides-resources/pics/issues.png differ diff --git a/slides-resources/pics/issues2.png b/slides-resources/pics/issues2.png new file mode 100644 index 00000000..1d3b27d5 Binary files /dev/null and b/slides-resources/pics/issues2.png differ diff --git a/slides-resources/pics/issues3.png b/slides-resources/pics/issues3.png new file mode 100644 index 00000000..e2aa744f Binary files /dev/null and 
b/slides-resources/pics/issues3.png differ diff --git a/slides-resources/pics/legacyofhope.png b/slides-resources/pics/legacyofhope.png new file mode 100644 index 00000000..144a9f4e Binary files /dev/null and b/slides-resources/pics/legacyofhope.png differ diff --git a/slides-resources/pics/link.png b/slides-resources/pics/link.png new file mode 100644 index 00000000..f61afe4b Binary files /dev/null and b/slides-resources/pics/link.png differ diff --git a/slides-resources/pics/master.png b/slides-resources/pics/master.png new file mode 100644 index 00000000..e3bb2d68 Binary files /dev/null and b/slides-resources/pics/master.png differ diff --git a/slides-resources/pics/medicinewheel.png b/slides-resources/pics/medicinewheel.png new file mode 100644 index 00000000..263b726f Binary files /dev/null and b/slides-resources/pics/medicinewheel.png differ diff --git a/slides-resources/pics/merge.png b/slides-resources/pics/merge.png new file mode 100644 index 00000000..c6185608 Binary files /dev/null and b/slides-resources/pics/merge.png differ diff --git a/slides-resources/pics/mergeconflicts.png b/slides-resources/pics/mergeconflicts.png new file mode 100644 index 00000000..20f4c002 Binary files /dev/null and b/slides-resources/pics/mergeconflicts.png differ diff --git a/slides-resources/pics/minions.gif b/slides-resources/pics/minions.gif new file mode 100644 index 00000000..7f0cbf29 Binary files /dev/null and b/slides-resources/pics/minions.gif differ diff --git a/slides-resources/pics/ordered.png b/slides-resources/pics/ordered.png new file mode 100644 index 00000000..5b50a4d0 Binary files /dev/null and b/slides-resources/pics/ordered.png differ diff --git a/slides-resources/pics/personalauth.png b/slides-resources/pics/personalauth.png new file mode 100644 index 00000000..801b852c Binary files /dev/null and b/slides-resources/pics/personalauth.png differ diff --git a/slides-resources/pics/protest1.png b/slides-resources/pics/protest1.png new file mode 100644 index 
00000000..e46c40ed Binary files /dev/null and b/slides-resources/pics/protest1.png differ diff --git a/slides-resources/pics/protest2.png b/slides-resources/pics/protest2.png new file mode 100644 index 00000000..e527b17c Binary files /dev/null and b/slides-resources/pics/protest2.png differ diff --git a/slides-resources/pics/protest3.png b/slides-resources/pics/protest3.png new file mode 100644 index 00000000..45b967e0 Binary files /dev/null and b/slides-resources/pics/protest3.png differ diff --git a/slides-resources/pics/pullrequest1.png b/slides-resources/pics/pullrequest1.png new file mode 100644 index 00000000..41f1703f Binary files /dev/null and b/slides-resources/pics/pullrequest1.png differ diff --git a/slides-resources/pics/pullrequest2.png b/slides-resources/pics/pullrequest2.png new file mode 100644 index 00000000..0dfc699b Binary files /dev/null and b/slides-resources/pics/pullrequest2.png differ diff --git a/slides-resources/pics/pullrequest3.png b/slides-resources/pics/pullrequest3.png new file mode 100644 index 00000000..e77d8ef7 Binary files /dev/null and b/slides-resources/pics/pullrequest3.png differ diff --git a/slides-resources/pics/pullrequest4.png b/slides-resources/pics/pullrequest4.png new file mode 100644 index 00000000..3ade7d8d Binary files /dev/null and b/slides-resources/pics/pullrequest4.png differ diff --git a/slides-resources/pics/pullrequest5.png b/slides-resources/pics/pullrequest5.png new file mode 100644 index 00000000..ced584e1 Binary files /dev/null and b/slides-resources/pics/pullrequest5.png differ diff --git a/slides-resources/pics/pullrequestmergeconflict.png b/slides-resources/pics/pullrequestmergeconflict.png new file mode 100644 index 00000000..a18d33c4 Binary files /dev/null and b/slides-resources/pics/pullrequestmergeconflict.png differ diff --git a/slides-resources/pics/pullrequestmergeconflict2.png b/slides-resources/pics/pullrequestmergeconflict2.png new file mode 100644 index 00000000..55bc3987 Binary files 
/dev/null and b/slides-resources/pics/pullrequestmergeconflict2.png differ diff --git a/slides-resources/pics/pullrequestmergeconflict3.png b/slides-resources/pics/pullrequestmergeconflict3.png new file mode 100644 index 00000000..a1f5cd67 Binary files /dev/null and b/slides-resources/pics/pullrequestmergeconflict3.png differ diff --git a/slides-resources/pics/pullrequestmergeconflict4.png b/slides-resources/pics/pullrequestmergeconflict4.png new file mode 100644 index 00000000..16722653 Binary files /dev/null and b/slides-resources/pics/pullrequestmergeconflict4.png differ diff --git a/slides-resources/pics/rebase1.png b/slides-resources/pics/rebase1.png new file mode 100644 index 00000000..6ce06950 Binary files /dev/null and b/slides-resources/pics/rebase1.png differ diff --git a/slides-resources/pics/rebase2.png b/slides-resources/pics/rebase2.png new file mode 100644 index 00000000..e79818d2 Binary files /dev/null and b/slides-resources/pics/rebase2.png differ diff --git a/slides-resources/pics/rhelp.png b/slides-resources/pics/rhelp.png new file mode 100644 index 00000000..6a2abad5 Binary files /dev/null and b/slides-resources/pics/rhelp.png differ diff --git a/slides-resources/pics/rhelp2.png b/slides-resources/pics/rhelp2.png new file mode 100644 index 00000000..6899eced Binary files /dev/null and b/slides-resources/pics/rhelp2.png differ diff --git a/slides-resources/pics/rita.png b/slides-resources/pics/rita.png new file mode 100644 index 00000000..f77eb198 Binary files /dev/null and b/slides-resources/pics/rita.png differ diff --git a/slides-resources/pics/settings.png b/slides-resources/pics/settings.png new file mode 100644 index 00000000..288009dd Binary files /dev/null and b/slides-resources/pics/settings.png differ diff --git a/slides-resources/pics/stackoverflow1.png b/slides-resources/pics/stackoverflow1.png new file mode 100644 index 00000000..a0f81d3b Binary files /dev/null and b/slides-resources/pics/stackoverflow1.png differ diff --git 
a/slides-resources/pics/stackoverflow2.png b/slides-resources/pics/stackoverflow2.png new file mode 100644 index 00000000..bd802c07 Binary files /dev/null and b/slides-resources/pics/stackoverflow2.png differ diff --git a/slides-resources/pics/stackoverflow3.png b/slides-resources/pics/stackoverflow3.png new file mode 100644 index 00000000..e8c0966d Binary files /dev/null and b/slides-resources/pics/stackoverflow3.png differ diff --git a/slides-resources/pics/testing-commit.png b/slides-resources/pics/testing-commit.png new file mode 100644 index 00000000..eeaac5fd Binary files /dev/null and b/slides-resources/pics/testing-commit.png differ diff --git a/slides-resources/pics/testing-head.png b/slides-resources/pics/testing-head.png new file mode 100644 index 00000000..0483b49a Binary files /dev/null and b/slides-resources/pics/testing-head.png differ diff --git a/slides-resources/pics/testing.png b/slides-resources/pics/testing.png new file mode 100644 index 00000000..764c23f6 Binary files /dev/null and b/slides-resources/pics/testing.png differ diff --git a/slides-resources/pics/text-styling.png b/slides-resources/pics/text-styling.png new file mode 100644 index 00000000..0fe65a47 Binary files /dev/null and b/slides-resources/pics/text-styling.png differ diff --git a/slides-resources/pics/topics.png b/slides-resources/pics/topics.png new file mode 100644 index 00000000..730e3e9c Binary files /dev/null and b/slides-resources/pics/topics.png differ diff --git a/slides-resources/pics/topics2.png b/slides-resources/pics/topics2.png new file mode 100644 index 00000000..18ef3756 Binary files /dev/null and b/slides-resources/pics/topics2.png differ diff --git a/slides-resources/pics/topics3.png b/slides-resources/pics/topics3.png new file mode 100644 index 00000000..316f586e Binary files /dev/null and b/slides-resources/pics/topics3.png differ diff --git a/slides-resources/pics/torontocrime.png b/slides-resources/pics/torontocrime.png new file mode 100644 index 
00000000..b6feb0b8 Binary files /dev/null and b/slides-resources/pics/torontocrime.png differ diff --git a/slides-resources/pics/unordered.png b/slides-resources/pics/unordered.png new file mode 100644 index 00000000..3ec7a265 Binary files /dev/null and b/slides-resources/pics/unordered.png differ diff --git a/slides-resources/pics/workflow.png b/slides-resources/pics/workflow.png new file mode 100644 index 00000000..b1c70199 Binary files /dev/null and b/slides-resources/pics/workflow.png differ diff --git a/slides-resources/pics/workflow1.png b/slides-resources/pics/workflow1.png new file mode 100644 index 00000000..80d647b0 Binary files /dev/null and b/slides-resources/pics/workflow1.png differ diff --git a/slides-resources/pics/workflow2.png b/slides-resources/pics/workflow2.png new file mode 100644 index 00000000..971e8461 Binary files /dev/null and b/slides-resources/pics/workflow2.png differ diff --git a/slides-resources/pics/workflow3.png b/slides-resources/pics/workflow3.png new file mode 100644 index 00000000..45bea596 Binary files /dev/null and b/slides-resources/pics/workflow3.png differ diff --git a/slides-resources/pics/workflow4.png b/slides-resources/pics/workflow4.png new file mode 100644 index 00000000..7ed0d9fd Binary files /dev/null and b/slides-resources/pics/workflow4.png differ diff --git a/slides-resources/pics/workflow5.png b/slides-resources/pics/workflow5.png new file mode 100644 index 00000000..8d48f615 Binary files /dev/null and b/slides-resources/pics/workflow5.png differ diff --git a/slides-resources/pics/workflow6.png b/slides-resources/pics/workflow6.png new file mode 100644 index 00000000..60f74015 Binary files /dev/null and b/slides-resources/pics/workflow6.png differ diff --git a/slides-resources/problemsolving_slides.md b/slides-resources/problemsolving_slides.md new file mode 100644 index 00000000..716f355d --- /dev/null +++ b/slides-resources/problemsolving_slides.md @@ -0,0 +1,263 @@ +--- +marp: true +theme: uncover +_class: 
invert +paginate: true + +style: | + img[alt~="center"] { + display: block; + margin: 0 auto; + } + +--- + + + + +# **Problem Solving** +```bash +$ echo "Data Sciences Institute" +$ echo "Rachael Lam" +``` + + +--- + + +## `Identifying where` +## `the error is` + +--- +Our code can and will break often. Even experienced coders will have their code break several times throughout the day. + +We need to first identify the error so we know what needs fixing. + +--- + + +What are some of the ways you might identify where an error is in your code? + +--- +##### **Change the Code** +- The first step to fixing your code is identifying where the error or problem is. One of the best ways to do this is **by systematically changing one thing at a time**. + +- Changing one thing allows us to see if the program works, breaks in the same way or breaks in a new way. + +- Changing code can be done by removing code until it doesn't break. We can delete the code, or we can comment it out (`#`) to save our work. + +--- +##### **Reading the Error Messages** +- Sometimes the error messages can quickly identify our problems, both the location and reason! +
+ +![w:1100 center](pics/error.png) + +--- +![w:1100 center](pics/error2.png) + +--- +##### **Help** +- Programming languages often come with a help feature that gives us more information about a package or command. +
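In Python, for instance, the built-in `help()` function and an object's `__doc__` attribute expose this kind of documentation. A minimal sketch, where the choice of `len` is only an illustrative assumption and any function or module could be used instead:

```python
# Read the documentation string attached to the built-in len function.
doc_text = len.__doc__
print(doc_text)

# help() shows the same documentation, formatted for reading in the terminal.
help(len)
```

The same pattern works for imported packages, which can be quicker than searching online when we only need to check what a function expects.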
+ +![w:600 center](pics/help.png) + +--- +![w:300](pics/rhelp.png)     +![w:560 center](pics/rhelp2.png) + +--- + + +## `Searching for` +## `common errors` + +--- +It's helpful to understand some of the most common errors so that we can methodically work through our code and find our errors. + +Sometimes the reason why our code is breaking is due to simple fixes such as missing parentheses. + +--- +##### **Common Errors** +Many times our errors are simple fixes: +1. Forgetting a `)` or having too many `)` + +2. Not defining a variable before calling it + +3. Our indents don't line up +```python +for i in df['column']: + if i == 100: + print(i) +``` + +--- +4. We're attempting to apply a function to a conflicting class +```python +for i in df['number_column']: + if i == "100": + print(i) +``` +5. An object is not iterable +```python +a = 1 +list(a) +``` +```python +a = 1 +dict(a) +``` + +--- +Some of the most common Python Errors are: + +1. SyntaxError: invalid syntax + +2. NameError: name name is not defined +3. SyntaxError: unexpected EOF while parsing +4. IndentationError: unindent does not match any outer indentation level + +--- +##### **1. SyntaxError: invalid syntax** +![w:1100 center](pics/error.png) + +--- +##### **2. NameError: name name is not defined** +![w:1100 center](pics/error2.png) + +--- +##### **3. SyntaxError: unexpected EOF while parsing** +![w:1150 center](pics/error3.png) + +--- +##### **4. IndentationError: unindent does not match any outer indentation level** +![w:1150 center](pics/error5.png) + +--- + + +## `Google and StackOverflow` + +--- +Learning how to search for our errors is probably one of the most valuable tools we have. Two of the best resources we have are *Google* and *StackOverflow*. 
+ +Google: A broad search across different resources + +StackOverflow: A programming-specific question and answer site + +--- +![w:1000 center](pics/stackoverflow1.png) + +--- +![w:1000 center](pics/stackoverflow2.png) + +--- +![w:1000 center](pics/stackoverflow3.png) + +--- +- When searching on Google or StackOverflow, add *python* or *R* to your search to narrow the results + +- We can also search package-specific questions such as *seaborn* or *pandas* + + +--- +- We can search our error messages or what we're actually trying to do (ex. "Convert string to date python") + +- The more we code and search, the more we'll learn specific language that will help us search (ex. "Group by multiple columns") + +--- + + +## `Reproducible Examples` + +--- +##### **Reprex** +- A reprex is a **REPR**oducible **EX**ample. + +- It contains just enough of the code to reproduce the error, i.e. it is **self-contained** + +- We might have to create a smaller version of the code in order to create the reprex. Don't include anything that isn't related to the problem. + +- Sometimes, this process will help us solve our issue. + +--- +##### **Inclusions** +A minimal dataset to demonstrate the problem. This could be a regularly used one such as *iris* +```python +from sklearn import datasets +iris = datasets.load_iris() +``` + +or one easily built yourself. +```python +import pandas as pd +d = {'col1': [1, 2], 'col2': [3, 4]} +df = pd.DataFrame(data=d) +df +``` + +--- +- Make sure to include classes that are necessary to your reprex (ex. dates, factors, etc.) + +- If you're using randomly sampled data, set the seed so the same data is produced each time. +```python +df['column'].sample(n=3, random_state=1) +``` + +--- +Include all packages that you need. +
+ +- Make sure they are placed at the top of the script so it's quick and easy to see what is necessary for the reprex. + +--- +##### **Other Inclusions** +- Details about the issues you are facing. + +- Comments that will add clarification to your error. + +- Add what fixes have been attempted. This could include links to StackOverflow posts that you've viewed. + +- Communicate clearly what your desired outcome is. + +--- +**References:** +- [StackOverflow, How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) +- [Wickham, Advanced R](http://adv-r.had.co.nz/Reproducibility.html) +- [Wilson, Debugging](https://buildtogether.tech/debugging/) diff --git a/slides-resources/professional_slides.md b/slides-resources/professional_slides.md new file mode 100644 index 00000000..9c5fbde6 --- /dev/null +++ b/slides-resources/professional_slides.md @@ -0,0 +1,447 @@ +--- +marp: true +theme: uncover +_class: invert +paginate: true + +style: | + img[alt~="center"] { + display: block; + margin: 0 auto; + } + +--- + + + + +# **Professional Skills** +```bash +$ echo "Data Sciences Institute" +$ echo "Rachael Lam" +``` + +--- +##### **Introduction** +At a certain point, we'll be entering the professional world. We'll learn a few tactics and skills to help us succeed in this environment. + +If we're already in this professional world, we'll reintroduce or reinforce practices that will give us an advantage. + +--- + +# **Important Stuff** + +--- + + +## `Crunch Mode` + +--- + + +Who has worked wayyy too many hours in a day/week? + +Did you feel like you were productive? + +--- +A scientific study looking at overwork found two main points: +> 1. Working more than eight hours a day for more than a couple of weeks of time lowers your total productivity, not just your hourly productivity—i.e., you get less done in total (not just per hour) when you're in crunch mode than you do when you work regular hours. 
+ +--- +> 2. Working over 21 hours in a stretch increases the odds of you making a catastrophic error just as much as being legally drunk. + +While Henry Ford was originally criticized for adopting a 40-hour work week in 1926, we're going through a similar challenge of adopting a 4-day work week. + +--- +We slowly decrease our effectiveness the more tired we are, yet we don't always realize that our ability is declining. + +Working longer doesn't necessarily mean we're doing better work. In fact, we run the risk of making mistakes that could cause many headaches for those we're collaborating with. + +Unfortunately, + +**Produce good work by getting good rest. Take care of your mental health and you'll make better decisions** + +--- + + +## `Time Management` + +--- + + +What are some of your methods for time management? + +--- +Greg Wilson outlines 7 time management methods. +1. Make a list of the things you have to do. + +2. Weed out everything that you don't need to do right away. +3. Sort the list so that the most important tasks are at the top. +4. Make sure you have everything you need to see the first task through. +5. Turn off interruptions. +6. Set an alarm to go off in fifty minutes. +7. Take a ten-minute break. + +--- +Break down bigger tasks (Wilson suggests tasks that are more than an hour) into smaller pieces. + +One task could even be, *"Plan To Dos"* + +Getting back into a productive flow after an interruption can take anywhere between several seconds and several minutes. This adds up! + +**Remember, not everyone works the same. It's important to always adopt what works best for you!** + +--- +> If you are neurotypical and have neurodivergent teammates, ask them what works well for them rather than ignoring the difference or guessing what they might want. Please do the same if you have teammates who have difficulty seeing, hearing, or moving about: they have expertise you don't. 
+ +--- + + +## `Meetings` + +--- + + +What about meetings do you find frustrating? + +--- +![center](pics/email.jpeg) + +--- +##### **1. Agree on the rules** +- Everyone should agree on what the rules of a meeting are + +- For example, you could require everyone to show up on time and have everyone's voices be heard equally + +--- +##### **2. Keep discussion meetings and decision meetings separate.** +- Discussion Meetings: + - Explore design alternatives or next term's goals + - Wide-ranging +
+- Decision Meetings: + - Choose which alternatives to pursue + - Short and focused + +--- +##### **3. Decide if there actually needs to be a meeting** +- Sharing information can be kept to an email + +- Meetings are great for discussion, decisions and brainstorming + +--- +##### **4. Write an agenda** +- An agenda will show the importance of what needs to be discussed +
+ +##### **5. Include timings in the agenda** +- This will help prevent time being taken from other important things that need to be discussed + +- Figure out why a meeting ran over time to better prepare for the next one + +--- +##### **6. Prioritize** +- First, tackle issues that have high impact but take little time + +- Save things that will take more time and are less impactful for the end of the meeting + +--- +##### **7. Make one person responsible for keeping things moving.** +- Delegate a moderator to keep track of time, make sure people are not distracting, and make sure no one takes up too much air time + +- Historically, certain people have been heard more than others. Make sure that everyone in the room feels comfortable participating and everyone's voices are heard. + +--- +##### **8. Require politeness** +- No one can be rude + +- No one gets to ramble + +- Off-topic conversations can be discussed elsewhere + +--- +> People often reach for positive statements like "assume good intent" because they're worried about people being "shamed" over innocent mistakes. But society at large is already inclined to assume good intent in people with power and privilege–even when they're not demonstrating it. If you want to build a culture of "assuming good intent," start by assuming good intent in marginalized people. Assume that they already tried being nice. Assume that their feelings are valid. Assume that, after a lifetime of practice, they are responding to harmful behavior in the way that is safest for them. Prioritize that safety over the momentary discomfort people feel when they realize they've done something hurtful. + +--- +##### **9. No interruptions** +- If someone wants to speak, participants make it known non-verbally (to not interrupt) + +- The moderator can keep track of who wants to speak and give them time + +--- +##### **10. 
No distractions** +- Side conversations, texting, checking emails, etc can all indicate that they don't think what the speaker has to say or their work is important. + +--- +##### **11. Take minutes** +- Take point-form notes about the most important things that were shared during the meeting + +--- +##### **12. End early** +- Ending early gives people a break before they have to get back to their work +
+ +--- +##### **Post Meeting** +- At the end of the meeting, make the notes accessible so those who were not at the meeting can still stay informed. + +- Written notes give everyone a chance to correct mistakes, misinterpretations or misrepresentations + +- Make sure questions and action items are followed up on to help with the next meeting + +--- + +![w:400 center](pics/NOAA.png) + +--- + + + +## `Air Time` + +--- +Some people naturally will speak more than others. It's important to make sure that everyone has a chance to speak and that no one takes more space than others. + +--- +##### **Three Sticky Notes** +- Each participant gets three sticky notes (or paper clips, ripped pieces of paper, anything that works) + +- Each time someone speaks, they give up a sticky note + +- Once they're out, they can't speak for the rest of the meeting + +--- +##### **Online Meetings** +- Keep track of people who want to speak by using the chat + +- Although we may want everyone's camera on, remember that working from home provides other challenges. Some may not have the space to have their camera on and that should be respected. + +--- + + +## `Making Decisions` + +--- +To make decisions, it's important to acknowledge the power structure of the team. +
+1. Formal (accountable) + - Larger groups need governance to make decisions + +
+ +2. Informal (unaccountable) + - Small groups (less than 6) + +--- + +##### **Martha's Rules** + +1. Before each meeting, anyone who wishes may sponsor a proposal. Proposals must be filed at least 24 hours before a meeting in order to be considered at that meeting, and must include: + - a one-line summary + - the full text of the proposal + - any required background information + - pros and cons + - possible alternatives + +--- +2. A quorum is established in a meeting if half or more of voting members are present. + +3. Once a person has sponsored a proposal, they are responsible for it. The group may not discuss or vote on the issue unless the sponsor or their delegate is present. The sponsor is also responsible for presenting the item to the group. + +--- +4. After the sponsor presents the proposal a sense vote is cast for the proposal prior to any discussion: + - Who likes the proposal? + - Who can live with the proposal? + - Who is uncomfortable with the proposal? +
+ +5. If all of the group likes or can live with the proposal, it passes with no further discussion. + +--- +6. If most of the group is uncomfortable with the proposal, it is sent back to its sponsor for further work. (The sponsor may decide to drop it if it's clear that the majority isn't going to support it.) + +--- +7. If some members are uncomfortable with the proposal, a timer is set for a brief discussion moderated by the meeting moderator. After 10 minutes or when no one has anything further to add, the moderator calls for a straight yes-or-no vote on the question: "Should we implement this decision over the stated objections?" If a majority votes "yes" the proposal is implemented. Otherwise, it is returned to the sponsor for further work. + +--- + + +# **Using Git Together** + +--- + + +## `Code Reviews` + +--- +Code reviews are a cost-effective way to find bugs in software. They also allow us to share knowledge between team members by reading each other's code. + +There are many guidelines documenting how to do good code reviews. Wilson references the SmartBear guide that we'll reiterate here. + +--- +##### **1. Have the instructor do a demonstration review** +- It's helpful to have a demonstration on how to do a code review to understand how much detail is expected + +- Demonstrations can include sample code and think-aloud commentary + +--- +##### **2. Authors should clean up code before review** +- To make code easier to review, we should clean our code (ex. variable names) and add comments to improve readability + +- It's also possible that errors will be found by the author during this clean up + +--- +##### **3. Review at most 200 lines of code at a time** +- A rule of thumb is that a Pull Request should be no longer than 200 lines; therefore, a review should be no longer than 200 lines to begin with + +- Code reviews can be up to 400 lines but start small when beginning with code reviews + +--- +##### **4. 
Use checklists** +- Checklists include the most common problems to look out for. To begin, ask for no more than 12, otherwise it could be overwhelming. + +- You can also keep a list of errors that keep occurring to keep in mind as you're coding. + +--- +##### **5. Offer alternatives** +- Help fix code rather than just pointing out what is wrong +
+ +##### **6. Don't feign surprise or pass judgment** +- It just doesn't feel good to be judged or made to feel incompetent. + +--- +##### **7. Don't overwhelm people with details** +- If you see an error throughout the code (ex. variable `x` is overused), don't comment every time you see it. Comment on the first couple of instances. + +--- +##### **8. Don't try to sneak in feature requests** +- Code reviews should stick to bugs rather than introducing new functionalities + +--- +##### **9. Follow up** +- Suggestions don't always have to be accepted, but they should only be rejected with good reason. + +- GitHub allows for discussion threads for each comment. + - Reviewers can look at these to make sure the suggestion was acknowledged/addressed + +--- +##### **10. Don't tolerate rudeness** +- Stand up for those who are the victims of rudeness, giving the offender the opportunity to adjust their attitude and behaviour. + +--- +##### **11. Be specific in replies to reviewers** +- Easy suggestions, such as variable changes, can easily be fixed +- For more difficult suggestions, replies to the reviewer should be detailed rather than ignored + +--- +##### **12. Thank your reviewers** +- Acknowledge those who have helped to improve your code. +
+ +Now let's take a look at a code review [example](https://buildtogether.tech/git-team/#code-reviews). + +--- + +# **The Process** + +--- + + +## `Agile` + +--- +Agile is a bottom-up approach to project management and software development that focuses on iterations and frequent feedback. + +It can also be extremely useful in a variety of fields that have small teams and constantly changing requirements, like student projects. + +--- +##### **Best for...** +- Constantly changing requirements rather than long-range planning +- Continuous communication with developers and users +- Small teams +- Daily progress + +--- +##### **Scrum** +- Scrum is an Agile methodology that breaks down development into short development cycles called *sprints*. + +- Sprints are typically 2 weeks long, sometimes longer or shorter. + +- Each sprint includes deciding what to build, designing it, building it, testing it and delivering it. + +--- +##### **Stand-up Meetings** +- Stand-up meetings are a chance for everyone on the team to report what they accomplished the day before, what they're planning to work on and what's blocking them. + +- Meetings are usually at the beginning of each day. + +- They give team members a chance to stay focused, get feedback and stay on track to meet the iteration's goals. + +--- + +##### **Example** +**Yesterday:** Fixed the bug that was making the message file reader crash on accented characters, and added code to the HTML producer to display accented characters properly. + +**Today:** Will get the message file reader to recognize links to images and load those images. + +**Blockers:** What should the message file reader do if the image can't be found? Should it link to the ones it has, halt with an error message, or something else? + +--- + + +What do you notice about this example? + +--- + +- Make each task no more than a day long, otherwise you could say you're working on the same thing for several days in a row. 
+ +- Small tasks also allow for feedback and redirection if needed. + +--- +**References:** +- [NOAA](https://coast.noaa.gov/ddb/) +- [Wilson, Process](https://buildtogether.tech/process/#agile) +- [Wilson, The Important Stuff](https://buildtogether.tech/important/) +- [Wilson, Using Git Together](https://buildtogether.tech/git-team/#code-reviews) \ No newline at end of file diff --git a/slides-resources/reproducibility_slides.md b/slides-resources/reproducibility_slides.md new file mode 100644 index 00000000..ee99519c --- /dev/null +++ b/slides-resources/reproducibility_slides.md @@ -0,0 +1,474 @@ +--- +marp: true +theme: uncover +_class: invert +paginate: true + +style: | + img[alt~="center"] { + display: block; + margin: 0 auto; + } + +--- + + + + +# **Reproducibility** +```bash +$ echo "Data Sciences Institute" +$ echo "Rachael Lam" +``` + +--- + + +What is reproducibility? + +--- +- Reproducibility is the ability of independent researchers to obtain the same or similar results when repeating an experiment or test. + +- This concept has been widely used in natural sciences, but is not yet as popular in data science. + +- Remember, data science is a science. We question, hypothesize, test, and therefore, we should also have the same rigour of confirmation. + +--- +- Skepticism should always be able to be independently verified. We should be able to defend our results and decisions. + +- Who would believe your results otherwise? More importantly, you should not believe results if they cannot be verified. + +--- + + +Why is reproducibility important? + +--- +1. New Insights + +2. Reduce Error Risks + +3. Validate Results + +4. Transparency + +--- + + +How can we make our work reproducible? + +--- +There are a number of practices that can help make our work reproducible including: +- Commenting Code +- Technical Documentation +- Folder Structure + +--- + + +## `Commenting Code` + +--- + + +How does commenting code help in reproducibility? 
+ +--- +Commenting code is an important practice that benefits both ourselves and collaborators. + +Not only can we understand what we did to fix our own errors or improve our work, but others can better understand our code to reproduce it. + +--- +[Ellen Spertus](https://stackoverflow.blog/2021/12/23/best-practices-for-writing-code-comments/) outlines 9 rules to follow: +
+ +1. Comments should not duplicate the code + +2. Good comments do not excuse unclear code +3. If you can’t write a clear comment, there may be a problem with the code +4. Comments should dispel confusion, not cause it + +--- +5. Explain unidiomatic code in comments + +6. Provide links to the original source of copied code +7. Include links to external references where they will be most helpful +8. Add comments when fixing bugs +9. Use comments to mark incomplete implementations + +--- +##### **1. Comments should not duplicate the code** +- Comments should add value to whoever is reading your code. +- Duplicating code adds unnecessary bulk and can actually make it more difficult to understand the code. +
+ +**Can you think of a bad example?** + +--- +Here is an example of what you should **not** do: +```bash +x=5 + +if [ "$x" = 5 ]; then + echo "x equals 5." # if x = 5 then output x equals 5 + +else + echo "x does not equal 5." # otherwise output x does not equal 5 + +fi +``` + +--- +##### **2. Good comments do not excuse unclear code** +- Our aim should always be having clear code, rather than relying on our comments to add clarity. +- Remember, we should not be adding more bulk to the code that makes it more difficult to understand. + +```bash +Add example here +``` + +--- +##### **3. If you can’t write a clear comment, there may be a problem with the code** +>Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to +debug it. + +\- Kernighan's Law + +--- +##### **4. Comments should dispel confusion, not cause it** +- If our comments are adding further confusion, we should either rewrite the comment or remove it entirely. +- A good comment should always be written with the intent to help better understand what is being done. + +--- +##### **5. Explain unidiomatic code in comments** +- If we've purposefully written code that others may find unnecessary, we need to comment our reasoning. +- Others may try to simplify our code if we don't explain our reasoning. +
+ +**Can you think of an example?** + +--- +##### **6. Provide links to the original source of copied code** +- Oftentimes, we'll use code that others have written. It's important to give credit to the original source, and the link also reminds us where the code came from if we need to reference it later. +- Referencing the source can also provide other information, such as what the problem was, why the solution was recommended and how it can be improved. It also means we don't have to repeat all of these details in our own comments. + +--- +An example: +```bash +# I got these 9 rules from Ellen Spertus' blog post on +# StackOverflow: https://stackoverflow.blog/2021/12/23/ +# best-practices-for-writing-code-comments/ +``` +- It's best to include the URL so others don't have to search for the exact location. +- Remember: **never** copy code that you don't personally understand. +- Code from StackOverflow falls under Creative Commons licenses, so a reference comment is needed. + +--- +##### **7. Include links to external references where they will be most helpful** +- References don't just have to be used for copied code. They can also provide information on decisions made or changes in practices. + +--- +##### **8. Add comments when fixing bugs** +- Comments can help others understand what we modified, whether the modification is still needed, and how to test our modifications. +- Although `git blame` can be used to find the commit that modified the code, a good comment is brief and helps locate the change right where it happens. + +--- +##### **9. Use comments to mark incomplete implementations** +- Sometimes we have limitations in our knowledge or time. Adding comments documenting these limitations can allow us to better address and fix the issues. + +--- +##### **Some other good practices:** +- Comments should be clear and efficient. Don't add more information than necessary, but don't be too vague. +- Remember to update your comments if you update your code. 
Old comments can add more confusion. +- Inline comments can add noise as they're mixed with our code. Spacing can be helpful here: + +```python +colors = [[213/255,94/255,0], # vermillion + [86/255,180/255,233/255], # sky blue + [230/255,159/255,0], # orange + [204/255,121/255,167/255]] # reddish purple +``` + +--- +>Code tells you how, comments tell you why. + +\- Jeff Atwood, Co-founder of StackOverflow + +--- + + +## `Technical Documentation` +## `Writing` + +--- + + +What is technical documentation writing? + +--- + + +Why is it important to write good technical documentation? + +--- +Technical documents are necessary for reproducibility as they relay important information about your project to others. Writing technical documents is not easy, but it should not be overlooked. + +A well-written technical document will communicate the goals of a project and, in doing so, can generate interest in the project. + +--- +GitHub outlines several pieces of information to include: +1. What the project does +2. Why the project is useful +3. How users can get started with the project +4. Where users can get help with the project +5. Who maintains and contributes to the project +
+ +This is just part of the story and we'll add more to this in the coming slides. + +--- +##### **README** +- Technical documentation writing is typically found in a `README.md` file. +- If the `README.md` file is placed in our repo's root, `docs` folder, or hidden `.github` directory, GitHub will place the contents of the `README.md` on the main repo page. +- The `README.md` file will be the first thing visitors see when they come to the project page, so it's important to make it as appealing as possible. + +--- +##### **Examples** +Let's walk through some good examples of `README.md` files: +- [Create Go App CLI](https://github.com/create-go-app/cli#readme) +- [Human Activity Recognition](https://github.com/ma-shamshiri/Human-Activity-Recognition#readme) +- [Markdownify](https://github.com/amitmerchant1990/electron-markdownify#readme) +- [More!](https://github.com/matiassingers/awesome-readme) + +--- + + +What did you like about these README files? + +What similarities can you see? + +--- +##### **What should be included?** +1. Name of the project +2. What the project does +3. The project's usages +4. How to get started +5. Where to find help +6. Who contributes + +--- +##### **1. Name of the Project** +- The name of your project should be unambiguous. + +--- +##### **2. What the project does** +- This should be a description of the project. +- Provide context to the project and any reference links. +- Include features or background information. +- *Can be titled "Description"* + +--- +##### **3. The project's usages** +- This should include how the project can be used. +- Provide examples using the code along with the expected output of said code. +- It should be a smaller example. Longer examples can be linked to. +- *Can be titled "Usages"* + +--- +##### **4. How to get started** +- This is the installation guide. +- Think of your particular audience and how much detail you might need to include. 
+- Add a requirements section if the project has specific dependencies or must be run in a particular programming language. +- *Can be titled "Installation"* + +--- +##### **5. Where to find help** +- Direct people to where they can find help if they need it. +- This could be the issues page on GitHub, a forum, or an email address. +- *Can be titled "Support"* + +--- +##### **6. Who contributes** +- This should outline how others can contribute to your project and what your requirements are for accepting contributions. +- *Can be titled "Contributing"* + +--- +##### **Additional Additions** +- **Visuals:** Visuals can grab people's attention, but they can also be helpful for showcasing what the code does. Include screenshots or GIFs that demonstrate your project. +- **Badges:** Badges provide metadata such as issue tracking, test results and downloads. [Shields.io](https://shields.io/) provides this service and you can also look at their [GitHub](https://github.com/badges/shields) for more information. +- **Acknowledgements:** Include the authors or anyone that helped with the project. + +--- +##### **Markdown** +- As noted by the extension, `README.md` files are usually written in Markdown and use Markdown syntax for styling. +- [GitHub](https://docs.github.com/en/github/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) provides a good reference on how to write your README in Markdown. 
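The six README sections described above can be sketched as a starter file. This is only a minimal illustration, not a required layout: the project name `my-project`, the placeholder text and the use of a temporary directory are all assumptions made so the example runs without overwriting an existing README.

```bash
#!/usr/bin/env bash
# Sketch: write a starter README.md containing the six suggested sections.
# "my-project" and every placeholder sentence are hypothetical.
set -eu

# Write into a temporary directory so no real README.md is overwritten.
readme="$(mktemp -d)/README.md"

cat > "$readme" <<'EOF'
# my-project

## Description
One or two sentences on what the project does and why it is useful.

## Usage
A short example of running the code, along with the expected output.

## Installation
Requirements and the steps needed to get started.

## Support
Where to ask for help (issues page, forum, or email address).

## Contributing
How others can contribute and what is required for acceptance.
EOF

# List the headings that were written.
grep '^#' "$readme"
```

Once the placeholders are replaced with real content, the file can be copied into the root of a repository.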
+ +--- +##### **Headings** +```markdown +# Largest Heading +## Second Largest Heading +### Third Largest Heading +``` +![w:1000 center](pics/headings.png) + +--- +##### **Text Styling** +```markdown +**bold** +*italic* +~~strikethrough~~ +**this is a *nested* example** +***bold and italic*** +``` +![w:1000 center](pics/text-styling.png) + +--- +##### **Quoting** +```markdown +> Block quote some text +``` +![w:1000 center](pics/blockquote.png) + +--- +##### **Unordered Lists** +```markdown +- this is an unordered list +- second item + - nested + - second nest +``` +![w:1000 center](pics/unordered.png) + +--- +##### **Ordered Lists** +```markdown +1. This is an ordered list +2. This is the second item + - with some additional information +3. This is the third +``` +![w:1000 center](pics/ordered.png) + +--- +##### **Codeblock** +Wrap your code in ``` to create a codeblock. + +![w:1000 center](pics/codeblock.png) + +--- +##### **Links** +```markdown +[Rachael's GitHub](https://github.com/rachaellam) +``` +![w:1000 center](pics/link.png) + +--- +##### **Images** +```markdown +![w:1000 center](pics/picture.png) +``` +![w:500 center](pics/bobs-burgers-louise.gif) +As we see, images can also be GIFs. We can also play around with the size and alignment. + +--- + + +## `Folder Structure` + +--- + + +What is folder structure and why is it important? + +--- +A good folder structure is important for reproducibility because it allows others to easily navigate and run our projects. If a file is referenced by a path that is self-contained within the project, others know they won't have to change the file path to gain access. + +For example, what is the difference between these two paths: + +1. `"/Users/rachaellam/Documents/all-projects/this-project/data/"` + +2. `"this-project/data/"` + +--- +Folder structure can vary based on the project but a basic one to follow is... 
+- **/inputs** + - Everything that will not be edited including raw data and references +- **/outputs** + - Everything that was created during the project and your results +- **/scripts** + - All code that was written for the project + +--- +[Wilson et. al](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510#sec009) also outline a file structure that is similar... +- **/doc** + - All text documents including documentation or references +- **/data** + - All raw data and metadata +- **/results** + - Files generated during the analysis including generated data or cleaned data + - Results can be further divided into subdirectories that contain intermediate files and finished files +- **/src** + - All code that was written for the project + +--- +**References** + +Reproducibility: +- [Reproducibility and Research Integrity](https://doi.org/10.1080/08989621.2016.1257387) +- [Reproducibility, Replicability, and Reliability](https://doi.org/10.1162/99608f92.dbfce7f9) + +--- +Commenting: +- [Elena Kosourova](https://towardsdatascience.com/the-art-of-writing-efficient-code-comments-692213ed71b1) +- [Ellen Spertus](https://stackoverflow.blog/2021/12/23/best-practices-for-writing-code-comments/) + +--- +Technical Documentation Writing: +- [GitHub README](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-readmes) +- [GitHub Markdown](https://docs.github.com/en/github/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) +- [KyuWoo Choi](https://www.freecodecamp.org/news/what-i-learned-from-an-old-github-project-that-won-3-000-stars-in-a-week-628349a5ee14/) +- [Make a README](https://www.makeareadme.com/) +- [Matias Singers](https://github.com/matiassingers/awesome-readme) + +--- +Folder Structure: +- [Rohan Alexander](https://www.tellingstorieswithdata.com/reproducible-workflows.html) +- [Wilson et. 
al](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510#sec009) + diff --git a/slides-resources/unix_slides.md b/slides-resources/unix_slides.md new file mode 100644 index 00000000..f3fc9afc --- /dev/null +++ b/slides-resources/unix_slides.md @@ -0,0 +1,2046 @@ +--- +marp: true +theme: uncover +_class: invert +paginate: true + +--- + + + +# **Unix Shell** +```console +$ echo "Data Sciences Institute" +$ echo "by: Rachael Lam" +``` + +--- + + +## `Unix` + +--- +##### **What is Unix?** +Unix was created in 1970 and since then has branched into other versions including Linux. Linux was created from Unix with very similar features, although there are some minor differences in commands. + +Unix shells - more specifically bash - is a powerful tool for quickly and easily navigating and manipulating files, scaling automated tasks, accessing Git and processing data. + +--- +##### **So what is the shell?** +The shell is any user interface/program that takes an input from the user, translates it into instructions that the operating system can understand, and conveys the output back to the user. + +There are various types of user interfaces: +- graphical user interfaces (GUI) +- touch screen interfaces +- command line interfaces (CLI) + +--- +##### **And what is bash?** +We'll be focusing on command line interfaces (CLI), more specifically `bash`, which stands for **B**ourne **A**gain **SH**ell. + +We'll also need a terminal emulator to interact with the shell. This is most likely called *terminal* on our menu. + +--- +##### **Let's get started!** +First, we'll open our terminal. As mentioned earlier, this is most likely called *terminal* and can be found by searching our computer, which on a Mac would be through `cmd + space` + +Let's take a look at the terminal. What do we notice? 
+- last login +- name +- location +- shell + +--- +##### **Looking at the Shell** +If we type `echo $SHELL` in our terminal, the output will tell us what shell we are working with. Most often, our shell will already be `bash` but in newer Macs, it could be `zsh`, which is almost identical to bash. We can also see where `bash` is located by typing: +- `whereis bash` +- `whence bash` (available in `zsh`) +- `which bash` + +--- +Let's start with a few commands and see what happens in our terminal. +```console +$ echo Rachael +``` +```console +$ date +``` +```console +$ cal +``` +```console +$ lksjfs +``` + +--- + + +- What happens when we type something that does not exist? +- What happens with errors? + +--- + +## **Navigate Files / Directories** + +--- + + +## `Files` + +--- +Knowing the different types of files available helps us better understand how to navigate and manipulate them. + +- Regular files are text files with readable characters. + +- Executable files are programs that are invoked as commands. + +- Shell scripts are executable files that we can read, whereas bash is a non-human-readable executable file. + +--- + + +## `Directories` + +--- +Directories are files that are like folders which contain other files and directories (subdirectories), creating a hierarchical structure. + +- We can think of the structure of directories as a tree with the top of the tree being the *root*. + +- All files can be named and found in relation to the *root* by listing the directory names in order from the root, separated by slashes, followed by the file's name. + +--- +Let's try three commands that help us navigate our system: +1. First, let's run the code below and see what happens: +```console +$ pwd +``` +`pwd` prints our working directory. If we ever need to know where we are, we can execute this command. + +--- +2. Now, let's run the code below and see again what happens: +```console +$ cd +``` +By default, `cd` changes your working directory to your home directory. 
You can also use `cd` to set your working directory by including the desired pathname: +```console +$ cd Desktop +``` +--- +In the previous example, we were able to just state `Desktop` because it is a directory in our working directory. If we changed our working directory to `Desktop`, and then wanted to change it again to a directory in `Desktop`, we could again just specify the folder. + +If we wanted to change the working directory to a directory outside of our working directory, we would need to specify the full pathname: +```console +$ cd /Users/rachaellam/Desktop +``` + +--- +3. To know what files and folders exist in our working directory, we can use the code below: +```console +$ ls +``` +We can add a pathname at the end to list the contents of a specified directory. + +--- + + +## `Paths` + +--- +As we've seen, directory names separated by slashes are paths. There are two types of paths, *absolute* and *relative*. + +- An absolute pathname begins at the root directory and includes each directory, separated by slashes, until the desired directory or file is reached. + +- A relative pathname starts from the working directory and uses the symbols `.` (the working directory) or `..` (its parent directory) to represent relative positions in the file tree. + +--- +Using `cd` and `pwd`, let's take a look at how we can use absolute and relative pathnames. +```console +$ cd +$ pwd +``` +```console +$ cd Desktop +$ pwd +``` +```console +$ cd .. +$ pwd +``` + +--- +Here's another example using the `/usr` pathname. +```console +$ cd /usr/bin +$ pwd +``` +```console +$ cd /usr +$ pwd +``` +```console +$ cd .. +$ pwd +``` + +--- +Let's now try moving through some directories to get comfortable. Try out lots of different paths depending on the file structures of your computer. Try getting into different directories from different parent directories. The tilde notation `~` in the examples below refers to our home directory. 
+```console +$ cd ~/Desktop +$ pwd +``` +```console +$ cd ~/Desktop/dir1 +$ pwd +``` + +--- + + +**Questions?** + +--- + + +## `Options and Arguments` + +--- +Options modify how a command behaves, and arguments tell the command what to act on. The syntax is: +```console +$ command -option argument +``` +Options can also be combined, which we'll briefly see now but learn more about a bit later. + +--- +There are two ways to write an `-option`: +1. Short option: one dash followed by a single character +2. Long option: two dashes followed by a word + +Some examples: + +`-a` or `--all` +`-d` or `--directory` +`-r` or `--reverse` + +--- +Let's try these lines of code and see what happens: +```console +$ ls -l +``` +```console +$ ls -lt +``` +```console +$ ls -lt --reverse +``` +`-l` long format +`-t` sort by modification time +`--reverse` reverse the sort order (the short form is `-r`) +Notice how `-lt` is actually a combination of multiple options. + +--- + + +**Questions?** + +--- + + +## `Wildcards` + +--- +Wildcards give us the ability to rapidly specify groups of filenames based on patterns of characters. Let's look at a few examples below: + +`*` → matches any sequence of characters + +`?` → matches any single character + +`[characters]` → matches any character that is in the set + +`[!characters]` → matches any character that is not in the set + +--- +Some other helpful character wildcards are: +`[:digit:]` → matches any numeral +`[:lower:]` → matches any lowercase letter +`[:upper:]` → matches any uppercase letter + +--- +Let's try a few in our terminal: +```console +$ ls * +``` +```console +$ ls a*.txt +``` +```console +$ ls [abc]* +``` +```console +$ ls [[:upper:]]* +``` +```console +$ ls [![:digit:]]* +``` + +--- + + +**Questions?** + +--- + +## **Working with** +## **Files / Directories** + +--- +We're going to learn some basic commands to begin some preliminary coding. 
We'll also be using these throughout the module, so it's important to understand how they work now: +- create directory `mkdir` +- create file `touch` +- copy `cp` +- move and rename `mv` +- remove `rm` + +--- + + +## `Commands` + +--- +##### **mkdir** +First let's make a directory. It's important to remember what directory you're working in currently, because that's where the new directory will be made. Let's assume for now, we're working on our desktop. +```console +$ mkdir directory +``` +We can also create multiple directories at the same time: +```console +$ mkdir dir1 dir2 dir3 +``` + +--- +##### **touch** +We can also make new files from the command line. This is particularly useful when we want to make scripts, which we'll learn a bit later. Using `touch`, we can make a new file in our working directory. +```console +$ touch file1 +``` +We can also create a specific file type by adding the extension: +```console +$ touch file1.sh +``` + +--- +##### **cp** +Now we're going to copy a file that we have on our desktop. It can be any file but remember to include the extension or if it has multiple characters, special characters and spaces, to wrap it in quotes. +```console +$ cp file1 file2 +``` +We can also copy files or directories into a directory. +```console +$ cp file1 dir1 +``` + +--- +And all files from one directory into another using wildcards: +```console +$ cp dir1/* dir2 +``` +What does the `/*` in this command mean? + +--- +There are some useful `-options` that accompany `cp`: +Option | Description +:-----|:------ +`-i` | Before overwriting an existing file, prompt the user for confirmation. If this option is not specified, `cp` will silently overwrite files. +`-r` | Recursively copy directories and their contents. This option is required when copying directories. +`-v` | Display informative messages as the copy is performed. + +--- +##### **mv** +The `mv` command enables us to move and rename files and directories, depending on how it's used. 
In the example below, `mv` renames file1 to file2. +```console +$ mv file1 file2 +``` +Here, `mv` moves file1 to dir1: +```console +$ mv file1 dir1 +``` + +--- +We can also move directories into other directories: +```console +$ mv dir1 dir2 +``` +In this case, if `dir2` **exists**, `dir1` will be moved inside `dir2`. If `dir2` does **not exist**, `dir1` will simply be renamed to `dir2`. In both cases, the entire directory is moved or renamed, rather than only its contents. + +--- +Let's say we're in the directory `Desktop` and we just moved `file1` into `dir1` but now we want to put it back in `Desktop`. How would we move a file out of a directory into another one? Unfortunately we **can't** just say +```console +$ mv file1 Desktop +``` +because `file1` no longer exists in our working directory (`Desktop`), so `mv` cannot find it and will report an error. + +--- +The answer involves using pathnames and the tilde `~` notation: +```console +$ mv dir1/file1 ~/Desktop +``` +If we just wanted to move `file1` into `dir2` (if `dir2` is in our working directory), we could type: +```console +$ mv dir1/file1 dir2 +``` + +--- +What if we want to move just the contents of `dir1` to another directory rather than the whole folder? HINT: it is very (exactly) similar to copying (`cp`). + +--- +```console +$ mv dir1/* dir2 +``` +This is a combination of the directory `dir1`, pathnames `/` and wildcards `*`. Here, `dir1/*` takes all the contents of `dir1` and puts them in `dir2`. + +We could also use the same technique to specify certain files to move rather than all of them. How do you think this would be done? + +--- + + +**Questions** +- We're starting to combine our knowledge of files, directories and pathnames with some basic commands. How do we feel up to this point? + +--- +##### **rm** +To remove files we use the command `rm`. 
Because we're now deleting files, it's important that you're sure of what you're deleting because **there is no way to undo**. Fortunately, there are ways to guard against mistakes. +```console +$ rm file1 +``` +Without specifying any `-options`, `file1` will be deleted without any feedback. + +--- +To ensure we want to delete something, we can use the option `-i` (interactive) that we learned earlier. +```console +$ rm -i file1 +``` +This will prompt a question asking us if we want to delete `file1`. We can respond with `y` if yes and `n` if not. + +--- +If we want to delete a directory, we need to use the option `-r` as we did when copying (`cp`). This will recursively delete everything inside of the directory and the directory itself. +```console +$ rm -r dir1 +``` +If we're specifying multiple deletions and a directory does not exist, `rm` will tell us. If we don't want that message, we can add the option `-f` (force). `-f` also overrides an `-i` that comes before it. + +--- +1. How do you delete multiple directories? + +2. What happens if you delete multiple directories with `-i`? + +3. What happens if you delete multiple directories with `-i` but one does not exist? + +--- +It's extremely important to remember that you cannot undo `rm`. This means that if you start using wildcards to specify filenames and don't include `-i`, you could delete things by accident. For example, let's say you want to delete all `.txt` files in a directory: +```console +$ rm *.txt +``` +If you accidentally add a space between `*` and `.txt`, the `rm` command will delete all the files in the directory and then try to delete a file named `.txt`, which most likely does not exist. + +--- + + +**Questions?** + +--- + + +## `Input / Output` + +--- +##### **Standard Input/Output** +Each program has access to three standard streams: standard input, standard output and standard error. 
+ +By default, standard input comes from the keyboard. If we think of everything as a file, a command such as `ls` sends its results to a file called *standard output* and its status messages to a file called *standard error*. By default, both are linked to the screen and not saved to a disk file. + +--- +##### **Input/Output Redirection** +Input/Output redirection allows us to change where the input comes from and where the output goes to, such as storing the output of a command into a file. We can do this using the redirection operator `>`. +```console +$ ls -l /usr/bin > ls-output.txt +``` +Here we have redirected the output of `ls -l /usr/bin` to a `.txt` file called *ls-output.txt*. + +--- +We can now see the details of that file and if it worked: +```console +$ ls -l ls-output.txt +``` +By looking at the details, we can see that the file was created and that it is a fairly large text file, indicating that something was written to it. + +--- +If we specify a directory that does not exist, we receive the standard error: +```console +$ ls -l /bin/usr > ls-output.txt +``` +Why was the standard error not written to the `.txt` file? +What happened to our *ls-output.txt* file? + +--- +The operator `>` only redirects standard output, so the error message still went to the screen. The destination file is always rewritten from the beginning: the shell emptied *ls-output.txt* before `ls` ran, and because `ls` produced no standard output, the result is an empty file. + +So how do we append rather than rewrite? By using the redirection operator `>>`. +```console +$ ls -l /usr/bin >> ls-output.txt +``` + +--- +If we want to redirect the standard error, we need to use the redirection operator `2>` +```console +$ ls -l /bin/usr 2> ls-error.txt +``` +If we want to redirect both the standard output and standard error to one file, we have two options. +1. Use `2>&1` at the end of the command. +```console +$ ls -l /bin/usr > ls-output.txt 2>&1 +``` +2. 
Use `&>` in place of `>` +```console +$ ls -l /bin/usr &> ls-output.txt +``` + +--- + + +**Questions** + +--- +##### **cat** +`cat` takes one or more files and copies them to standard output. Using the *ls-output.txt* created earlier, we can see how that's done: +```console +$ cat ls-output.txt +``` + +--- +We can also use it to join files together. Let's say I have two files, `file1` and `file2`, and I want to combine them into a file called `file3`: +```console +$ cat file1 file2 > file3 +``` +Now `file3` should contain the combined contents of file1 and file2. + +--- +We can also use `cat` to write text into a `.txt` file. +```console +$ cat > new_cat.txt +``` +Now we can type the text that we want in the file. Once we're finished, we can use `CTRL-D` to exit. + +What would be the difference between `$ cat > new_cat.txt` and `$ cat >> new_cat.txt`? + +--- +Finally, we can redirect the standard input from the keyboard to the file *new_cat.txt* +```console +$ cat < new_cat.txt +``` +This is almost identical to just typing `$ cat new_cat.txt` but we can see later how it could be more useful. + +--- + + +**Questions?** + +--- + +## **Pipes / Filters** + +--- +We use pipelines to read data from standard output and send it to standard input using the pipe operator `|`. This means the standard output of one command can be piped into the standard input of another. + +The commands put together in a pipeline are often referred to as filters. Filters take an input, change it and then output it. + +--- + + +## `Commands` + +--- +Let's learn a few more commands that will help us further understand pipelines and filters. We'll learn: 
We'll learn: +- extract columns from output `cut` +- sort lines of text `sort` +- report or omit repeated lines `uniq` +- print lines matching a patter `grep` +- search directories and subdirectories for files `find` +- ouput the first part of a file `head` +- output the last part of a file `tail` + +--- +##### **cut** +Let's look at a `csv` to see how we can initially see our data. Because it's a `csv`, each line is separated by a comma. Let's first read that file using `cat`: +```console +$ cat parking_data.csv +``` +We'll see a lot of text, so let's make some sense of it using cut. + +--- +To use cut, I need to pass a couple options: +1. `-d` which cuts the text based on what follows. For example, `-d:` will cut based on colons or `-d" "` will cut based on a space. +2. `-f`, which extracts a particular field based on what follows. For example, `-f1` will take the first field or `-f2` will take the second field and so on. + +--- +In this example, I'm taking the file *parking_data* and cutting it based on colons and then only extracting the first field. +```console +$ cut -d, -f1 < parking_data.csv +``` +What happens if I add another `-f` option? What does this do? +```console +$ cut -d, -f1 -f2 < parking_data.csv +``` +How would I specify more than three fields? + +--- +##### **sort** +How can we make our previous example more readable? + +One answer is to use the sort feature. We can pipe this with the cut feature: +```console +$ cut -d, -f1 < parking_data.csv | sort +``` + +--- +##### **uniq** +Additionally, I can make the above command even more readable by removing any duplicates with `uniq` +```console +$ cut -d, -f1 < parking_data.csv | sort | uniq +``` + +--- + + +**Questions?** + +--- +##### **grep** +`grep` is a powerful tool for finding patterns in text files. The syntax is: +```console +$ grep pattern [file...] 
+``` +In our case, we're going to use it with our previous example and pipe it with other commands: +```console +$ cut -d, -f1 parking_data.csv | sort | uniq | grep FIRE +``` +The results are all the lines of the output that contain the pattern FIRE. + +--- +##### **find** +Another use for `grep` is to find files in directories. `grep` combines nicely with `find` for this. + +```console +$ find ~/Desktop/dir1 | grep cat +``` +Here we're searching the directory *dir1* for the pattern *cat*. This would be helpful if we wanted to know if there were any files with the word cat in the filename. + +--- +##### **head / tail** +We can also extract the first and last part of files using `head` and `tail`. We can also add the option `-n` followed by a number to extract a certain number of lines. +```console +$ head -n 5 ls-output.txt +``` +```console +$ tail -n 5 ls-output.txt +``` + +--- +`head` and `tail` can also be used in pipelines: +```console +$ cut -d, -f1 < parking_data.csv | sort | uniq | head -n 5 +``` +```console +$ cut -d, -f1 < parking_data.csv | sort | uniq | tail -n 5 +``` + +--- + + +**Questions?** + +--- + + +## `Expansions` + +--- +With expansion, the shell replaces special characters with their expanded values before it runs the command. We have learned a few expansions so far such as the tilde `~` and wildcards `*`. We've also seen some character wildcards `[characters]`. + +Expansions are another feature that helps us when we're manipulating and working with files and directories. + +Other examples of expansions are: +- arithmetic expansion +- brace expansion + +--- +##### **Arithmetic Expansion** +Arithmetic expansion basically makes the shell a calculator. 
+The syntax is: + +`$((expression))` + +For example: +```console +$ echo $((2 + 2)) +``` + +Arithmetic expressions can be nested: +```console +$ echo $(($((2 + 2)) * 3)) +``` + +--- +Just for reference, here is a list of the arithmetic operators: +Operator | Description +:-----|:------ +`+` | Addition +`-` | Subtraction +`*` | Multiplication +`/` | Integer division +`**` | Exponentiation + +--- +##### **Brace Expansion** +Brace expansions allow us to create multiple text strings from a pattern containing braces. Here are a few examples: +```console +$ echo Test-{A,B,C}-Example +``` +```console +$ echo Number_{1..5} +``` +```console +$ echo {Z..A} +``` +Brace expansions can also be nested: +```console +$ echo a{A{1,2},B{3,4}}b +``` + +--- +We can use brace expansion to help make multiple directories using `mkdir`. +```console +$ mkdir dir-{1..3} +``` +This command makes 3 directories named *dir-1*, *dir-2* and *dir-3*. + +--- + + +## `Quoting / Backslashing` + +--- + +Quoting suppresses unwanted expansions. We can use double quotes, single quotes or backslashes: + +- Double quotes make special characters lose their meaning so that they are treated as ordinary characters, except for + `$`, `\` and `` ` `` (backtick) +- Single quotes suppress all expansion +- Backslashes are used to escape single characters + +--- +Many times there will be file names or directories that are named with spaces. In this case, we'll need to use double quotes so that the shell can read them. 
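To make the three quoting rules concrete, here is a small sketch that reuses the arithmetic expansion from earlier. The variable names `double`, `single` and `escaped` are made up for this illustration.

```bash
#!/usr/bin/env bash
# Sketch: the same text under each quoting style.
# The variable names below are hypothetical.

double="2 + 2 = $((2 + 2))"     # double quotes: $((...)) is still expanded
single='2 + 2 = $((2 + 2))'     # single quotes: nothing is expanded
escaped="2 + 2 = \$((2 + 2))"   # backslash: the escaped $ stays literal

echo "$double"
echo "$single"
echo "$escaped"
```

Only the first line prints the computed result; the other two print the expression exactly as it was typed.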
+ +Using `touch` we can create a text file whose name is two words separated by a space: +```console +$ touch "two words.txt" +``` +We can then see the details of the file we just created: +```console +$ ls -l "two words.txt" +``` + +--- +If we want to rename the file, we would do as follows: +```console +$ mv "two words.txt" two_words.txt +``` + +--- +Let's see what these three examples do in the shell: +```console +$ echo '2 * 3 > 5 is an equation' +``` +```console +$ echo '2 * 3 > 5' is an equation +``` +```console +$ echo 2 \* 3 \> 5 is an equation +``` + +--- + + +**Questions?** + +--- + + +## `Command Line Editing` + +--- +Getting familiar with command line editing can save you time. Bash uses a library called Readline to implement command line editing. + +There are many shortcuts and you don’t have to memorize them all, just use the ones that you feel are best. There are even more shortcuts that you can read about in the textbooks! + +--- +##### **Character Commands** +Command | Description +:-----|:------ +CTRL-B | Move one character backwards +CTRL-F | Move one character forwards +DEL | Delete one character backwards +CTRL-D | Delete one character at cursor location + +--- +##### **Word Commands** +Command | Description +:-----|:------ +ESC-B | Move one word backwards +ESC-F | Move one word forwards +ESC-DEL | Delete one word backwards +ESC-D | Delete one word forwards +CTRL-Y | Retrieve (yank) the last item deleted + +--- +##### **Line Commands** +Command | Description +:-----|:------ +CTRL-A | Move to beginning of the line +CTRL-E | Move to end of the line +CTRL-K | Delete text from the cursor to end of line +CTRL-U | Delete text from the cursor to the beginning of the line + +--- +##### **History Line Commands** +Command | Description +:-----|:------ +CTRL-P | Move to the previous line in your history of commands +CTRL-N | Move to the next line in your history of commands +`!!` | Repeat the last command + +--- +Command | Description +:-----|:------ +``!number`` | Repeat history list item number 
+`!string` | Repeat last history item starting with string +`!?string` | Repeat last history item containing string + +--- + + +**Questions?** + +--- + + +## `Completion Command` + +--- +Pressing `tab` autocompletes what you are typing if a match exists. If no match exists, the shell will not be able to complete it. + +If multiple matches exist, the shell will also not be able to complete because it will not know which one to choose. + +For example, let's say we have two files called `file1` and `file2`. We would not be able to use autocomplete because the shell will not know which to choose until the last character. + +--- +Suppose we have two files, one called `foot.txt` and one called `file.txt`. This command would not be able to autocomplete: +```console +$ ls f +``` +But this one will: +```console +$ ls fil +``` + +--- + + +**Questions?** + +--- + +## **Shell Scripts** + +--- +##### **Shell Scripts** +Shell scripts allow us to combine several commands into one file, rather than entering them one by one on the command line. + +The shell will read the script just as if you were to write the commands on the command line. + +Most things that can be done in a shell script can be done on the command line and vice versa. + +--- +##### **Writing Shell Scripts** +There are three important considerations when writing a shell script: +1. **Write a script:** scripts are ordinary text files. You can use a text editor that provides syntax highlighting (color coding elements of the script). Highlighting can help find errors, but writing in TextEdit is possible. +2. **Make a shell script executable:** set the script permissions to allow it to be executed +3. **Put the shell script somewhere the shell can find it:** the shell automatically searches certain directories for executable files when no explicit pathname is specified. + +--- +##### **Set Up** +Open either TextEdit or your text editor of choice. Some popular programs are Sublime Text, Vim, Atom and Notepad++. 
+ +If you want to see the syntax highlighting, you might have to save your script as a `.sh` file. Without doing this, your file will just look like a regular `.txt` file. + +Once you open your text editor and save the file, we can begin our first script! + +--- +##### **Script File Format** +We must first tell the shell the name of the interpreter that should be used to execute the script. This is marked by using a shebang: `#!` + +Throughout the script, you can and **should** use `#` to make comments. Comments make your code more readable and can help you understand your code when you come back to it. + +--- +```bash +#!/bin/bash + +# this is our first comment + +echo "This is our first script!" +``` +Here we can see we've told the shell to use `/bin/bash` using the shebang `#!` +We've also added a comment using `#` +And finally, something quite familiar, we have our first line of script using `echo` + +--- +##### **A Note on Commenting** +Commenting is important not just so you can understand your own work, but also so others can understand your work in collaborative projects. It also helps make your code reproducible. + +Comments can be inline: +```bash +echo "Hello World" #this is an inline comment +``` +or written as comment blocks: +```bash +#this is a comment block +echo "Hello World" +``` + +--- + + +**Questions?** + +--- +##### **Executable File Permission** +In order to execute our file, we have to add file permissions: + +`chmod` helps make our script executable +`775` is used to make scripts that everyone can execute +`700` is used to make scripts that only the owner can execute + +--- +Here, `chmod` is combined with `775` so that everyone can execute the script: +```console +$ ls -l first_script.sh +``` +```console +$ chmod 775 first_script.sh +``` + +--- +##### **Script File Location** +In order to run our script, we have to call it using `./` in front of the script filename (`./script`). + +File location is important to run your script. 
If just `script` was written, the shell would not look in the current directory for the script; it would try to read it as a command and output `command not found`. + +Running `echo $PATH` helps us see what directories are being searched for the script. + +--- +If we want to run our script without `./`, we can create a `~/bin` directory for our script, move our script into the bin folder and then run it. It's important to note that we have to make this bin in our home directory (and that this only works if `~/bin` is listed in `$PATH`). If we made it on our Desktop, the script would still not be found. +```console +$ mkdir bin +$ mv first_script.sh bin +$ first_script.sh +``` +In this block of code, we're making the bin folder using `mkdir`, moving the script into the bin with `mv` and then running the script without `./`. + +--- +##### **Good Locations for Scripts** +For personal use, a good place to put your script is `~/bin`. + +For everyone's access, it's better to put scripts in `/usr/local/bin`. + +--- + + +**Questions?** + +--- + +# **Shell Functions** + +--- +##### **Functions** +Functions are a good way to break down code into smaller, more manageable chunks. Each chunk can represent a task. + +For example, let's say your entire process is to make pasta. It can be broken down into: +1. Prepare vegetables +2. Make sauce +3. Cook pasta +4. Serve + +--- +Each of these steps can be expanded further into sub processes. Cook pasta can be: +1. Fill pot with water +2. Boil water +3. Measure pasta +4. Add pasta to boiling water +5. Cook for 8-12 minutes +6. Strain + +--- +Functions have two syntactic forms: +```bash +function name { + commands + return +} +``` +```bash +name () { + commands + return +} +``` +`name` is the name of the function +`commands` are the commands contained in the function + +--- +Let's write our first function: +```bash +#!/bin/bash + +function funct { + echo "Step 2" + return +} + +#program starts here + +echo "Step 1" +funct +echo "Step 3" +``` +What do you think this function will output? 
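The pasta breakdown from earlier can be sketched with both syntactic forms side by side. This is only an illustration: the function names `boil_water` and `cook_pasta` are made up for this sketch and are not part of the course scripts.

```bash
#!/bin/bash

# Form 1: the "function" keyword
function boil_water {
    echo "Boiling water"
    return
}

# Form 2: the name followed by ()
cook_pasta () {
    boil_water              # a function can call another function
    echo "Cooking pasta"
    return
}

# program starts here
cook_pasta
```

Both forms behave the same way once defined; which one to use is mostly a matter of style.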
+ +--- +Let's save and run this function in our terminal to see what happens. + +Here's a good time to recap how to save, grant permissions and run the script. +`chmod` - permissions command +`775` - grant permissions to everyone +`700` - grant permissions to yourself +`~/bin` - where to save scripts + +--- + + +**Questions?** + +--- + + +## `Variables` + +--- +##### **Global Variables** +Let's make our script more complex with some variables. We can first define variables directly through the terminal. +```bash +$ foo="something cool" +$ echo $foo +``` +Notice how in order to call the variable we need to add `$` before the variable name. The quotes are not necessary if the value of the variable doesn't include spaces. If we did not include the quotes here, we would receive an error. + +--- +Now let's add some global variables to our script: +```bash +#!/bin/bash + +step="Step 2" + +function funct { + echo $step + return +} + +#program starts here + +echo "Step 1" +funct +echo "Step 3" +``` +What do we think will be the output in this example? + +--- +##### **Local Variables** +Local variables are variables that are contained within the function. Because they're contained, they can have names that already exist in the shell globally or within other shell functions. + +--- +```bash +#!/bin/bash + +foo=0 # global variable foo +funct_1 () { + local foo # variable foo local to funct_1 + foo=1 + echo "funct_1: foo = $foo" +} + +funct_2 () { + local foo # variable foo local to funct_2 + foo=2 + echo "funct_2: foo = $foo" +} + +echo "global: foo = $foo" +funct_1 +echo "global: foo = $foo" +funct_2 +echo "global: foo = $foo" +``` + +--- +What would happen if we removed `local`? 
+```bash +#!/bin/bash + +foo=0 # global variable foo +funct_1 () { + foo=1 + echo "funct_1: foo = $foo" +} + +funct_2 () { + foo=2 + echo "funct_2: foo = $foo" +} + +echo "global: foo = $foo" +funct_1 +echo "global: foo = $foo" +funct_2 +echo "global: foo = $foo" +``` + +--- + + +**Questions?** + +--- + + +## `Parameters` + +--- +##### **Positional Parameters** +Positional parameters are built-in parameters that allow our programs to get access to the contents of the command line. This is extremely valuable when we are creating scripts and then want to pass a parameter to the script from the command line. + +If our code has more than 9 positional parameters, we need to enclose the number in curly brackets, for example `${10}`. + +Let's create a script to see how this works: + +--- +```bash +#!/bin/bash + +echo " +Number of arguments: $# +\$0 = $0 +\$1 = $1 +\$2 = $2 +\$3 = $3 +\$4 = $4 +\$5 = $5 +\$6 = $6 +\$7 = $7 +\$8 = $8 +\$9 = $9 +" +``` + +--- +In the example, you may notice that we haven't given `$0` any specific value. +Let's try running the script a couple of ways through the command line to see what this means: +1. Run the script with arguments `a b c d`. +2. Run the script with any arguments of your choice. + +What do we notice? + +--- +##### **$\* and $@** +`$*` → Expands into the list of positional parameters, starting with 1. When surrounded by double quotes, it expands into a double quoted string containing all of the positional parameters, each separated by the first character of the IFS shell variable (by default a space character). +`$@` → Expands into the list of positional parameters, starting with 1. When surrounded by double quotes, it expands each positional parameter into a separate word surrounded by double quotes. 
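One quick way to see the difference between the four forms is to count how many arguments each one produces. The sketch below assumes the default `IFS`; the helper names `count_args` and `compare_expansions` are hypothetical and exist only for this illustration.

```bash
#!/bin/bash

# count_args (hypothetical helper): prints how many arguments it received
count_args () {
    echo "$#"
}

# compare_expansions (hypothetical helper): forwards its arguments four ways
compare_expansions () {
    echo "unquoted \$*: $(count_args $*)"
    echo "quoted \"\$*\": $(count_args "$*")"
    echo "unquoted \$@: $(count_args $@)"
    echo "quoted \"\$@\": $(count_args "$@")"
}

compare_expansions "word" "words with spaces"
```

With these two arguments, the unquoted forms split on every space and yield four words, `"$*"` yields a single word, and only `"$@"` preserves the original two arguments.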
+ +--- +Let's take a look at this code piece by piece: +```bash +print_params () { + echo "\$1 = $1" + echo "\$2 = $2" + echo "\$3 = $3" + echo "\$4 = $4" +} + +pass_params () { + echo -e "\n" '$* :'; print_params $* + echo -e "\n" '"$*" :'; print_params "$*" + echo -e "\n" '$@ :'; print_params $@ + echo -e "\n" '"$@" :'; print_params "$@" +} + +pass_params "word" "words with spaces" +``` + +--- +1. Here we have two functions: `print_params ()` and `pass_params ()`. `pass_params ()` calls the function `print_params ()` within its body. +2. In the first function, `echo` is printing the line inside the double quotes. The `\` in front of `$1` escapes the `$`, so it loses its special meaning and is printed literally, as we learned earlier. +```bash +print_params () { + echo "\$1 = $1" + echo "\$2 = $2" + echo "\$3 = $3" + echo "\$4 = $4" +} +``` + +--- +3. In the second function, `echo` again is printing the text inside the single quotes. `"\n"` adds a newline before the label for readability. The function then calls the first function (`print_params ()`) with the argument `$*`. The second line calls the first function but with the argument `$*` in double quotes. This is repeated for `$@`. +```bash +pass_params () { + echo -e "\n" '$* :'; print_params $* + echo -e "\n" '"$*" :'; print_params "$*" + echo -e "\n" '$@ :'; print_params $@ + echo -e "\n" '"$@" :'; print_params "$@" +} +``` + +--- +4. In the final part of the code, we're calling the `pass_params ()` function and passing two arguments: `"word"` and `"words with spaces"`. +```bash +pass_params "word" "words with spaces" +``` + +--- +Let's see what happens when we run the script in the terminal. Remember, we don't have to pass any arguments in the command line because we have done so in our script. + +--- + + +**Questions?** + +--- +Let's take a look at another example. 
In this example we'll get a greater understanding of variables and positional parameters: +```bash +function afunc { + echo in function: $0 $1 $2 + var1="in function" + echo var1: $var1 +} + +var1="outside function" + +echo var1: $var1 +echo $0: $1 $2 +afunc funcarg1 funcarg2 +echo var1: $var1 +echo $0: $1 $2 +``` + +--- +Let's break it down again: +1. In our first function called `afunc`, using `echo` we will print `in function:` followed by 3 positional parameters. We will then define the variable `var1`, give it the value `"in function"` and print it using `echo` again. +```bash +function afunc { + echo in function: $0 $1 $2 + var1="in function" + echo var1: $var1 +} +``` +2. Outside of the function, we'll create another variable also named `var1` and give it the value of `"outside function"` +```bash +var1="outside function" +``` + +--- +3. We'll then add the program. +a) Using `echo`, we'll print `var1` +b) Print 3 positional parameters +c) Call the function with two arguments +d) Print `var1` again +e) Print 3 positional parameters again +```bash +echo var1: $var1 +echo $0: $1 $2 +afunc funcarg1 funcarg2 +echo var1: $var1 +echo $0: $1 $2 +``` + +--- +Let's run it in our terminal without any additional arguments and see what the output is. +- Why did `echo $0: $1 $2` only output one argument? +- Why did `var1` change the third time to `in function` rather than `outside function`? + +--- +Now let's change and add a few things to see what happens: +- In our terminal, what happens if we pass two arguments by entering `ascript.sh arg1 arg2` with `ascript.sh` being the name of our script and `arg1 arg2` being two random arguments? +- What happens if we add `local` to our function? + +--- + + +**Questions?** + +--- +##### **Parameter Expansion** +Let's discuss the difference between `$a` and `${a}` + +`$a` on its own is fine, but when placed next to another string, it can confuse the shell. 
For example: + +- `$a_file` the shell will try to expand a variable named `a_file` rather than `a` + +- `${a}_file` the shell will now try to expand the variable `a` + +This can help us be more flexible when navigating and manipulating files and directories. + +--- +Let's look at the code below to see how this helps us: +```console +$ filename="myfile" +$ touch $filename +$ mv $filename ${filename}1 +``` +This block of code creates a file based on our defined variable and then renames it with the same variable but with an additional component. + +--- +Parameter expansion also helps us if our variables are unset (i.e. do not exist) or are empty. Let's take a look at a couple of examples in the next few slides. + +--- +1. `${parameter:-x}` If parameter is unset or empty, expansion results in the value of *x*. If it's not empty, it results in the value of the parameter +```console +$ foo= +$ echo ${foo:-"something else"} +$ echo $foo +$ foo=bar +$ echo ${foo:-"something else"} +$ echo $foo +``` +Through this sequence of commands we can see that when `$foo` is empty, `:-` substitutes `"something else"` in the output without changing the variable. Once we define the variable, `:-` results in our defined variable. + +--- +2. `${parameter:=x}` If parameter is unset or empty, expansion results in the value of *x* and the value of *x* is assigned to the parameter. If it's not empty, it results in the value of the parameter +```console +$ foo= +$ echo ${foo:="something else"} +$ echo $foo +$ foo=bar +$ echo ${foo:="something else"} +$ echo $foo +``` +We can see that when `$foo` is empty, `:=` assigns `"something else"` to the variable. If we define the variable again, `:=` results in our newly defined value. + +--- +3. `${parameter:?x}` If parameter is unset or empty, this expansion causes the script to exit with an error, and the contents of *x* are sent to standard error. If parameter is not empty, the expansion results in the value of parameter. 
+```console +$ foo= +$ echo ${foo:?"something else"} +$ echo $? +$ foo=bar +$ echo ${foo:?"something else"} +$ echo $? +``` +We can see that when `$foo` is empty, `:?` gives us an error, which we can confirm because `echo $?` outputs `1`. If we define the variable again, `:?` results in the value of our variable. + +--- +4. `${parameter:+x}` If parameter is unset or empty, the expansion results in nothing. If parameter is not empty, the value of *x* is substituted for parameter; however, the value of parameter is not changed. +```console +$ foo= +$ echo ${foo:+"something else"} +$ echo $foo +$ foo=bar +$ echo ${foo:+"something else"} +$ echo $foo +``` +Here, while `$foo` is empty, `:+` results in an empty output and the value of `$foo` remains empty. Once we define the variable, `:+` outputs `something else` in its place, but it does not reassign the variable: `$foo` still holds `bar`. + +--- +##### **String Operators** +String operators are extremely valuable for operations on pathnames. They can help extract parts of pathnames, especially if they follow a pattern. Many pathnames follow patterns; for example, extensions are preceded by `.`. + +Some string expansions are: +1. `${#parameter}` +2. `${parameter:offset}` +3. `${parameter:offset:length}` + +--- +1. `${#parameter}` expands into the length of the string contained by the parameter. +```console +$ foo="Toronto needs more trees" +$ echo "'$foo' is ${#foo} characters long." +``` + +--- +With the following expansions, we can extract a portion of the string contained by the parameter. + +2. `${parameter:offset}` will extract characters from *offset* characters to the end of the string. For example, counting from the beginning of the string, the *n* of *needs* is 8 characters from the beginning. Because we did not specify a length, `echo` will print from *needs* onwards. +```console +$ foo="Toronto needs more trees" +$ echo ${foo:8} +``` + +--- +3. `${parameter:offset:length}` will specify the length that we want to extract. 
This length is counted not from the beginning of the string, but from the offset of the string. +```console +$ foo="Toronto needs more trees" +$ echo ${foo:8:5} +``` +We can see that from the beginning of the string, *n* is 8 characters in, and from *n*, *s* of *needs* is the 5th character. Therefore, our output will be *needs*. + +--- + + +**Questions?** + +--- +Let's now see how to use patterns in our parameter expansions. There are several ways we can achieve this: + +1. `${parameter#pattern}` +2. `${parameter##pattern}` +3. `${parameter%pattern}` +4. `${parameter%%pattern}` + +--- +1. `${parameter#pattern}` removes the shortest leading portion of the string contained in *parameter* defined by the *pattern*. +```console +$ foo=/User/name/Desktop/file.txt.zip +$ echo ${foo#/*/} +``` +In this example, we've defined `foo` as the path to a file with extensions. The expansion matches the pattern `/*/`, where `*` stands for any characters, and removes the shortest leading portion that matches. + +--- +2. `${parameter##pattern}` is very similar to the previous expansion except it removes the longest leading portion of the string. +```console +$ foo=/User/name/Desktop/file.txt.zip +$ echo ${foo##/*/} +``` +Very similar to the previous example, the expansion matches the pattern `/*/` and removes the longest leading portion that matches. + +--- +3. `${parameter%pattern}` removes the shortest ending portion of the string rather than the beginning. +```console +$ foo=/User/name/Desktop/file.txt.zip +$ echo ${foo%.*} +``` + +4. `${parameter%%pattern}` removes the longest ending portion of the string. +```console +$ foo=/User/name/Desktop/file.txt.zip +$ echo ${foo%%.*} +``` + +--- +What happens if we change our pattern to `#*_`? + +Let's pretend we have a file named "rachaels_file" and we want to know its extension. How would we do that? + +What if our file was named "rachaels file"? + +--- +We can also use expansions to replace the contents of the parameter with a string based on the pattern. + +1. 
`${parameter/pattern/string}` replaces only the first occurrence of pattern. +2. `${parameter//pattern/string}` replaces all occurrences. +3. `${parameter/#pattern/string}` requires the match to occur at the beginning of the string to replace it. +4. `${parameter/%pattern/string}` requires the match to occur at the end of the string to replace it. + +--- +Let's see how this would work: +```console +$ foo="MP3.MP3" +``` +```console +$ echo ${foo/MP3/mp3} +``` +```console +$ echo ${foo//MP3/mp3} +``` +```console +$ echo ${foo/#MP3/mp3} +``` +```console +$ echo ${foo/%MP3/mp3} +``` + +--- +Can you think of when this might be helpful? + +Let's say I have a file named "rachaels cool file". I want to rename it because spaces cause problems in filenames. How would I do this? + +--- + + +**Questions?** + +--- +##### **Arithmetic Assignment** +We have seen assignment before with examples such as `foo=5`. This is a simple assignment but we can also add complexity to this assignment with other operators. +- `$((parameter += x))` assigns the parameter to itself `+` x +- `$((parameter -= x))` assigns the parameter to itself `-` x +- `$((parameter *= x))` assigns the parameter to itself `*` x +- `$((parameter /= x))` assigns the parameter to itself `/` x + +--- +We can also increase or decrease our parameters by one. +- `$((parameter++))` increases parameter by one after the parameter is returned +- `$((parameter--))` decreases parameter by one after the parameter is returned +- `$((++parameter))` increases parameter by one before the parameter is returned +- `$((--parameter))` decreases parameter by one before the parameter is returned 
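The compound assignment operators (`+=`, `-=`, `*=`, `/=`) can be sketched in a few lines. The variable name `count` and its starting value are arbitrary choices for this illustration.

```bash
#!/bin/bash

count=10

echo $((count += 5))   # count becomes 15
echo $((count -= 3))   # count becomes 12
echo $((count *= 2))   # count becomes 24
echo $((count /= 5))   # integer division: 24 / 5 gives 4
echo $count            # the variable keeps the last assigned value, 4
```

Each expansion both prints the new value and stores it back in `count`, which is why the final `echo` shows the result of the last operation.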
+ +--- +These are very subtle changes so let's see what we mean by after and before a parameter is returned: +```console +$ foo=1 +$ echo $((foo++)) +$ echo $foo +``` +```console +$ foo=1 +$ echo $((++foo)) +$ echo $foo +``` + +--- + + +**Questions?** + +--- +##### **Command Substitution** +So far we've learned how to get values into variables by using assignment statements (`x=5`) and positional parameters (`x=$1`). Another way is command substitution, which allows you to use the standard output of a command as if it were the value of a variable. + +--- +Let's say we want to assign the output of a command to a variable so that we can apply another command to that output. In this particular case, we want to make a variable equal to the list of all files beginning with *t*. We then want to apply a sort command to that variable (the double quotes around `$x` keep each file on its own line so `sort` has lines to order): + +```console +$ x=$(find t*) +$ echo "$x" | sort +``` + +Although this seems quite simple now, we'll see how this can be extremely powerful when we move into flow control. + +--- + +# **Flow Control** + +--- +Flow control allows programs to "change directions" based on the results from a given input. + +Bash supports several constructs: + +- `if/else` +- `while` / `until` +- `case` +- `for` + +--- + + +## `if / else` + +--- +`if/else` is a conditional statement that chooses whether or not to do something based on a true or false statement. + +```bash +if condition; then + commands + +[elif condition; then + commands...] + +[else + commands] + +fi +``` + +--- +Here, we've assigned `x` the value `5`. We've then written an `if/else` statement that says: if `x` is equal to `5`, then tell us that `x` equals `5`. Otherwise (`else`), tell us that `x` does not equal `5` +```bash +x=5 + +if [ $x = 5 ]; then + echo "x equals 5." + +else + echo "x does not equal 5." + +fi +``` + +--- +Let's take a look at a more practical example: we want to know if there are any files in our directory that contain spaces. 
+ +```bash +#!/bin/bash + +cd ~/Desktop/dir1 + +if [[ -n $(find t* | grep " ") ]]; then + echo "A file contains a space" +else + echo "No files contain a space" +fi +``` + +--- +First we've changed our working directory to *dir1*: +```bash +cd ~/Desktop/dir1 +``` +We then used the command substitution that we've just learned to capture the names of files that contain a space. The `-n` option checks if the length of a string is **nonzero**: +```bash +-n $(find t* | grep " ") +``` + +--- +By wrapping our output in an if statement, we're stating: +1. `if` the length of `$(find t* | grep " ")` is nonzero, then print (`echo`) `"A file contains a space"` +2. Otherwise (`else`), print (`echo`) `"No files contain a space"` + +--- + + +**Questions?** + +--- +##### **Control Operators** +Control operators (`&&` and `||`) allow you to test more than one thing at a time. Their syntax is: +```bash +if command1 && command2; then + ... +fi +``` +```bash +if command1 || command2; then + ... +fi +``` + +--- +With the `&&` operator, command1 is executed and command2 is executed only if command1 is **successful** + +With the `||` operator, command1 is executed and command2 is executed only if command1 is **unsuccessful** + +--- +Example of `&&` +```bash +filename=$1 +word1=$2 +word2=$3 + +if grep $word1 $filename && grep $word2 $filename; then + echo "$word1 and $word2 are both in $filename." +fi +``` + +--- +Using positional parameters that we learned earlier, what do you think will happen if we run the previous code? + +- What happens if both words exist? +- What happens if only one word exists? +- What happens if no words exist? + +--- +Example of `||` +```bash +filename=$1 +word1=$2 +word2=$3 + +if grep $word1 $filename || grep $word2 $filename; then + echo "$word1 or $word2 is in $filename." +fi +``` + +--- +Similarly, what will happen if... +- What happens if both words exist? +- What happens if only one word exists? +- What happens if no words exist? 
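To experiment with these questions without needing a file of your own, here is a minimal self-contained sketch. It assumes a throwaway file created with `mktemp`; the helper name `check_words` and the sample words are hypothetical. It also uses `grep -q`, which suppresses the matching lines so that only our own message is printed.

```bash
#!/bin/bash

# A throwaway file so the sketch is self-contained
tmpfile=$(mktemp)
printf 'apples\nbananas\n' > "$tmpfile"

# check_words (hypothetical helper): reports how many of two words are in a file
# usage: check_words filename word1 word2
check_words () {
    if grep -q "$2" "$1" && grep -q "$3" "$1"; then
        echo "both"
    elif grep -q "$2" "$1" || grep -q "$3" "$1"; then
        echo "one"
    else
        echo "none"
    fi
}

check_words "$tmpfile" apples bananas    # both words are present
check_words "$tmpfile" apples cherries   # only the first word is present
check_words "$tmpfile" kiwis cherries    # neither word is present

rm "$tmpfile"
```

Because `&&` stops at the first failure and `||` stops at the first success, the second `grep` in each condition only runs when its result can still change the outcome.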
+ +--- + + +**Questions?** + +--- + + +## `While` + +--- +Using the while command, let's discuss looping. Looping allows portions of a program to repeat as long as the condition is true. The syntax is: + +```bash +while condition; do + commands +done +``` + +--- +Let's make a basic while script that displays five numbers in sequential order from 1 to 5 and then tells us when it's finished. +```bash +#!/bin/bash + +# script called while-count.sh + +count=1 + +while [ $count -le 5 ]; do + echo $count + count=$((count +1)) +done +echo "Finished." +``` +Why does the loop end? + +--- +While loops are extremely helpful for reading the lines of a file and then performing some command if a line meets a certain condition. Let's explore how to read lines first: + +```bash +file=file1 + +while read -r line; do + echo $line +done < "$file" +``` +In this script, we're creating a variable that holds our filename. We're then reading the file until the last line is read. In this example, we're using an input redirection that we learned earlier (`<`), which passes the file into the read command. We've also used `-r` so that backslashes are read literally rather than treated as escape characters. + +--- +Because `line` acts as a variable, we can also nest a conditional inside the loop to test whether each `$line` meets a condition. Let's say we have a file and we want to know every line that has *bananas* in it. + +How would we combine the while loop with an if statement? + +--- +```bash +while read -r line; do + if [[ $line == *"bananas"* ]]; then + echo $line + fi +done < "$file" +``` + +Here we're reading the file line by line using the `while` loop. We're then saying `if` our variable `$line` contains `"bananas"`, then print the `$line`. + +1. Why have we added the wildcard `*`? +2. What would happen if we didn't include `*`? 
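One way to explore these two questions is to compare the test with and without the wildcards on a single sample line. The helper names `has_bananas` and `exact_bananas` and the sample sentence are assumptions made for this sketch.

```bash
#!/bin/bash

# has_bananas (hypothetical helper): succeeds if the argument contains "bananas" anywhere
has_bananas () {
    [[ $1 == *"bananas"* ]]
}

# exact_bananas (hypothetical helper): succeeds only if the argument is exactly "bananas"
exact_bananas () {
    [[ $1 == "bananas" ]]
}

line="I like bananas a lot"

if has_bananas "$line"; then
    echo "with wildcards: match"
else
    echo "with wildcards: no match"
fi

if exact_bananas "$line"; then
    echo "without wildcards: match"
else
    echo "without wildcards: no match"
fi
```

Inside `[[ ]]`, an unquoted `*` on the right-hand side of `==` stands for any run of characters, so the first test accepts text before and after the word while the second demands an exact match.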
+ +--- + + + +**Questions?** + +--- + + +## `Until` + +--- +Until loops are similar to while loops, except that while loops run as long as the condition is true, whereas an until loop runs as long as the condition is **false** + +```bash +until condition; do + commands +done +``` + +--- +Let's create a script similar to the while example: a basic until script that displays five numbers in sequential order from 1 to 5 and then tells us when it's finished. +```bash +count=1 + +until [ $count -gt 5 ]; do + echo $count + count=$((count +1)) +done +echo "Finished." +``` +How is this script different from the while loop? + +--- +How might this be useful? Let's say we want to create 3 directories labeled *dir1*, *dir2* and *dir3*: + +```bash +x=1 +until [[ $x == 4 ]]; do + echo "Creating dir$x..." + mkdir dir$x + ((x++)) +done +``` +Here we've created a variable `x=1` because we want our first directory to be *dir1*. We're then saying that `until` `x` equals `4`, make a directory (`mkdir`) called *dir* plus our variable. We've then added 1 to `x` each iteration using an arithmetic assignment. The `echo` part is just to give us some feedback on what is happening behind the scenes. + +--- + + +**Questions?** + +--- + + +## `for` + +--- +For our final flow control, we're going to learn a powerful loop called `for`. The syntax is: + +```bash +for variable [in words]; do + commands +done +``` +What we might notice is that this loop uses a variable that takes a new value on each pass through the loop. + +--- +How would we use `for` if we wanted to list all files and directories in a folder? + +```bash +for i in $(find *); do + echo $i +done +``` + +The variable `i` takes, in turn, each word produced by +`$(find *)`. For each value of `i`, we are then printing it. + +Although this seems quite basic and there are simpler ways to list all files and directories (`ls`), this enables us to do many things with the looped variable `i` by nesting other loops or commands. 
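As a variation, a `for` loop can also iterate directly over a wildcard instead of over command output, which keeps filenames containing spaces intact. This is a sketch under stated assumptions: it works in a throwaway directory created with `mktemp -d`, and the helper name `count_files` and the sample filenames are made up for the illustration.

```bash
#!/bin/bash

# Work in a throwaway directory so the sketch does not touch real files
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/two words.txt"

# count_files (hypothetical helper): loops over every entry in a directory
count_files () {
    local n=0 f
    for f in "$1"/*; do
        echo "Found: ${f##*/}"   # ##*/ strips the leading directory part
        n=$((n + 1))
    done
    echo "Total: $n"
}

count_files "$workdir"
rm -r "$workdir"
```

The loop body reuses two ideas from earlier sections: the `##` pattern expansion to shorten each path, and arithmetic expansion to keep a running count.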
+ +--- +What other ways can we use for loops? +What other ways can we use for loops within files? + +--- + + +**Questions?** +- Why do we use `i`? + +--- +![bg](pics/minions.gif) + +--- +##### **Next Week: Git and Github** +- Please make sure to come with a GitHub account + +--- + + +## `Additional Material` + +--- +##### **Exit Status** +Commands issue a value to the system when they terminate, which is an integer in the range of 0 to 255 indicating the success or failure of a command's execution. + +Conventionally, zero indicates success and any other value indicates failure. + +--- +Let's list a directory that we know exists on our system: +```console +$ ls -d /usr/bin +$ echo $? +``` +`-d` is an option that lists the directory itself rather than its contents. +`$?` returns the exit status of the last executed command. The value is either zero for success or any other number for failure. + +--- +If we then list a directory that we know does not exist on our system and return the value of `$?`, what do we expect to happen? +```console +$ ls -d /bin/usr +$ echo $? +``` + +--- +##### **Exit Command** +The `exit` command in a script accepts a single, optional argument, which becomes the script's exit status. + +When no argument is passed, the exit status defaults to that of the last command executed. + +This enables our scripts to indicate an error. + +If the script is a function in a larger program, we can use `return` instead of `exit` with a single, optional argument, allowing our function to indicate an error. + +--- +```bash +#!/bin/bash + +# test-file: Evaluate the status of a file + +FILE=~/.bashrc + +if [ -e "$FILE" ]; then + if [ -f "$FILE" ]; then + echo "$FILE is a regular file." + fi + if [ -d "$FILE" ]; then + echo "$FILE is a directory." 
+    fi
+else
+    echo "$FILE does not exist"
+    exit 1
+fi
+
+exit
+```
+
+---
+```bash
+test_file () {
+    # test-file: Evaluate the status of a file
+
+    FILE=~/.bashrc
+
+    if [ -e "$FILE" ]; then
+        if [ -f "$FILE" ]; then
+            echo "$FILE is a regular file."
+        fi
+        if [ -d "$FILE" ]; then
+            echo "$FILE is a directory."
+        fi
+    else
+        echo "$FILE does not exist"
+        return 1
+    fi
+}
+```
+
+---
+`if / else` statements are most frequently used with `test`.
+
+`test` performs a variety of checks and comparisons.
+
+Its syntax is:
+
+`test expression`
+
+or
+
+`[ expression ]`
+
+---
+There are many expressions that are used to evaluate the status of files. Some important **File Expressions** include:
+
+Expression | Is True If:
+:-----|:------
+`-e file` | file exists
+`-d file` | file exists and is a directory
+`-f file` | file exists and is a regular file
+`-r file` | file exists and is readable (has read permission for the effective user)
+`-s file` | file exists and has a length greater than zero
+
+---
+**String Expressions**
+
+Expression | Is True If:
+:-----|:------
+`string` | string is not null
+`-n string` | the length of string is greater than zero
+`-z string` | the length of string is zero
+`string1 == string2` | string1 equals string2
+`string1 != string2` | string1 and string2 are not equal
+
+---
+**Integer Expressions**
+
+Expression | Is True If:
+:-----|:------
+`integer1 -eq integer2` | integer1 is equal to integer2
+`integer1 -ne integer2` | integer1 is not equal to integer2
+`integer1 -le integer2` | integer1 is less than or equal to integer2
+`integer1 -lt integer2` | integer1 is less than integer2
+`integer1 -ge integer2` | integer1 is greater than or equal to integer2
+`integer1 -gt integer2` | integer1 is greater than integer2
+
+---
+##### **Breaking Out Of A Loop**
+Bash has two built-in commands that can be used to control program flow inside loops:
+- The `break` command immediately terminates a loop; execution resumes with the next statement following the loop.
+- The `continue` command skips the remainder of the current iteration (for example, once a condition has been met) and resumes with the next iteration of the loop. This avoids running commands that are no longer needed.
+
+---
+```bash
+while condition; do
+    if condition; then
+        if condition; then
+            commands
+            continue
+        fi
+        if condition; then
+            commands
+            continue
+        fi
+    else
+        command
+    fi
+done
+```
+If the first nested `if` condition is met, `continue` skips the rest of the loop body, including the second nested `if`, and the loop resumes with its next iteration.
+
+---
+```bash
+while condition; do
+    if condition; then
+        if condition; then
+            commands
+            continue
+        fi
+        if condition; then
+            commands
+            break
+        fi
+    else
+        command
+    fi
+done
+```
+If the second nested `if` condition is met, `break` immediately terminates the loop and execution resumes with the first statement after `done`.
+
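
---
To make the two patterns above concrete, here is a small sketch that runs as written. The numbers 2 and 5 are arbitrary choices for illustration, and the `printed` variable exists only to record which values reached the `echo`.

```bash
#!/bin/bash
# Sketch: `continue` skips one iteration, `break` ends the loop early.

printed=""
for n in 1 2 3 4 5 6; do
    if [ "$n" -eq 2 ]; then
        continue        # skip 2 and move on to the next number
    fi
    if [ "$n" -eq 5 ]; then
        break           # leave the loop; 5 and 6 are never printed
    fi
    echo "$n"
    printed="$printed$n "
done
echo "Loop finished."
```

The loop prints 1, 3 and 4, then "Loop finished.": 2 is skipped by `continue`, and the loop stops altogether when it reaches 5.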