Lecture notes for "Programming for Data Science", "Python for Data Science" and "Python for ML Engineering."
This repository contains lecture notes for classes offered by Shahbaz Chaudhary at the University of Chicago's Master's in Applied Data Science program.
Please follow the instructions below to get your computer ready for this class.
Note for Mac users: once the software is downloaded, double-clicking to launch it may produce permission errors. If so, right-click the downloaded software, pick "Open" and continue. (Apple is trying to protect you from accidentally starting malware or viruses.)
Please install Python from this website: https://www.anaconda.com/download/ (modern computers are 64-bit, so please pick that option)
Mac users: Accept all default prompts
Windows users: Accept all default prompts
Anaconda's distribution of Python is widely used in the industry, particularly among data scientists. This distribution makes it easy to use many libraries and packages for data analysis, building models, visualization, etc.
Once installed, please start Jupyter Notebook and execute the code provided below:
- Start Anaconda Navigator and click "Launch" on the panel labeled "Jupyter Notebook"
- Create a new notebook from the web interface
- Execute this code:
%%timeit
sum(range(1_000_000))
- Execute this code:
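import os
from psutil import virtual_memory, disk_usage, cpu_count

bytes_in_gb = 1024**3  # number of bytes in one gigabyte
# Total physical memory
print("Memory:\t", round(virtual_memory().total / bytes_in_gb, 4), "Gigabytes")
# Total size of the disk that holds the root directory
print("Disk:\t", round(disk_usage(os.path.abspath(os.sep)).total / bytes_in_gb, 4), "Gigabytes")
# Number of logical CPUs
print("CPUs:\t", cpu_count())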
- Visit this web page: https://github.com/falconair/ProgrammingForAnalytics
- Click "Clone or download" and pick the "Download ZIP" option (unless you already have a GitHub account)
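The `%%timeit` cell in the steps above is a Jupyter/IPython feature and will not run in a plain Python script. As a rough equivalent, the standard-library `timeit` module can time the same expression. This is only a sketch: the repeat count of 10 and the variable names are illustrative choices, not part of the course material.

```python
import timeit

# Time the same expression that the %%timeit cell measures.
# runs = 10 is an arbitrary, illustrative repeat count.
runs = 10
total_seconds = timeit.timeit("sum(range(1_000_000))", number=runs)

# Convert the total into an average per run, in milliseconds.
per_run_ms = total_seconds / runs * 1000

print(f"{per_run_ms:.2f} ms per run (average of {runs} runs)")
```

The number printed will differ from machine to machine, and from what `%%timeit` reports, since `%%timeit` picks its own repeat counts.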
Please install Git, a version control software, from this website: https://git-scm.com/downloads (you are ok to use default settings)
Note that this is a command-line tool. Once installed, you may not see a new icon to click. We will install a desktop client to remedy this.
Although we don't make heavy use of version control, you will be introduced to the concept. Installing Git also installs "Git Bash," a command-line environment which simulates Unix/Linux. We will do several exercises which will require this environment.
- Install a graphical interface to Git from this website: https://desktop.github.com/
- [Windows users only]
  a. type `cd` (this will take you to your home directory)
  b. type `echo cd >> .profile` (this will make sure your home directory is loaded when you start Git Bash)
Please install Visual Studio Code from https://code.visualstudio.com/
Install the Python extension from https://marketplace.visualstudio.com/items?itemName=ms-python.python (visit that page and click "Install")
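Once the steps above are done, a short script can give a rough indication of whether everything is in place. The sketch below is not part of the course material. It assumes the command names `git`, `jupyter` and `code`, and the import names `numpy`, `pandas`, `sklearn` and `psutil`; the function name `check_setup` is hypothetical.

```python
import importlib.util
import shutil

# Command-line tools mentioned in the setup steps (assumed command names).
TOOLS = ["git", "jupyter", "code"]
# Libraries used in the setup check and the lectures (assumed import names).
PACKAGES = ["numpy", "pandas", "sklearn", "psutil"]


def check_setup(tools=TOOLS, packages=PACKAGES):
    """Return a dict mapping each tool or package name to True if it was found."""
    status = {}
    for tool in tools:
        # shutil.which returns None when the command is not on PATH.
        status[tool] = shutil.which(tool) is not None
    for package in packages:
        # find_spec returns None when a top-level package cannot be located.
        status[package] = importlib.util.find_spec(package) is not None
    return status


for name, found in check_setup().items():
    print(f"{name:10s} {'found' if found else 'MISSING'}")
```

A `MISSING` result for a tool does not always mean the installation failed: the program may be installed but not on `PATH`, which can happen with the `code` command for Visual Studio Code.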
========
Module | Class | Description
---|---|---
Intro to consoles | Intro to consoles | This lecture introduces the concept of a console, such as DOS cmd or Mac Terminal, to students
Programming vs calculators | Programming vs calculator | Helps novices understand what features need to be added to a calculator to make it a full programming environment
First programs | First programs | Several examples of small but full programs which use all common programming constructs and data structures
Intro to Jupyter | Intro to Jupyter - not technical | Provides historical context for Jupyter
Intro to Jupyter | Intro to Jupyter - technical | Provides a practitioner-specific intro to Jupyter
All of Python | All of Python - faster basics | An overview of Python for computer programmers (multi-week lecture)
All of Python | All of Python - basics | An overview of Python for novices or non-programmers: teaches programming constructs
All of Python | All of Python - variables and tuples | Teaches multiple variable assignment
All of Python | All of Python - basic functions | Introduces functions
All of Python | All of Python - numbers | Overview of numbers and related operations
All of Python | All of Python - strings | Overview of strings and related operations
All of Python | All of Python - Boolean algebra | Dives deeper into the world of comparisons, and/or/not
All of Python | All of Python - basic plotting | General matplotlib intro (not recommended for novices)
All of Python | All of Python - dictionaries | Introduces Python dictionaries (aka maps, associative arrays)
All of Python | All of Python - lists | Teaches lists
All of Python | All of Python - comprehensions | Teaches list and dictionary comprehensions (useful but intermediate feature)
All of Python | All of Python - basic classes | Introduces classes and the very basics of object-oriented programming
All of Python | All of Python - loops | Describes while and for loops
All of Python | All of Python - conditionals and None | Deeper dive into if/else conditions and Python's None type
All of Python | All of Python - function arguments | Deeper dive into functions, including optional parameters
All of Python | All of Python - lambda functions | Introduces anonymous functions (aka lambda functions)
All of Python | All of Python - recursive functions | Introduces functions that call themselves
All of Python | All of Python - regexes | A very basic intro to regular expressions
Intro to Numpy | Numpy quick start | A broad overview of Numpy
Intro to Pandas | Pandas - quick start | A broad overview of Pandas
Intro to Pandas | Pandas - Series | A deeper dive into Pandas Series
Intro to Pandas | Pandas - Dataframes | Builds up a dataframe using a collection of Series or a Numpy matrix, shows basic functionality
Intro to Pandas | Pandas - general operations | Introduces additional dataframe operations
Intro to Pandas | Pandas - combining: merge, join, concat | Shows how to combine multiple dataframes, similar to SQL joins
Intro to Pandas | Pandas - groupby | Shows how to break a population into subgroups and find aggregates for those subgroups
Intro to Pandas | Pandas - Index | Does a deep dive into Pandas indexes, a topic often not known to casual Pandas users
Intro to Pandas | Pandas - reshape, pivot, melt, stack | Shows how to convert columns to rows and back, features similar to Excel's pivot table or cube rollup analysis
Intro to Pandas | Pandas - operations: str, dt, apply | Shows how to apply string or date functions to Pandas series
Scikit learn | Scikit Learn - method behind the madness | Describes Scikit learn's architecture and introduces pipelines
Scikit learn | Scikit Learn - Run saved models | Shows how to connect SKLearn models to the web (very basic)
Secret lives of text files | Secret lives of text files | Describes encodings (UTF, ASCII), multi-byte characters, special characters such as \n and \t, etc.
How to read technical docs | How to read technical docs |
Basic computer architecture | Basic computer architecture | Provides a broad overview of a CPU, registers, floating points vs integers, disk vs memory speed differences

Python for Analytics

Module | Class | Description
---|---|---
First programs | First programs | Several examples of small but full programs which use all common programming constructs and data structures
Intro to Jupyter | Intro to Jupyter - not technical | Provides historical context for Jupyter
Intro to Jupyter | Intro to Jupyter - technical | Provides a practitioner-specific intro to Jupyter
All of Python | All of Python - faster basics | An overview of Python for computer programmers (multi-week lecture)
Intro to Numpy | Numpy quick start | A broad overview of Numpy
Intro to Pandas | Pandas - quick start | A broad overview of Pandas
Intro to Pandas | Pandas - Series | A deeper dive into Pandas Series
Intro to Pandas | Pandas - Dataframes | Builds up a dataframe using a collection of Series or a Numpy matrix, shows basic functionality
Intro to Pandas | Pandas - general operations | Introduces additional dataframe operations
Intro to Pandas | Pandas - combining: merge, join, concat | Shows how to combine multiple dataframes, similar to SQL joins
Intro to Pandas | Pandas - groupby | Shows how to break a population into subgroups and find aggregates for those subgroups
Intro to Pandas | Pandas - Index | Does a deep dive into Pandas indexes, a topic often not known to casual Pandas users
Intro to Pandas | Pandas - reshape, pivot, melt, stack | Shows how to convert columns to rows and back, features similar to Excel's pivot table or cube rollup analysis
Intro to Pandas | Pandas - operations: str, dt, apply | Shows how to apply string or date functions to Pandas series
Scikit learn | Scikit Learn - method behind the madness | Describes Scikit learn's architecture and introduces pipelines
Scikit learn | Scikit Learn - Run saved models | Shows how to connect SKLearn models to the web (very basic)

Programming for Analytics

Module | Class | Description
---|---|---
Programming vs calculators | Programming vs calculator | Helps novices understand what features need to be added to a calculator to make it a full programming environment
First programs | First programs | Several examples of small but full programs which use all common programming constructs and data structures
Intro to Jupyter | Intro to Jupyter - not technical | Provides historical context for Jupyter
Intro to Jupyter | Intro to Jupyter - technical | Provides a practitioner-specific intro to Jupyter
All of Python | All of Python - basics | An overview of Python for novices or non-programmers: teaches programming constructs
Secret lives of text files | Secret lives of text files | Describes encodings (UTF, ASCII), multi-byte characters, special characters such as \n and \t, etc.
How to read technical docs | How to read technical docs |
Basic computer architecture | Basic computer architecture | Provides a broad overview of a CPU, registers, floating points vs integers, disk vs memory speed differences
Intro to Numpy | Numpy quick start | A broad overview of Numpy
Intro to Pandas | Pandas - quick start | A broad overview of Pandas

Lectures on R omitted