Merge pull request #148 from afeld/replace-hw1
make HW1 open-ended
afeld authored Jan 18, 2025
2 parents aad8d92 + 07abaf6 commit b02e9b0
Showing 8 changed files with 127 additions and 309 deletions.
1 change: 0 additions & 1 deletion .github/workflows/notebooks.yml
@@ -10,7 +10,6 @@ jobs:
matrix:
notebook:
- hw_0.ipynb
- hw_1.ipynb
- hw_2.ipynb
- hw_3.ipynb
- hw_4.ipynb
40 changes: 40 additions & 0 deletions assignments.md
@@ -39,6 +39,46 @@ That is now your own copy; make edits in there directly.
- [Ed]({{discussions_url}})
- [Office hours](https://python-public-policy.afeld.me/en/{{school_slug}}/syllabus.html#instructor-information)

## Open-ended assignments

_[Homework 1](hw_1.md) and the [Final Project](final_project.md)_

### Open data portals

There are countless places to get data, notably:

- Local:
  - [NYC Open Data](https://opendata.cityofnewyork.us/)
  - [Scout](https://scout.tsdataclinic.com/explore/NYC) can be used to find datasets with certain columns
  - [BetaNYC](https://data.beta.nyc/)
- U.S. Federal:
  - [data.gov](https://www.data.gov/)
  - [Census Bureau](https://data.census.gov/)
  - [Federal Reserve Economic Data (FRED)](https://fred.stlouisfed.org/)
- [United Nations](https://data.un.org/)
- [World Bank](https://data.worldbank.org/)
- [World Health Organization (WHO)](https://www.who.int/data)
- [Economic Policy Institute](https://www.epi.org/data/)
- [Kaggle](https://www.kaggle.com/datasets)
- [Google Dataset Search](https://datasetsearch.research.google.com/)
- [Black Wealth Data](https://blackwealthdata.org/)
- Lists of open data portals:
  - [DataPortals](https://dataportals.org/)
  - [Open Data Network](https://www.opendatanetwork.com/)

### Inspiration

For starters, see the [Final Project examples from past semesters](final_project/examples.md).

It's probably not realistic to make visualizations as polished as these, which were made by professionals, but they may give you ideas. Some also include links to, or downloads of, the source data.

- [FiveThirtyEight Interactives](https://projects.fivethirtyeight.com/)
- [The Guardian Visual Journalism](https://www.theguardian.com/interactive)
- [New York Times Graphics](https://www.nytimes.com/spotlight/graphics)
- [Our World in Data](https://ourworldindata.org/)
- [ProPublica News Apps](https://www.propublica.org/newsapps/)
- [Visual Capitalist](https://www.visualcapitalist.com/)

### Storing data

{% if id == "columbia" -%}
2 changes: 1 addition & 1 deletion final_project/proposal.md
@@ -2,7 +2,7 @@

## Process

1. [Find a dataset](resources.md#open-data-portals) that seems interesting.
1. [Find a dataset](../assignments.md#open-data-portals) that seems interesting.
- Use at least one dataset that you aren't familiar with.
- Using data from a primary source is preferred.
- To meet the [requirement](../final_project.md#analysis-requirements) that your project "not be trivial," you probably want a dataset that's large enough that you can't understand it at a glance. In other words, you probably want it to have 500+ rows.
36 changes: 1 addition & 35 deletions final_project/resources.md
@@ -1,40 +1,6 @@
# [Final Project](../final_project.md) resources

## Open data portals

There are countless places to get data, notably:

- Local:
- [NYC Open Data](https://opendata.cityofnewyork.us/)
- [Scout](https://scout.tsdataclinic.com/explore/NYC) can be used to find datasets with certain columns
- [BetaNYC](https://data.beta.nyc/)
- U.S. Federal:
- [data.gov](https://www.data.gov/)
- [Census Bureau](https://data.census.gov/)
- [Federal Reserve Economic Data (FRED)](https://fred.stlouisfed.org/)
- [United Nations](https://data.un.org/)
- [World Bank](https://data.worldbank.org/)
- [World Health Organization (WHO)](https://www.who.int/data)
- [Economic Policy Institute](https://www.epi.org/data/)
- [Kaggle](https://www.kaggle.com/datasets)
- [Google Dataset Search](https://datasetsearch.research.google.com/)
- [Black Wealth Data](https://blackwealthdata.org/)
- Lists of open data portals:
- [DataPortals](https://dataportals.org/)
- [Open Data Network](https://www.opendatanetwork.com/)

## Inspiration

For starters, see the [examples from past semesters](examples.md).

Probably not realistic to make visualizations that are as fancy as these ones made by professionals, but they may give you ideas. Some also include links/downloads of the source data.

- [FiveThirtyEight Interactives](https://projects.fivethirtyeight.com/)
- [The Guardian Visual Journalism](https://www.theguardian.com/interactive)
- [New York Times Graphics](https://www.nytimes.com/spotlight/graphics)
- [Our World in Data](https://ourworldindata.org/)
- [ProPublica News Apps](https://www.propublica.org/newsapps/)
- [Visual Capitalist](https://www.visualcapitalist.com/)
See also: [open-ended assignment information](../assignments.md#open-ended-assignments)

## Counting lines of code

203 changes: 0 additions & 203 deletions hw_1.ipynb

This file was deleted.

51 changes: 51 additions & 0 deletions hw_1.md
@@ -0,0 +1,51 @@
# Homework 1

_[General assignment information](assignments.md)_

## Coding

1. [Find a dataset.](assignments.md#open-data-portals)
   - It must have:
     - At least one numeric column
     - Between one thousand and one million rows
       - If it's larger than that, you can filter it down.
   - Don't spend too long on this step.
1. If there's more than one numeric column, pick one.
1. Create a new notebook.
1. Using pandas:
   1. Read in the data.
   1. Compute:
      - The mean
      - The median
      - The mode
   1. Do a `groupby()` with an [aggregation](https://pandas.pydata.org/docs/user_guide/groupby.html#aggregation).
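
As a rough illustration of the pandas steps above, here is a minimal sketch. The inline CSV text, the column names (`borough`, `complaint_type`, `response_hours`), and the variable names are all hypothetical stand-ins; your own dataset will have different columns, and you would normally read it from a file or URL rather than from a string.

```python
import io

import pandas as pd

# Hypothetical stand-in data. In the assignment you would read your own
# dataset instead, e.g. pd.read_csv("my_dataset.csv").
csv_text = """borough,complaint_type,response_hours
Brooklyn,Noise,2.0
Brooklyn,Heat,5.0
Queens,Noise,3.0
Queens,Noise,3.0
Manhattan,Heat,8.0
Manhattan,Noise,3.0
"""

# Read in the data.
df = pd.read_csv(io.StringIO(csv_text))

# Optional: if a dataset is too large, filter it down with a boolean condition.
noise = df[df["complaint_type"] == "Noise"]

# Pick one numeric column and compute the summary statistics.
column = df["response_hours"]
mean_value = column.mean()
median_value = column.median()
# mode() returns a Series, because more than one value can tie for most common.
mode_values = column.mode()

# Group by a categorical column and aggregate the numeric one.
mean_by_borough = df.groupby("borough")["response_hours"].mean()

print(mean_value, median_value, mode_values.tolist())
print(mean_by_borough)
```

Other aggregations such as `sum()`, `count()`, or `median()` can be swapped in for `mean()` in the last step, depending on the question you want to ask of the data.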

Now [turn in the assignment](assignments.md).

## Tutorials

1. Read [The Joys (and Woes) of the Craft of Software Engineering](https://cs.calvin.edu/courses/cs/262/kvlinden/references/brooksJoysAndWoes.html)
   - Note that not _everything_ in there is applicable to data analysis.
1. Filtering/indexing `DataFrame`s
   - [Filter specific rows from a `DataFrame`](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/03_subset_data.html#how-do-i-filter-specific-rows-from-a-dataframe)
   - [Boolean indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing)
1. Learn about functions
   - [Video](https://www.youtube.com/watch?v=9Os0o3wzS_I&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7&index=8)
   - [Blog post](https://python.land/introduction-to-python/functions)
1. Coding Style Guides - Please skim these; I don't expect you to understand and follow everything in them. The most important guidelines to pay attention to are indentation and keeping each statement on its own line.
   - [The Hitchhiker’s Guide to Python](https://docs.python-guide.org/writing/style/)
   - [PEP 8](https://www.python.org/dev/peps/pep-0008/)
1. [Guide to commenting your code](https://realpython.com/python-comments-guide/)
1. [Quartz Guide to Bad Data](https://github.com/Quartz/bad-data-guide#readme)

### Optional

- [Learn about data dictionaries](https://analystanswers.com/what-is-a-data-dictionary-a-simple-thorough-overview/)
- Glance through pandas' [comparison with other tools](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/index.html) for any you are familiar with
- More on indexing:
  - [How to Select Rows from Pandas DataFrame](https://datatofish.com/select-rows-pandas-dataframe/)
  - Selecting Subsets of Data in Pandas: [Part 1](https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c) and [Part 2](https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-39e811c81a0c)

## Participation

Reminder about the [between-class participation requirement](syllabus.md#participation).