From 2f13d65bd488d53730dd4fd16053d9809bfb4f2c Mon Sep 17 00:00:00 2001
From: Serdar Tumgoren
Date: Mon, 15 Apr 2024 09:16:18 -0700
Subject: [PATCH] Add Day 5

---
 lessons/README.md | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/lessons/README.md b/lessons/README.md
index 4422f61..10732ca 100644
--- a/lessons/README.md
+++ b/lessons/README.md
@@ -65,8 +65,26 @@ Once the repo is opened locally in VS Code, navigate to `content/web_scraping/RE
 - [Dissect the website][] and craft a scraping strategy. Add your proposed strategy to the GitHub issue for the site
 - **Once your scraping strategy is approved**, begin implementing the code on a fork of the `clean-scraper` repo, per the [Contributor Guidelines][]
 - Homework:
-  - Quiz on APIs/Web Scraping
-  - Build a `clean-scraper`
+## Week 3
+
+### Day 5 - CLEAN Scraping
+
+Guided tour of the [clean-scraper][] code repository, including:
+
+- Code architecture:
+  ```
+  cli -> runner -> San Diego PD scraper -> cache.download
+  ```
+- Code conventions:
+  - `scrape_meta` stores file artifacts in the cache and produces a JSON metadata file
+  - `scrape` reads the JSON metadata and downloads the files (to the cache)
+- *Scraping at scale, with a paper trail* - aka, why the complexity?
+- [Contributor Guidelines][]:
+  - Claim an agency by filing a GitHub Issue
+  - Dissect your website and add your proposed scraping plan to the GitHub Issue
+- Start writing your scraper
+
+[clean-scraper]: https://github.com/biglocalnews/clean-scraper
 
 [Dissect the website]: https://stanfordjournalism.github.io/data-journalism-notebooks/lab/index.html?path=web_scraping%2Fdissecting_websites.ipynb
 [Contributor Guidelines]: https://github.com/biglocalnews/clean-scraper/blob/main/docs/contributing.md
\ No newline at end of file
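
As context for the `scrape_meta`/`scrape` convention the lesson outlines, here is a minimal sketch of the two-phase pattern. It is illustrative only: the `Site` class, cache layout, metadata field names, and example URL are hypothetical stand-ins rather than code from the actual clean-scraper repo; only the split between a metadata-gathering phase and a download phase mirrors the convention described in the patch above.

```python
# Illustrative sketch of the two-phase scraping convention. Names other
# than scrape_meta/scrape are hypothetical, not from the clean-scraper repo.
import json
from pathlib import Path

import requests


class Site:
    """Hypothetical scraper for a single agency."""

    def __init__(self, cache_dir: str = "cache"):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def scrape_meta(self) -> Path:
        # Phase 1: record what files exist (hard-coded here for brevity)
        # and write that inventory to a JSON file in the cache --
        # this is the "paper trail".
        assets = [
            {
                "asset_url": "https://example.com/records/case-001.pdf",
                "name": "case-001.pdf",
            }
        ]
        meta_path = self.cache_dir / "example_agency.json"
        meta_path.write_text(json.dumps(assets, indent=2))
        return meta_path

    def scrape(self, meta_path: Path) -> list[Path]:
        # Phase 2: read the JSON metadata back and download each asset
        # into the cache.
        downloaded = []
        for asset in json.loads(meta_path.read_text()):
            local_path = self.cache_dir / asset["name"]
            resp = requests.get(asset["asset_url"], timeout=30)
            resp.raise_for_status()
            local_path.write_bytes(resp.content)
            downloaded.append(local_path)
        return downloaded


if __name__ == "__main__":
    site = Site()
    meta = site.scrape_meta()  # writes the JSON manifest to cache/
    files = site.scrape(meta)  # downloads the files it lists
```

Separating the two phases is what produces the paper trail: the JSON manifest records what was found, so downloads can be re-run, audited, or resumed without re-scraping the agency's site.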