Commit dfac617

authored
Update data.md
1 parent 2edc7fe commit dfac617

1 file changed

+115
-4
lines changed

source/Unit 2/data.md

Lines changed: 115 additions & 4 deletions
@@ -256,8 +256,10 @@ for bv in binary_values:
256256

257257
**3. Data Compression using Huffman Encoding**
258258

259-
**Objective:** Implement basic data compression using Huffman encoding.
260-
* **Concepts Covered:** Data Compression, Encoding, Efficiency.
259+
**Goals:**
260+
* Understand how Huffman coding compresses data
261+
* Build a Huffman tree physically with peers
262+
* Optionally write or trace a small piece of code
261263

262264
**Task:**
263265
* Implement Huffman coding, a common algorithm for lossless data compression. In this part, you’ll write a program to:
@@ -267,7 +269,110 @@ for bv in binary_values:
267269
3. Generate the Huffman codes for each character.
268270
4. Compress the text using the generated Huffman codes.
269271

270-
**Python Example:**
272+
Words for Huffman Coding Activity:
273+
274+
| Word Bank       |                  |
| --------------- | ---------------- |
| **SASSAFRAS**   | **CONNECTICUT**  |
| **TENNESSEE**   | **COMMITTEE**    |
| **SUCCESS**     | **ILLINOIS**     |
| **BALLOON**     | **KENTUCKY**     |
| **ASSESS**      | **ALABAMA**      |
| **BANANA**      | **MINNESOTA**    |
| **BOOKKEEPER**  | **PENNSYLVANIA** |
| **PEPPERCORN**  | **TATTOO**       |
284+
285+
**Part 1: Physical Huffman Coding Activity**
286+
287+
**Setup:**
288+
289+
* Use the following string: `"MISSISSIPPI"`
290+
* Create a **frequency chart**:
291+
292+
| Letter | Frequency |
| ------ | --------- |
| M      | 1         |
| I      | 4         |
| S      | 4         |
| P      | 2         |
298+
299+
**Step 1: Make Cards**
300+
301+
Make one index card (or slip of paper) for each **letter with its frequency**.
302+
303+
Example:
304+
305+
```
[M:1] [I:4] [S:4] [P:2]
```
308+
309+
**Step 2: Build the Tree (Greedy Step-by-Step)**
310+
311+
Each "node" will be a group of cards.
312+
313+
1. Find the **two lowest-frequency nodes** and combine them into a new node.
2. The new node’s frequency is the sum of the two frequencies it combines.
3. Label the left branch as `0` and the right branch as `1`.
316+
317+
Repeat until only one node (the full tree) is left.
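
The greedy steps above can also be sketched in Python. This is a minimal illustration, not the lesson's reference solution: the names `build_huffman_tree` and `assign_codes` and the tuple-based node layout are assumptions made for this example. It uses the standard-library `heapq` module as a priority queue so that the two lowest-frequency nodes are always popped first.

```python
import heapq
from itertools import count


def build_huffman_tree(freq):
    """Greedily merge the two lowest-frequency nodes until one remains.

    A leaf is a one-character string; an internal node is a (left, right) tuple.
    Returns the root node, its total frequency, and the list of merges made.
    """
    tie = count()  # unique tie-breaker so the heap never compares nodes directly
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    merges = []
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest frequency
        f2, _, right = heapq.heappop(heap)  # second lowest frequency
        merges.append((f1, f2, f1 + f2))
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    total, _, root = heap[0]
    return root, total, merges


def assign_codes(node, prefix="", codes=None):
    """Walk the tree: left branch adds '0', right branch adds '1'."""
    if codes is None:
        codes = {}
    if isinstance(node, str):
        codes[node] = prefix or "0"  # a single-letter text still needs one bit
    else:
        assign_codes(node[0], prefix + "0", codes)
        assign_codes(node[1], prefix + "1", codes)
    return codes


freq = {"M": 1, "I": 4, "S": 4, "P": 2}
root, total, merges = build_huffman_tree(freq)
codes = assign_codes(root)

for f1, f2, combined in merges:
    print(f"merge {f1} + {f2} -> {combined}")
print("codes:", codes)
```

Because I and S tie at frequency 4, the program may place them differently from a tree built by hand. Any valid tie-break gives the same total: 21 bits for `"MISSISSIPPI"`.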
318+
319+
320+
**Example Tree for "MISSISSIPPI":**
321+
322+
```
       [11]
      /    \
   [4]      [7]
   (I)     /   \
        [3]    (S:4)
       /   \
   (M:1)   (P:2)
```
331+
332+
**Step 3: Build the Huffman Codes**
333+
334+
Trace paths from root to each letter:
335+
336+
* M: `100`
* P: `101`
* I: `0`
* S: `11`
340+
341+
Encode the full word `"MISSISSIPPI"` using these bits.
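
To check the hand encoding, here is a small sketch that applies a code table to the word and decodes it again. The table is read off the example tree with left = `0` and right = `1`, which gives I = `0`, S = `11`, M = `100`, P = `101`. The helper names `encode` and `decode` are made up for this example.

```python
# Codes read from the example tree (left branch = 0, right branch = 1).
codes = {"I": "0", "S": "11", "M": "100", "P": "101"}


def encode(text, codes):
    """Replace each character with its bit string."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Read bits left to right; emit a letter whenever a full code is matched.

    This works because no Huffman code is a prefix of another.
    """
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


word = "MISSISSIPPI"
bits = encode(word, codes)
print(bits)                      # 100011110111101011010
print(len(bits), "bits")         # 21 bits
print(len(word) * 8, "bits at 8 bits per character")  # 88 bits
print(decode(bits, codes))       # MISSISSIPPI
```

The encoded word takes 21 bits, compared with 88 bits at one byte per letter. The code table itself is not counted in that comparison.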
342+
343+
---
344+
345+
**Part 2: Programming Extension**
346+
347+
Part 1: Code Tracing
348+
349+
Give students Python code that builds a Huffman tree and ask:
350+
351+
* What’s the Huffman code for "S"?
352+
* Which two nodes will be combined first?
353+
354+
Part 2: Partial Implementation
355+
356+
Using the frequency list you created by hand as a check, write code that counts character frequencies:
357+
358+
```python
def count_frequencies(text):
    freq = {}
    for ch in text:
        if ch in freq:
            freq[ch] += 1
        else:
            freq[ch] = 1
    return freq

print(count_frequencies("MISSISSIPPI"))
```
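
As an optional comparison, the same count can be produced with `collections.Counter` from the Python standard library. This sketch is an alternative, not a replacement for writing the loop yourself. The variable name `ordered` is chosen only for this example.

```python
from collections import Counter

# Counter builds the same letter -> count mapping in one call.
freq = Counter("MISSISSIPPI")
print(dict(freq))

# Sort letters from least to most frequent, the order in which
# the physical activity starts combining cards.
ordered = sorted(freq.items(), key=lambda item: item[1])
for letter, n in ordered:
    print(letter, n)
```

Sorting by frequency makes it easy to see which two cards should be combined first.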
370+
371+
In groups of 2 or 3, choose 2 more words and repeat the steps.
372+
373+
374+
<details><Summary>Simplified Huffman Coding Example</Summary>
375+
271376

272377
```python
273378
import heapq
@@ -323,9 +428,16 @@ huffman_codes = generate_codes(huffman_tree)
323428
print("Huffman Codes:", huffman_codes)
324429
```
325430

431+
326432
**Expected Output:**
327433
The program prints the Huffman code for each character. Frequently occurring characters get shorter codes, and less frequent characters get longer ones.
328434

435+
436+
</details>
437+
438+
439+
440+
329441
---
330442

331443
**4. Extracting Information from Data**
@@ -372,7 +484,6 @@ Most common character: e (112 occurrences)
372484
**5. Putting it All Together: A Simple File Communication System**
373485

374486
**Objective:** Create a small system that combines all of the previous concepts to compress and decompress a file.
375-
* **Concepts Covered:** Binary Data, ASCII, Data Compression, Extraction.
376487

377488
**Task:**
378489
* Implement a program that:
