Update index.md #23

Open
wants to merge 69 commits into
base: master
Changes from 25 commits
Commits
718bf00
Update index.md
AdoX13 Jul 9, 2022
62c1d53
Update index.md
AdoX13 Jul 10, 2022
cfb9348
Update index.md
AdoX13 Jul 10, 2022
b44b60d
Create number_systems.svg
AdoX13 Jul 10, 2022
6111f49
Update index.md
AdoX13 Jul 10, 2022
61e9713
Add files via upload
AdoX13 Jul 10, 2022
9ff1925
Update data_signals.svg
AdoX13 Jul 10, 2022
cfddb19
Update index.md
AdoX13 Jul 10, 2022
29be8fb
Add files via upload
AdoX13 Jul 10, 2022
28e8ad7
Update unix_file_permissions.svg
AdoX13 Jul 10, 2022
4452c3d
Update unix_file_permissions.svg
AdoX13 Jul 10, 2022
d7b04a1
Update unix_file_permissions.svg
AdoX13 Jul 10, 2022
f098440
Update index.md
AdoX13 Jul 10, 2022
e71ab2a
Update index.md
AdoX13 Jul 10, 2022
e785ef5
Update data-representation/index.md
AdoX13 Jul 10, 2022
419c185
Update data-representation/index.md
AdoX13 Jul 10, 2022
23ab3ab
Update data-representation/index.md
AdoX13 Jul 10, 2022
1089ec8
Update data-representation/index.md
AdoX13 Jul 10, 2022
c9e80b9
Update data-representation/index.md
AdoX13 Jul 10, 2022
76698ee
Update index.md
AdoX13 Jul 10, 2022
9b665c3
Add files via upload
AdoX13 Jul 10, 2022
512b5c5
Delete number_systems.svg
AdoX13 Jul 10, 2022
41591f6
Add files via upload
AdoX13 Jul 10, 2022
62b6cb5
Add files via upload
AdoX13 Jul 10, 2022
a998df5
Update index.md
AdoX13 Jul 10, 2022
cbc1738
Update data-representation/index.md
AdoX13 Jul 10, 2022
7806596
Update index.md
AdoX13 Jul 10, 2022
4b55d00
Delete unix_file_permissions.svg
AdoX13 Jul 11, 2022
aaa256d
Add files via upload
AdoX13 Jul 11, 2022
bf89c8b
Update index.md
AdoX13 Jul 11, 2022
3d8b826
Update index.md
AdoX13 Jul 11, 2022
51b0237
Update index.md
AdoX13 Jul 11, 2022
8ae2360
Update index.md
AdoX13 Jul 11, 2022
a50d91b
Update index.md
AdoX13 Jul 11, 2022
b086038
Update index.md
AdoX13 Jul 11, 2022
56afb4c
Update index.md
AdoX13 Jul 11, 2022
57b7086
Add files via upload
AdoX13 Jul 11, 2022
c04d9d5
Update index.md
AdoX13 Jul 11, 2022
b85433e
Update index.md
AdoX13 Jul 11, 2022
817fa2c
Update index.md
AdoX13 Jul 11, 2022
d39faad
Update index.md
AdoX13 Jul 11, 2022
6cada72
Update index.md
AdoX13 Jul 11, 2022
9a09350
Update index.md
AdoX13 Jul 11, 2022
14ec29a
Update index.md
AdoX13 Jul 11, 2022
fd68789
Update data-representation/index.md
AdoX13 Jul 11, 2022
3ce8e38
Create froggified.txt
AdoX13 Jul 12, 2022
7aecd25
Add files via upload
AdoX13 Jul 12, 2022
fa87217
Add files via upload
AdoX13 Jul 12, 2022
a2bac60
Add files via upload
AdoX13 Jul 12, 2022
d3542b1
Update index.md
AdoX13 Jul 12, 2022
840b9ec
Delete data-representation/activities/froggified directory
AdoX13 Jul 12, 2022
fdd8d8a
Add files via upload
AdoX13 Jul 12, 2022
75e2679
Add files via upload
AdoX13 Jul 12, 2022
1fcf7e3
Update index.md
AdoX13 Jul 12, 2022
df476da
Update data-representation/index.md
AdoX13 Jul 12, 2022
423d067
Rename data-representation/activities/ASCII Art/public/Tommy's art pr…
AdoX13 Jul 12, 2022
9376952
Rename data-representation/activities/ASCII Art/sol/sol.py to data-re…
AdoX13 Jul 12, 2022
23e3682
Rename data-representation/activities/Infinity Hashes/sol/sol.md to d…
AdoX13 Jul 12, 2022
f790d27
Update README.md
AdoX13 Jul 12, 2022
d6e6dfe
Update README.md
AdoX13 Jul 12, 2022
8828586
Rename data-representation/activities/Infinity Hashes/public/The Trut…
AdoX13 Jul 12, 2022
e4f272e
Rename data-representation/activities/Froggified/sol/sol.md to data-r…
AdoX13 Jul 12, 2022
d928877
Rename data-representation/activities/Froggified/public/froggified.tx…
AdoX13 Jul 12, 2022
70b7060
Rename data-representation/activities/Enconding Train/sol/sol.py to d…
AdoX13 Jul 12, 2022
66cc886
Rename data-representation/activities/Enconding Train/public/my encod…
AdoX13 Jul 12, 2022
6e32a2b
Update index.md
AdoX13 Jul 12, 2022
a5392cf
Update solution.py
AdoX13 Jul 12, 2022
acdf572
Update solution.py
AdoX13 Jul 12, 2022
6395446
Update index.md
AdoX13 Jul 13, 2022
31 changes: 31 additions & 0 deletions data-representation/assets/binary_meme.svg
53 changes: 53 additions & 0 deletions data-representation/assets/data_signals.svg
31 changes: 31 additions & 0 deletions data-representation/assets/numeral_systems1.svg
31 changes: 31 additions & 0 deletions data-representation/assets/numeral_systems2.svg
31 changes: 31 additions & 0 deletions data-representation/assets/unix_file_permissions.svg
128 changes: 121 additions & 7 deletions data-representation/index.md
@@ -12,21 +12,135 @@ Use [gh-md-toc](https://github.com/ekalinin/github-markdown-toc).

## Introduction

Objectives and rationale for the current session.
In today's session we'll discuss data and the many ways we can represent it.

## Reminders and Prerequisites

- Information required for this section
- Commands / snippets that should be known, useful to copy-paste throughout the practical session
For this session you'll need:
- Basic knowledge of Python (as seen in the [first session](../welcome-to-linux/))
- Numbers

## Content Sections:
## What is Data?

Data is information.
This plain text is data, but more than that, data can be encoded and represented in many ways.
Generally, we represent data in a suitable format for our specific purpose.
For example, if we want the most basic way to encode data, the one that computers "think" in, we'll use `Binary`.
Sometimes we need to make sure our data is not misinterpreted along the way. For example, we encode `Binary` data as `Base-64` so that the information reaches the other side of the wire uncorrupted.

## Data Formats

### Numeral Systems
A numeral system is a method of representing numbers using combinations of symbols (digits).

![Numeral Systems 1](./assets/numeral_systems1.svg)

Humans prefer the `decimal` numeral system (also known as Base-10) because it is easier to read. That is why numbers in software are mostly written in `Base-10`: humans write software far more often than they build hardware.

Computers, on the other hand, use binary (or Base-2), the numeral system that uses only two digits (0 and 1). A binary digit is called a `bit`, and a group of 8 bits is called a `byte` (1 byte = 8 bits).

![Binary Meme](./assets/binary_meme.svg)
Member review comment: ❤️


But why?

Hardware prefers binary digits because two states are easy to map onto electrical signals:

![Data Signals](./assets/data_signals.svg)

Of course, there is also an in-between: Hexadecimal.
Also known as Base-16, it uses 10 digits (0-9) and 6 alphabet letters (A-F).

Hexadecimal data is both readable and tightly correlated to the binary representation.

Let's say we have `0b10101001` (`10101001`).

Since it has 8 bits, we can safely say that its value is `< 256` (that is, less than $2^8$).

Its hexadecimal form is `0xa9` (`a9`).

Thus, if we want to convert it to `decimal`, instead of doing 8 steps:

$(1 × 2^7) + (0 × 2^6) + (1 × 2^5) + (0 × 2^4) + (1 × 2^3) + (0 × 2^2) + (0 × 2^1) + (1 × 2^0) = 169$

we only do 2 steps:

$(a × 16^1) + (9 × 16^0) = (10 × 16^1) + (9 × 16^0) = 169$

TODO(Python code)
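The conversions above can be checked in Python. This is only a minimal sketch using built-in functions (`int`, `bin`, `hex`, `format`); the variable names are chosen for illustration.

```python
# The same number written in binary and in hexadecimal.
value = 0b10101001
print(value)                # 169
print(hex(value))           # 0xa9
print(bin(0xa9))            # 0b10101001

# Parsing strings in a given base.
print(int("10101001", 2))   # 169
print(int("a9", 16))        # 169

# The "8 steps" and "2 steps" sums from the text, written out explicitly.
binary_sum = sum(int(bit) * 2**power
                 for power, bit in enumerate(reversed("10101001")))
hex_sum = sum(int(digit, 16) * 16**power
              for power, digit in enumerate(reversed("a9")))
print(binary_sum, hex_sum)  # 169 169

# Each hexadecimal digit corresponds to exactly 4 bits.
print(format(0xa, "04b"), format(0x9, "04b"))  # 1010 1001
```

The last line shows why hexadecimal is so closely tied to binary: `a` is `1010` and `9` is `1001`, which placed side by side give `10101001`.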

#### `Octal`
Octal, or Base-8, uses 8 digits (0-7). It is the least popular of the four numeral systems presented here, but one interesting use of it is in the Unix File Permissions system:

![Unix File Permissions](./assets/unix_file_permissions.svg)
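As a rough illustration of how octal digits map to permissions, here is a small sketch. The helper `permission_string` is hypothetical (written for this example, not part of any library); it assumes the usual convention where each octal digit packs three bits: read (4), write (2) and execute (1).

```python
def permission_string(mode: int) -> str:
    """Turn the last three octal digits of `mode` into an 'rwxr-xr--' style string."""
    result = ""
    for shift in (6, 3, 0):            # owner, group, others
        digit = (mode >> shift) & 0o7  # one octal digit = 3 bits
        result += "r" if digit & 4 else "-"
        result += "w" if digit & 2 else "-"
        result += "x" if digit & 1 else "-"
    return result


mode = 0o754                      # Python writes octal literals with the 0o prefix
print(oct(mode))                  # 0o754
print(permission_string(mode))    # rwxr-xr--
print(permission_string(0o644))   # rw-r--r--
```

Because one octal digit is exactly 3 bits, each digit describes one group of users without any extra arithmetic.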


An overview of the presented `Numeral Systems`:
![Numeral Systems 2](./assets/numeral_systems2.svg)

### Character Encoding

#### `ASCII`

ASCII (American Standard Code for Information Interchange) assigns a character or a control code to each number from 0 to 127:

```
DEC HEX ASCII  DEC HEX ASCII  DEC HEX ASCII  DEC HEX ASCII  DEC HEX ASCII
  0 00  NUL     26 1A  SUB     52 34  4       78 4E  N      104 68  h
  1 01  SOH     27 1B  ESC     53 35  5       79 4F  O      105 69  i
  2 02  STX     28 1C  FS      54 36  6       80 50  P      106 6A  j
  3 03  ETX     29 1D  GS      55 37  7       81 51  Q      107 6B  k
  4 04  EOT     30 1E  RS      56 38  8       82 52  R      108 6C  l
  5 05  ENQ     31 1F  US      57 39  9       83 53  S      109 6D  m
  6 06  ACK     32 20  SPACE   58 3A  :       84 54  T      110 6E  n
  7 07  BEL     33 21  !       59 3B  ;       85 55  U      111 6F  o
  8 08  BS      34 22  "       60 3C  <       86 56  V      112 70  p
  9 09  HT      35 23  #       61 3D  =       87 57  W      113 71  q
 10 0A  LF      36 24  $       62 3E  >       88 58  X      114 72  r
 11 0B  VT      37 25  %       63 3F  ?       89 59  Y      115 73  s
 12 0C  FF      38 26  &       64 40  @       90 5A  Z      116 74  t
 13 0D  CR      39 27  '       65 41  A       91 5B  [      117 75  u
 14 0E  SO      40 28  (       66 42  B       92 5C  \      118 76  v
 15 0F  SI      41 29  )       67 43  C       93 5D  ]      119 77  w
 16 10  DLE     42 2A  *       68 44  D       94 5E  ^      120 78  x
 17 11  DC1     43 2B  +       69 45  E       95 5F  _      121 79  y
 18 12  DC2     44 2C  ,       70 46  F       96 60  `      122 7A  z
 19 13  DC3     45 2D  -       71 47  G       97 61  a      123 7B  {
 20 14  DC4     46 2E  .       72 48  H       98 62  b      124 7C  |
 21 15  NAK     47 2F  /       73 49  I       99 63  c      125 7D  }
 22 16  SYN     48 30  0       74 4A  J      100 64  d      126 7E  ~
 23 17  ETB     49 31  1       75 4B  K      101 65  e      127 7F  DEL
 24 18  CAN     50 32  2       76 4C  L      102 66  f
 25 19  EM      51 33  3       77 4D  M      103 67  g
```
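The table can be explored from Python with the built-in `ord` and `chr` functions. This is just a short sketch; the sample string `"Hi!"` is an arbitrary choice.

```python
# Character -> number and number -> character.
print(ord("A"))    # 65
print(chr(0x61))   # a

# Every character of a string has its own code.
text = "Hi!"
codes = [ord(character) for character in text]
print(codes)                           # [72, 105, 33]
print([hex(code) for code in codes])   # ['0x48', '0x69', '0x21']

# Going back from the codes to the text.
print(bytes(codes).decode("ascii"))    # Hi!
```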

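In terms of storage efficiency, the best encoding depends on the text we want to store:

- `UTF-8` is more compact for mostly-ASCII text (English and other Western languages), since every ASCII character takes a single byte
- `UTF-16` is more compact for mostly non-ASCII text (Chinese and other Asian languages), since most of those characters take 2 bytes in `UTF-16` but 3 bytes in `UTF-8`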
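To compare the two encodings, we can count the bytes each one produces. The snippet below is a small sketch with two arbitrary sample strings; it uses the `utf-16-le` codec so that the 2-byte byte order mark Python adds for plain `utf-16` does not skew the count.

```python
samples = ["hello", "你好"]

sizes = {}
for text in samples:
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(text.encode("utf-16-le"))
    sizes[text] = (utf8_size, utf16_size)
    print(text, "UTF-8:", utf8_size, "bytes,", "UTF-16:", utf16_size, "bytes")

# hello UTF-8: 5 bytes, UTF-16: 10 bytes
# 你好 UTF-8: 6 bytes, UTF-16: 4 bytes
```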

#### `Base64`

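Base64 is a way of representing binary data as text. The data is split into sequences of 24 bits (3 bytes), and each sequence is represented by 4 Base64 digits of 6 bits each. When the last sequence is shorter than 3 bytes, the output is padded with `=`. For example:

`SGVsbG8gZnJvbSB0aGUgRWFydGgtNjQgIQ==`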
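Python's standard `base64` module can decode the string above. The following is a minimal sketch; the second part encodes the 3-byte sample `b"Man"` (an arbitrary choice) to show the 3 bytes to 4 digits mapping.

```python
import base64

encoded = "SGVsbG8gZnJvbSB0aGUgRWFydGgtNjQgIQ=="
decoded = base64.b64decode(encoded)
print(decoded.decode("ascii"))   # Hello from the Earth-64 !

# 3 bytes (24 bits) always become 4 Base64 digits.
sample = base64.b64encode(b"Man")
print(sample)                    # b'TWFu'

# Encoding the decoded bytes again gives back the original string.
print(base64.b64encode(decoded).decode("ascii") == encoded)  # True
```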


## Data Manipulation

We can manually change the way data is represented, so that it will be easier to read or structure.

For large chunks of data, this could take a long time, so, to be efficient, we need automated ways of manipulating the information.

Tools that can help us achieve this goal range from programming and scripting languages to programs like MS Excel.

For this session's purpose, we will mainly focus on Python, as already seen in [Session 1](../welcome-to-linux/).

- Content split in sections, according to session specifics
- Demos will be part of the session presentation and will be referenced (snippets, images, links) in the content

## Summary

- Sumamrizing session concepts
- Summarizing session concepts
- Summarizing commands / snippets that are useful for tutorials, challenges (easy reference, copy-paste)

## Activities