-
Notifications
You must be signed in to change notification settings - Fork 7
/
technicalfundamentals.tex
159 lines (136 loc) · 10.6 KB
/
technicalfundamentals.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
\chapter{Technical Fundamentals}
\label{ch:technical-fundamentals}
\begin{summary}
This chapter aims to provide some basic technical computer science background required for later material. It is of particular importance if you want to delve deeper and read the code of the Python \keyword{bitcoin-utils} library\footnote{https://github.com/karask/python-bitcoin-utils}. It aims to concisely explain some fundamental concepts by providing examples.
\end{summary}
\section{Bytes, Hex, Endianness and Encodings}
Computers internally use the binary numeral system that consists of only two symbols: 0 and 1. A binary digit, or bit, is the basic unit of binary. Eventually, everything is represented in bits.
For ease of processing, bits are aggregated into bytes. Each byte consists of 8 bits. Thus, \keyword{01000001} represents a single byte. It is difficult (and much longer) to use/type binary, thus we typically use the hexadecimal numeral system or hex to represent bytes in a more human-friendly way. Hex has 16 possible symbols from 0 to F (0-9 and A-F). To represent 16 symbols we need exactly 4 bits (0 is 0000, 1 is 0001, …, F is 1111) and thus each byte can be represented by two hex digits. For example, \keyword{01000001} is equivalent to \keyword{41} in hexadecimal (0100 is hex number 4 and 0001 is hex number 1). You can experiment with conversions between binary, hexadecimal and decimal using various online tools\footnote{https://www.rapidtables.com/convert/number/binary-to-hex.html}.
\vspace{1em}
\begin{lstlisting}[style=Python,label={lst:encodings-1},caption={Python examples},captionpos=b]
>>> format(0, '04b') # converts decimal 0 to binary with 0 padding for 4 bits
'0000'
>>> format(15, '04b') # same for decimal 15
'1111'
>>> format(15, 'X') # convert decimal 15 to hex string
'F'
>>> format(0x41, '08b') # converts hex number 41 to binary with 0 padding for
# 8 bits
'01000001'
>>> 0x41 # converts hex number 41 to decimal
65
>>> 0x41 == '41' # compares hex number 41 with string 41
False
>>> b'41' == '41' # compares byte literal 41 with string 41
False
\end{lstlisting}
\vspace{1em}
One byte can represent up to $2^8=256$ numbers (0-255). Several bytes are needed to represent larger numbers. For example, decimal number 1000 is \keyword{1111101000} in binary which requires 10 bits. Computers operate at the byte level and thus 2 bytes (or 16 bits) will be required to represent this number; binary: \keyword{0000001111101000} and hex: \keyword{03E8}. Note that the byte ordering is important and it is called endianness\footnote{https://en.wikipedia.org/wiki/Endianness}. The above example uses big-endian ordering, where the most significant byte comes first and the least significant byte comes last. This is the same way we order numbers in languages (in left-to-right scripts\footnote{https://en.wikipedia.org/wiki/Writing\_system\#Directionality}). In little-endian the same number would be represented as \keyword{1110100000000011} and \keyword{E803}.
\vspace{1em}
\begin{lstlisting}[style=Python,label={lst:encodings-2},caption={Python examples},captionpos=b]
>>> import binascii
>>> format(0x03E8, '016b') # convert hex number 03E8 to binary digits
# with 0 padding for 16 bits
'0000001111101000'
>>> format(int('03E8', 16), '016b') # convert hex string '03E8' to binary digits
# as above
'0000001111101000'
>>> binascii.unhexlify('03E8') # convert hex string 03E8 to binary
b'\x03\xe8'
>>> b'\x03\xe8'[::-1] # reverse bytes to change endianness
b'\xe8\x03'
\end{lstlisting}
\vspace{1em}
\begin{note}
Internally Bitcoin uses little-endian byte order as it improves speed (most computers use little-endian byte ordering internally). Most hash function libraries (see next section) create hashes using big-endian and Bitcoin transmits those in that ordering. However, when hashes are displayed Bitcoin uses little-endian order! The latter might be because it treats them as integers to compare them faster.
\end{note}
As we already mentioned computers only know about binary. To display these numbers for human consumption we need to convert them into characters (i.e. text). In order to accomplish that we need character encodings that provide mappings between bit sequences and characters. Examples of such encodings are ASCII and UTF-8. UTF-8 is widely used nowadays and it provides an 8-bit mapping between (binary) numbers and characters. For example, character \keyword{A} is mapped to \keyword{01000001}. You can experiment with such encodings with online tools\footnote{https://www.rapidtables.com/convert/number/ascii-to-binary.html}.
\vspace{1em}
\begin{lstlisting}[style=Python,label={lst:encodings-3},caption={Python examples},captionpos=b]
>>> import binasci
>>> 'A' == b'A' # A character is not a byte
False
>>> b'41' == '41'.encode('utf-8') # Unicode string literals are stored internally
# in binary for efficiency
True
>>> 'ε'.encode() # UTF-8 (the default) is a variable lenth
# encoding - 'ε' is stored as two bytes
b'\xce\xb5'
>>> '41'.encode() # converts UTF-8 string literal 41 to binary -
# '4' and '1' occupy 1 byte each
b'41'
>>> binascii.unhexlify('41') # converts string hex value 41 to binary -
# the UTF-8 value for 'A' is 0x41
b'A'
>>> b'A' == b'\x41' # the bytes from 0x01 to 0x7f are confusingly
# specified with UTF-8 characters
True
>>> b'41' == b'\x34\x31' # similarly for binary characters b'4' and
# b'1' or b'41'
True
>>> b'A'.decode('utf-8') # convert binary to characters for displaying
# according to UTF-8 encoding
'A'
>>> binascii.hexlify(b'A') # convert binary value to hex value (expressed
# in binary)
b'41'
>>> b'41'.decode() # decode to get as string (UTF-8 is the default)
'41'
>>> b'\xce\xb5'.decode() # convert from bytes to UTF-8 characters (0xce
# 0xb5 maps to 'ε')
'ε'
\end{lstlisting}
\vspace{1em}
\section{Cryptographic Hash Functions}
A cryptographic hash function is a hash function that is suitable for cryptography. It is an one-way function that takes an arbitrary block of data and returns a fixed-size bit array, that is called the hash value or digest or digital fingerprint or just hash. It has the following properties:
\begin{itemize}
\item it is deterministic, i.e. the same block of data always returns the same hash
\item it is quick to compute
\item it is an one-way function; given the hash one cannot derive the original value unless they brute-force all possible values (which is close to impossible for large data sets)
\item even a trivial change to the original data will change the resulting hash completely (it will appear uncorrelated to the previous hash)
\item it is collision resistant; it is computational infeasible to find two different blocks of data with the same hash value
\end{itemize}
\vspace{1em}
\begin{lstlisting}[style=Python,label={lst:encodings-4},caption={Python examples},captionpos=b]
>>> import hashlib
>>> import binascii
>>> b1 = 'Bitcoin'.encode('utf-8') # convert string to bytes according to UTF-8
# encoding
>>> b1
b'Bitcoin'
>>> h1 = hashlib.sha256(b1).digest() # calculate the hash (or digest) of b1
>>> h1
b'\xb4\x05m\xf6i\x1f\x8d\xc7.V0-\xda\xd3E\xd6_\xea\xd3\xea\xd9)\x96\t\xa8&\xe24N' \
b'\xb6:\xa4'
>>> binascii.hexlify(h1).decode() # converts bytes to hex (expressed in binary)
# and then to string to display
'b4056df6691f8dc72e56302ddad345d65fead3ead9299609a826e2344eb63aa4'
>>> b2 = 'bitcoin'.encode('utf-8') # convert string to bytes according to UTF-8
#encoding
>>> b2
b'bitcoin'
>>> h2 = hashlib.sha256(b2).digest() # calculate the hash (or digest) of b2
>>> h2
b'k\x88\xc0\x87$z\xa2\xf0~\xe1\xc5\x95k\x8e\x1a\x9fL\x7f\x89*p\xe3$\xf1\xbb=\x16' \
b'\x1e\x05\xca\x10{'
>>> binascii.hexlify(h2).decode() # converts bytes to hex (expressed in binary)
# and then to string to display
'6b88c087247aa2f07ee1c5956b8e1a9f4c7f892a70e324f1bb3d161e05ca107b'
\end{lstlisting}
\vspace{1em}
Cryptographic hash functions are very important in information security systems. They are used in digital signatures, message authentication codes and as ordinary (but more secure) hash functions to index data in hash tables, to uniquely identify files (bittorrent, IPFS), as checksums to detect accidental (or not) corruption of data, etc.
\begin{note}
Bitcoin is using two hashing functions: SHA-256 and RIPEMD-160 which create a hash value of 256 and 160 bits respectively (or 32 and 20 bytes or 64 and 40 hex characters).
\end{note}
\section{Asymmetric Cryptography}
Asymmetric cryptography or public key cryptography\footnote{https://en.wikipedia.org/wiki/Public-key\_cryptography} is a cryptographic system that uses pairs of keys with a specific mathematical relation. In each pair there is a private key that should remain private and a public key that can be freely shared. Between two participants this allows:
\begin{itemize}
\item Encryption: Alice can encrypt a message with Bob’s public key and send it to Bob. Only the owner of the corresponding private key can decrypt and view the message.
\item Authentication / Digital Signatures: Alice can sign a message using her private key and send it to Bob. Anyone can view the contents and verify the signature using Alice’s public key, thus ensuring that it was indeed Alice that send the message.
\item Integrity: While anyone can view the contents of a signed message no one can modify it since the signature will be invalidated.
\item Non-Repudiation: Signing or encrypting a message cannot be refuted by the author once the message has been sent, assuming the private key is secure.
\end{itemize}
\begin{note}
Bitcoin does not use encryption at all. Digital signatures are used to sign transactions in order to authenticate that you are the owner of the coins you wish to transfer. Integrity and non-repudiation also apply to transaction signing.
\end{note}
There are several different algorithms for asymmetric cryptography, like RSA and ECDSA. ECDSA, which Bitcoin uses, has the property that a private key can be used to calculate the corresponding public key.