diff --git a/easy/main.typ b/easy/main.typ index 1c0fa47..d080880 100644 --- a/easy/main.typ +++ b/easy/main.typ @@ -5,7 +5,7 @@ } #let part(s) = { pagebreak(weak: true) - set text(fill: rgb("#002299")) + //set text(fill: rgb("#002299")) heading(offset: 0, s) } @@ -18,8 +18,9 @@ #quote[ I can now prove to you that I have a message $M$ such that - $op("SHA")(M) = "0xa91af3ac..."$, without revealing $M$. - But not just for SHA. I can do this for any function you want. + $sha(M) = "0xa91af3ac..."$, without revealing $M$. + But not just for the hash function sha. + I can do this for any function you want. ] #toc diff --git a/easy/src/2pc-takeaways.typ b/easy/src/2pc-takeaways.typ index 2588a87..6805978 100644 --- a/easy/src/2pc-takeaways.typ +++ b/easy/src/2pc-takeaways.typ @@ -7,13 +7,13 @@ function over their respective secret inputs. We can think of this as your prototypical _2PC_ (two-party computation). 2. The main ingredient of a garbled circuit is _garbled gates_, - which area gates whose functionality is hidden. This can be done e.g. + which are gates whose functionality is hidden. This can be done by Alice precomputing different outputs of the garbled circuit based on all possible inputs of Bob, and then letting Bob pick one. 3. Bob "picks an input" with the technique of _oblivious transfer (OT)_. This can be built in various ways, including with commutative encryption or public-key cryptography. -4. More generally, this means in theory a group of people can +4. More generally, it is also possible for a group of people to compute whatever secret function they want, which is the field of _multiparty computation (MPC)_. ] diff --git a/easy/src/ec.typ b/easy/src/ec.typ index ab3b5b7..a18a7cc 100644 --- a/easy/src/ec.typ +++ b/easy/src/ec.typ @@ -328,7 +328,7 @@ for the prime $p := 2^(255)-19$. Its order is $8$ times a large prime $ q' := 2^(252) + 27742317777372353535851937790883648493.
$ In that case, to generate a random point on Curve25519 with order $q'$, -one will usually take a random point in it and multiply it by $8$. +one will usually take a random point on the curve and multiply it by $8$. BN254 is also engineered to have a property called _pairing-friendly_, which is defined in @pairing-friendly when we need it later. @@ -372,7 +372,7 @@ given her published public key $[d]$. 1. Alice picks a random scalar $r in FF_q$ (keeping this secret) and publishes $[r] in E$. 2. Alice generates a number $n in FF_q$ by hashing $msg$ with all public information, - say $ n := sha([r], msg, [d]). $ + say $ n := hash([r], msg, [d]). $ 3. Alice publishes the integer $ s := (r + d n) mod q. $ In other words, the signature is the ordered pair $([r], s)$. @@ -394,7 +394,7 @@ The number $r$ is called a _blinding factor_ because its use prevents Bob from stealing Alice's secret key $d$ from the published $s$. It's therefore imperative that $r$ isn't known to Bob nor reused between signatures, and so on. -One way to do this would be to pick $r = sha(d, msg)$; this has the +One way to do this would be to pick $r = hash(d, msg)$; this has the bonus that it's deterministic as a function of the message and signer. In @kzg we will use ideas quite similar to this to diff --git a/easy/src/fhe-takeaways.typ b/easy/src/fhe-takeaways.typ index f2dbaaf..266477d 100644 --- a/easy/src/fhe-takeaways.typ +++ b/easy/src/fhe-takeaways.typ @@ -4,7 +4,8 @@ #green[ 1. A _fully homomorphic encryption_ protocol allows Alice to delegate Bob to compute some function $f(x)$ for Alice in a way that Bob doesn't get to know $x$. -2. The hard problem backing known FHE protocols is the _learning with errors (LWE)_ problem, which comes down to deciding if a system of "approximate equations" over $F_q$ are consistent. -3. The main idea of this approach to FHEs is to use approximate eigenvalues as the encrypted computation and an "approximate eigenvector" as the secret key. 
Intuitively, adding and multiplying two matrices with different approximate eigenvalues for the same eigenvector approximately adds and multiplies the eigenvalues, respectively. +2. The hard problem backing known FHE protocols is the _learning with errors (LWE)_ problem, which comes down to deciding if a system of "approximate equations" over $F_q$ is consistent. +3. The main idea of this approach to FHEs is to use "approximate eigenvalues" as the encrypted computation and an "approximate eigenvector" as the secret key. + Intuitively, adding and multiplying two matrices with different approximate eigenvalues for the same eigenvector approximately adds and multiplies the eigenvalues, respectively. 4. To carefully do this, we actually need to control the error blowup with the _flatten_ operation. This creates a _leveled FHE_ protocol. ] diff --git a/easy/src/fhe2.typ b/easy/src/fhe2.typ index 2cd579b..987c020 100644 --- a/easy/src/fhe2.typ +++ b/easy/src/fhe2.typ @@ -147,8 +147,8 @@ computes $ upright(bold(x)) dot.op upright(bold(a)) = 4 . $ Plugging in $y = 1$, we see that $ 4 + epsilon.alt = 1 + m . $ Now it’s a simple "rounding" problem. We know that $epsilon.alt$ is -small and positive, so $1 + m$ is either $4$ or … a little more (In -fact, it’s one of $4 , 5 , 6 , 7 , 8$.) On the other hand, since $m$ is +small and positive, so $1 + m$ is either $4$ or … a little more. +(In fact, it’s one of $4 , 5 , 6 , 7 , 8$.) On the other hand, since $m$ is 0 or 5, $1 + m$ had better be 1 or 6, so the only possibility is that $m = 5$ (so $1+m = 6$). diff --git a/easy/src/fhe3.typ b/easy/src/fhe3.typ index 488950f..c2475e8 100644 --- a/easy/src/fhe3.typ +++ b/easy/src/fhe3.typ @@ -161,7 +161,7 @@ bigger, say $n approx r log q$, to get the same level of security. Now let’s compute more carefully what happens to the error when we add, negate, and multiply bits. 
Suppose $ C_1 upright(bold(v)) = mu_1 upright(bold(v)) + epsilon.alt_1 , $ where -$epsilon.alt_1$ is some vector with all its entries upper bounded by some +$epsilon.alt_1$ is some vector with all its entries bounded by some $B$. (And similarly for $C_2$ and $mu_2$.) When we add two ciphertexts, the errors add: @@ -207,4 +207,4 @@ If we need to evaluate a bigger circuit, we have two options: + Use some technique to "reset" the error and start anew, as if with a freshly encrypted ciphertext. This approach is called _bootstrapping_ and it incurs some hefty computational costs. - But for very, very large circuits, it's the only viable option. Bootstrapping is beyond the scope of this book. + But for large circuits, it's the only viable option. Bootstrapping is beyond the scope of this book. diff --git a/easy/src/fs.typ b/easy/src/fs.typ index bf5b33d..6c2407c 100644 --- a/easy/src/fs.typ +++ b/easy/src/fs.typ @@ -62,10 +62,10 @@ Fiat--Shamir turns it into the following noninteractive protocol. known to both Peggy and Victor. 1. Peggy sends $Com(F)$ and $Com(H)$. - 2. Peggy computes $lambda in FF_q$ by $lambda = sha(Com(F), Com(H))$. + 2. Peggy computes $lambda in FF_q$ by $lambda = hash(Com(F), Com(H))$. 3. Peggy opens both $Com(F)$ and $Com(H)$ at $lambda$. 4. Victor verifies that - $lambda = sha(Com(F), Com(H))$ and $F(lambda) = Z(lambda) H(lambda)$. + $lambda = hash(Com(F), Com(H))$ and $F(lambda) = Z(lambda) H(lambda)$. ] We can apply the Fiat--Shamir heuristic to the full PLONK protocol. diff --git a/easy/src/intro.typ b/easy/src/intro.typ index 6ad513b..bfba06c 100644 --- a/easy/src/intro.typ +++ b/easy/src/intro.typ @@ -8,12 +8,21 @@ Cryptography is so ubiquitous that it has become invisible: - _Encryption_ (hiding and then decoding messages) make people talking to each other over apps and computers talking to each other over protocols (like SSH) secure. 
-- _Digital signatures_ (signing a message with some data that anyone can verify must come from some specific identity) authenticates people's identity, so you know that the website you are going to is actually what it says it is. -- _Key exchanges_ (allowing two parties to agree on a secret piece of data, even talking over an public channel) allows people to set up instructure remotely to do other cryptography, such as faster encryption algorithms. - -However, there is actually a lot more cryptography that have been implemented in academic and other smaller circles, such as #cite("https://w.wiki/9fXW", "group signature schemes") (more advanced versions of digital signatures supporting multiple participants) or commitment schemes (general methods to commit to some secret that is to be revealed later in a way that prevents cheating). - -Even beyond this, there is cryptography that have been theoretically constructed but barely (or never) tried in practice, often with a ambitious sense of scale. Their spirit can be summarized as: +- _Digital signatures_ + (signing a message with some data that anyone can verify must come from some specific identity) + authenticate people's identity, so you know that the website you are going to is actually what it says it is. +- _Key exchanges_ (allowing two parties to agree on a secret piece of data, even when talking over a public channel) + allow people to set up secure connections remotely, + without having to meet in person to agree on a key. + +However, there is actually a lot more cryptography that has been implemented in academic and other smaller circles, +such as #cite("https://w.wiki/9fXW", "group signature schemes") +(more advanced versions of digital signatures supporting multiple participants) +and commitment schemes (general methods to commit to some secret that is to be revealed later in a way that prevents cheating).
+ +Even beyond this, there is cryptography that has been theoretically constructed +but barely (or never) tried in practice, often with an ambitious sense of scale. +Its spirit can be summarized as: #quote[ We want cryptography that can @@ -28,15 +37,15 @@ do any computation so long as someone writes code for it. #remark[ The quote on the title page -("I have a message $M$ such that $op("sha")(M) = "0x91af3ac..."$") +("I have a message $M$ such that $sha(M) = "0xa91af3ac..."$") is a concrete example. -The hash function SHA is a particular set of arbitrary instructions, +The hash function sha is a particular set of arbitrary instructions, yet programmable cryptography promises that such a proof can be made -using a general compiler rather than inventing an algorithm specific to SHA256. +using a general compiler rather than inventing an algorithm specific to SHA-256. ] This led 0xPARC to coin the term _programmable cryptography_ to differentiate -this "second generation" technology from "classical" cryptography that solve +this "second-generation" technology from "classical" cryptography that solve specific problems and/or involve specific functions. == Ideas in programmable cryptography @@ -76,7 +85,7 @@ statements of the form: #quote[ I know $X$ such that $F(X, Y) = Z$, where $Y,Z$ are public. ] -once the statement is encoded as a system of equations. One such statement would be "I know $M$ such that $op("SHA256") (M) = Y$." +once the statement is encoded as a system of equations. One such statement would be "I know $M$ such that $sha(M) = Y$." SNARKS are an active area of research, and many different SNARKs are known. Our work focuses on a particular example, PLONK (@plonk). @@ -88,19 +97,24 @@ language. While many services today will do this, even for free, we can also imagine that you care about security a lot and you really don't want the translating service to know anything about your text at all (e.g.
selling the text to someone else, adding your text to large language models that can then -be reverse-engineered to find your private information, blackmail you...). +be reverse-engineered to find your private information, blackmailing you...). In _fully homomorphic encryption (FHE)_, one person encrypts some data $x$, and then a second person can perform arbitrary operations on the encrypted data $x$ without being able to read $x$. -With this technology, you have a solution to your problem! (and also much more, -such as a dating service who does not even know the names of people it provides -matchmaking to) You simply encrypt your text $Enc(x)$ and send it to your FHE machine translation server. The server will faithfully translate it into +With this technology, you have a solution to your problem! +You simply encrypt your text $Enc(x)$ and send it to your FHE machine translation server. +The server will faithfully translate it into another language and give you $Enc(y)$, where $y$ is the translation of $x$. You can then decrypt and obtain $y$, knowing that the server cannot extract anything meaningful from $Enc(x)$ without your secret key. +(You could imagine many more applications of FHE, +such as a dating service that does not even know the names of people it +provides +matchmaking to.) + == From One Door to the Next Programmable cryptography has both a surprisingly high amount of theory but @@ -114,14 +128,6 @@ At least for the protocols we mention, they can be implemented, but usually at a cost of doing the computation directly). Can we bring that number down? What other cryptographic systems can be build on top of this technology? -In the Labyrinth of Cryptography, behind us are a series of doors and rooms -that housed great Ideas in first-generation cryptography; we have -explored, exploited, and mastered these Ideas for -many decades. 
After a specific door, however, the rooms in the Labyrinth -suddenly now house Ideas at a much bigger scale, as if we stepped into a -completely different biome. In front of us, intrepid explorers have actually gone even further, into rooms that house even bigger behemoths of Ideas, such -as witness encryption (WE) and indistinguishability obfuscation (IO). - It is easy to be carried away by the staggering possibilities and imagine a perfect "post-cryptographic" world where everyone has control over all their data and everyone's security preferences are completely fulfilled. It is also diff --git a/easy/src/kzg.typ b/easy/src/kzg.typ index 837a15e..7a55ee2 100644 --- a/easy/src/kzg.typ +++ b/easy/src/kzg.typ @@ -42,7 +42,8 @@ Then anyone in the world can use the resulting sequence for KZG commitments. this is a case of the discrete logarithm problem. You can make the protocol somewhat more secure by involving several different trusted parties. - The first party chooses a random $s_1$, computes $[s_1^0], ..., [s_1^M]$, and then discards s_1. + The first party chooses a random $s_1$, computes $[s_1^0], ..., [s_1^M]$, + and then discards $s_1$. The second party chooses $s_2$ and computes $[(s_1 s_2)^0], ..., [(s_1 s_2)^M]$. And so forth. @@ -167,12 +168,25 @@ To be fully explicit, here is the algorithm: Peggy can establish the value of $F$ at any point in $FF_q$. Peggy wants to convince Victor that $F$ vanishes on a given finite set $S subset.eq FF_q$. - 1. Both parties compute the polynomial + 1. If she has not already done so, Peggy sends to Victor + a commitment $Com(F)$ to $F$.#footnote[ + In fact, it is enough for Peggy to have some way + to prove to Victor the values of $F$. + + So for example, if $F$ is a product of two polynomials + $F = F_1 F_2$, + and Peggy has already sent commitments to $F_1$ and $F_2$, + then there is no need for Peggy to commit to $F$. 
+ + Instead, in Step 5 below, Peggy opens $Com(F_1)$ and $Com(F_2)$ at $lambda$, + and that proves to Victor the value of $F(lambda) = F_1 (lambda) F_2 (lambda)$. + ] + 2. Both parties compute the polynomial $ Z(X) := product_(z in S) (X-z) in FF_q [X]. $ - 2. Peggy does polynomial long division to compute $H(X) = F(X) / Z(X)$. - 3. Peggy sends $Com(H)$. - 4. Victor picks a random challenge $lambda in FF_q$ + 3. Peggy does polynomial long division to compute $H(X) = F(X) / Z(X)$. + 4. Peggy sends $Com(H)$. + 5. Victor picks a random challenge $lambda in FF_q$ and asks Peggy to open $Com(H)$ at $lambda$, as well as the value of $F$ at $lambda$. - 5. Victor verifies $F(lambda) = Z(lambda) H(lambda)$. + 6. Victor verifies $F(lambda) = Z(lambda) H(lambda)$. ] diff --git a/easy/src/mpc.typ b/easy/src/mpc.typ index 0249b69..6b85fdf 100644 --- a/easy/src/mpc.typ +++ b/easy/src/mpc.typ @@ -12,7 +12,7 @@ what could be learned by knowing both $a$ and $f (a , b)$), and likewise for Bob. Yao’s Garbled Circuits is one of the most well-known 2PC protocols -(Vitalik has a great explanation on his +(Vitalik Buterin has a great explanation on his #cite("https://vitalik.eth.limo/general/2020/03/21/garbled.html")[blog];). The protocol is quite clever, and optimized variants of the protocol are being @@ -98,7 +98,7 @@ what you think of when you think of plain-vanilla encryption: You use a secret key $K$ to encrypt a message $m$, and then you use the same secret key $K$ to decrypt it.] -encryption scheme#footnote[We'll talk later about what sort of encryption scheme is suitable for this...] +encryption scheme $Enc$ and publish the following table: #table( @@ -169,10 +169,10 @@ We'll need to make two changes to the protocol. so the outputs will be (the passwords encoding) 0, 0, 0, 1. 
#table( columns: 2, - [$sha(P_0^(text("left")), P_0^(text("right")))$], [$Enc_(P_0^(text("left")), P_0^(text("right"))) (P_0^(text("out")))$], - [$sha(P_0^(text("left")), P_1^(text("right")))$], [$Enc_(P_0^(text("left")), P_1^(text("right"))) (P_0^(text("out")))$], - [$sha(P_1^(text("left")), P_0^(text("right")))$], [$Enc_(P_1^(text("left")), P_0^(text("right"))) (P_0^(text("out")))$], - [$sha(P_1^(text("left")), P_1^(text("right")))$], [$Enc_(P_1^(text("left")), P_1^(text("right"))) (P_1^(text("out")))$], + [$hash(P_0^(text("left")), P_0^(text("right")))$], [$Enc_(P_0^(text("left")), P_0^(text("right"))) (P_0^(text("out")))$], + [$hash(P_0^(text("left")), P_1^(text("right")))$], [$Enc_(P_0^(text("left")), P_1^(text("right"))) (P_0^(text("out")))$], + [$hash(P_1^(text("left")), P_0^(text("right")))$], [$Enc_(P_1^(text("left")), P_0^(text("right"))) (P_0^(text("out")))$], + [$hash(P_1^(text("left")), P_1^(text("right")))$], [$Enc_(P_1^(text("left")), P_1^(text("right"))) (P_1^(text("out")))$], ) == How Bob uses one gate @@ -189,11 +189,11 @@ Let's play through one round of Bob's gate-using protocol. 2. Bob takes the two passwords, concatenates them, and computes a hash. Now Bob has $ - sha(P_0^(text("left")), P_1^(text("right"))). + hash(P_0^(text("left")), P_1^(text("right"))). $ 3. Bob finds the row of the table indexed by - $sha(P_0^(text("left")), P_1^(text("right")))$, + $hash(P_0^(text("left")), P_1^(text("right")))$, and he uses it to look up $ Enc_(P_0^(text("left")), P_1^(text("right"))) (P_0^(text("out"))). @@ -204,7 +204,7 @@ Let's play through one round of Bob's gate-using protocol. to decrypt $P_0^(text("out")).$ -5. Now Bob has the password for the bit 0, to feed into the next gate -- +5. Now Bob has the password for the bit 0 to feed into the next gate -- but he doesn't know his bit is 0. 
So Bob is exactly where he started: diff --git a/easy/src/ot.typ b/easy/src/ot.typ index 9c243a3..cc12e49 100644 --- a/easy/src/ot.typ +++ b/easy/src/ot.typ @@ -61,7 +61,7 @@ because he doesn't know the keys. No problem! Bob just picks out the $i$-th ciphertext $Enc_a (x_i)$, adds his own layer of encryption onto it, -and sends the resulting doubly-encoded message back to Alice: +and sends the resulting doubly-encrypted message back to Alice: $ Enc_b (Enc_a (x_i)). $ diff --git a/easy/src/plonk.typ b/easy/src/plonk.typ index 0fbca61..0e109c7 100644 --- a/easy/src/plonk.typ +++ b/easy/src/plonk.typ @@ -16,7 +16,7 @@ For PLONK (and Groth16 in the next section), the choice that's used is: *systems of quadratic equations over $FF_q$*. In other words, PLONK is going to give us the ability to prove -that we have solutions to a system of a system of quadratic equations. +that we have solutions to a system of quadratic equations. #situation[ Suppose we have a system of $m$ equations in $k$ variables $x_1, dots, x_k$: $ @@ -27,20 +27,33 @@ that we have solutions to a system of a system of quadratic equations. Of these $k$ variables, the first $ell$ ($x_1, dots, x_ell$) have publicly known, fixed values; - the remaining $ell - k$ are unknown. + the remaining $k - ell$ are unknown. PLONK will let Peggy prove to Victor the following claim: - I know $ell - k$ values $x_(ell+1), dots, x_k$ such that - (when you combine them with the $k$ public fixed values - $x_1, dots, x_k$) - the $ell$ values $x_1, dots, x_k$ satisfy all $m$ quadratic equations. + I know $k - ell$ values $x_(ell+1), dots, x_k$ such that + (when you combine them with the $ell$ public fixed values + $x_1, dots, x_ell$) + the $k$ values $x_1, dots, x_k$ satisfy all $m$ quadratic equations. ] -This leads to the natural question of how a function like SHA256 can be encoded +This leads to the natural question of how a function like +the hash function SHA-256 can be encoded into a system of quadratic equations. 
Well, quadratic equations over $FF_q$, -viewed as an NP-problem called Quad-SAT, is pretty clearly NP-complete, -as the following example shows: +viewed as an NP-problem called Quad-SAT, is NP-complete, +as the following example shows. + +(If you're not familiar with NP-completeness, +the point of the example is to show that +"any problem" can be converted to a system of quadratic equations, +so that solutions to the problem give you +solutions to the system of equations. + +In the example, the "any problem" will be encoded +as a Boolean algebra problem called 3-SAT: +a system of constraints, like "(a_1 AND a_2) OR NOT a_3," +in variables $a_1, a_2, dots$ that can take the Boolean values +TRUE or FALSE.) #remark([Quad-SAT is pretty obviously NP-complete])[ If you can't see right away that Quad-SAT is NP-complete, @@ -70,13 +83,13 @@ which gives a high-level language that compiles a function like SHA-256 into a system of equations over $FF_q$ that can be used in practice. Systems like this are called _arithmetic circuits_, and Circom is appropriately short for "circuit compiler". -If you're curious, you can see how SHA256 is implemented in Circom on +If you're curious, you can see how SHA-256 is implemented in Circom on #cite("https://github.com/iden3/circomlib/blob/master/circuits/sha256/sha256.circom", "GitHub"). So, the first step in proving a claim like "I have a message $M$ such that - $op("sha")(M) = "0xa91af3ac..."$" + $sha(M) = "0xa91af3ac..."$" is to translate the claim into a system of quadratic equations. This process is called "arithmetization." 
@@ -131,7 +144,7 @@ systems of quadratic equations of a very particular form: we get an "addition" gate $a_i + b_i = c_i,$ while if we set - $ ( q_(L,i), q_(R,i), q_(O,i), q_(M,i), q_(C,i)) = ( 1, 1, 0, -1, 0 ), $ + $ ( q_(L,i), q_(R,i), q_(O,i), q_(M,i), q_(C,i)) = ( 0, 0, -1, 1, 0 ), $ we get a "multiplication" gate $a_i b_i = c_i.$ Finally, if $q$ is any constant, then diff --git a/easy/src/preamble.typ b/easy/src/preamble.typ index c6f594c..51dca80 100644 --- a/easy/src/preamble.typ +++ b/easy/src/preamble.typ @@ -1,5 +1,6 @@ #let pair = math.op("pair") -#let sha = math.op("hash") +#let sha = math.op("sha") +#let hash = math.op("hash") #let msg = math.sans("msg") #let Com = math.op("Com") #let Flatten = math.op("Flatten") @@ -66,16 +67,6 @@ // we will see no blue text but will see subscriipt instead) // and false if we are just doing a pdf #let print_flag = true - -#let cite(linktext, text) = { - if (print_flag == true) { - text - footnote(linktext) - } else { - link(linktext, text) - } -} - #let green(body) = block( fill: rgb("#aaeed9"), inset: 8pt, @@ -93,6 +84,16 @@ #let url(s) = { link(s, text(font:fonts.mono, s)) } + +#let cite(target_url, plaintext) = { + if (print_flag == true) { + plaintext + footnote(url(target_url)) + } else { + link(target_url, text(font:fonts.mono, plaintext)) + } +} + #let pmod(x) = $space (mod #x)$ // Main entry point to use in a global show rule