# Cryptography / Criptografía

Cryptography is the discipline that studies the transmission of information according to three aspects:

• Confidentiality. The information should not leak to any unexpected party.
• Integrity. The information must be protected against any malicious modification. (Example: integrity of communications, integrity of backups, etc.)
• Authentication. The information should make clear who the author of it is. (Example: signature authentication, access control, etc.)
 Think of everyday situations where those aspects are important. Piensa en situaciones en que esos aspectos sean importantes.

Some applications:

• Secure two-party computation: several people want to compute something using information each one has, but without revealing that information. For example, two millionaires want to know who of them is richer without revealing how much money they have.
• Zero-knowledge proof: someone wants to prove to another person that they have some information without revealing it. For example, authentication: I want to show that I am the owner of an email account without logging into it in front of you.
• Secret sharing: A society of $n$ people has important things in a safe. Each member has a key. The safe should be opened when a majority of the society wants.
• Removing the need of a trusted party: "Poker over the phone". Example: Bitcoin.

## Elements of a cryptosystem

We will commonly use the names plaintext (P), ciphertext (C), encryption and decryption algorithms/functions (E/D).

### One-way functions and trapdoor one-way functions

Encryption and decryption are based on functions of a special kind.

• A one-way function is a function where that is relatively easy to evaluate, but whose inverse is very difficult to calculate. Example: it is easy to multiply two big numbers, but it is difficult (it takes a long time) to factor big numbers. This is the basis of the cryptographic system that we will study later, called RSA.
• A trapdoor one-way function is a function whose inverse is difficult to calculate, but with some extra information it becomes easy. Example: it is difficult to factor a product of two primes, but if we know one of them then it is easy to compute the other by a division. That extra information is called the (decryption) key.

Kerckhoffs's principle: a cryptosystem should be secure even if everything about the system, except the key, is public knowledge.

p,q = next_prime(230425464897893453455642343764),next_prime(340545678973454589456568734587) print p,q n = p*q n
time factor(n)
time n/p

## Warm-up: Shift cryptosystems

A shift cryptosystem consists on an encryption procedure where we take each letter of a message and change it by a shift (displacement) of a number of positions. The classical Caesar cipher is based on a shift of distance 3:

Let us explore them briefly with Sage. We create a plain text from a sentence in English.

S = ShiftCryptosystem(AlphabeticStrings()) P = S.encoding("The early bird might get the worm, but the second mouse gets the cheese.") P

We shift every letter; the size of the shift is the key. The ciphertext cannot be understood. With the same key we can decrypt.

K = 7 C = S.enciphering(K, P); C
S.deciphering(K, C)

This system is very weak: there are only 26 possible keys, so we can try to decode with all of them and see if we find a sentence we can understand. This is called a brute-force attack.

S.brute_force(C)

We still needed to look at 26 messages to find the correct one. We can do better: statistical cryptanalysis. In our situation, we can use knowledge about English language: in English, the most common letter is "E". So, if the plaintext is long, it will have more "E"'s than other letters. If we find the most common letter in the ciphertext, we will know that it probably comes from "E". Here are the frequencies in English.

AlphabeticStrings().characteristic_frequency()

In this case, Sage has a statistical method to find which keys are more probable, by comparing frequencies. We see that the first candidate, the most probable key and message, is precisely the correct one. The message has been discovered!

S.brute_force(C, ranking="chisquare")

We see why it is important that the set of possible keys is very large, and that the mathematical and statistical properties of the system and the messages are studied properly to make sure that no attacks can be done successfully.

 Create a message in a non-English language and encrypt it with a shift. See if the brute-force attack of Sage can decrypt it. Crea un mensaje que no esté en inglés y encríptalo con un cifrado por desplazamiento. Después mira si Sage consigue descifrarlo.

## Public key systems

 Think about the following situation: Alice and Bob need to agree on a cryptosystem. How can they do it if someone could be listening? Piensa en la situación siguiente: Alice y Bob quieren ponerse de acuerdo en un criptosistema, pero ¿cómo hacerlo si alguien podría estar escuchando?

In the 70s, two cryptologists called Diffie and Hellman published a method to exchange keys over a public channel. Since then, public key cryptography has become widely used. We will not go into the details here. The concept is:

• Keys have two parts, a public one and a private one.
• The encryption method uses the public key: $E = E_{public}$. Your public key is available to everyone; that means that everyone can encrypt a message with your key.
• The decryption method uses the private key: $D = D_{private}$. Only you can decrypt a message that was encrypted with your public key.
• In practice, this means that anyone can send you a secret message without asking you for anything, and you will be the only person to be able to read it.

## The RSA cryptosystem

Today we are going to learn a popular public key cryptosystem: RSA. It is used in different contexts and it requires little mathematical knowledge to understand it.

As we saw in the ciphers of before, we need to work in a cyclic way. Numerically this can be done using the so called modular arithmetic. We will learn it now. If we number the letters as A=0, B=1, C=2,..., Z=25 then the cipher is $E(x)=x+k$. But the last letters go back to the beginning... mathematically this is achieved by the equivalences $26\equiv0$, $27\equiv1$, ... and, if we go in the other direction, $-1\equiv25$, $-2\equiv24$, ... The equivalences continue indefinitely and they are based on the number 26 (because we make numbers equivalent when we add or subtract 26). We are doing arithmetic modulo 26. In this number system there are essentially 26 different numbers.

 Find three numbers that are equivalent to 11 modulo 26. One of them must be greater than 100 and another one must be negative.  Encuentra tres números equivalentes a 11 módulo 26. Uno debe ser mayor que 100 y el otro debe ser negativo.

In books you will see that the equivalence is written like this: $a \equiv b \pmod{26}$.

Since adding or subtracting 26 repeatedly gives equivalent numbers, we have another way of expressing this: $a \equiv b \pmod{26}$ if $a-b$ is a multiple of 26. We will use this often.

Of course, instead of 26 the same thing can be done for any other natural number.

The numbers modulo $n$ have three operations: +, -, and *. But in principle we don't know if we can divide. When we have these three operations and their usual properties (commutativity...), the technical name is ring (if we can also divide, it is called a field, as we saw in the codes session). Let us calculate with Sage.

R = IntegerModRing(26) R(261); R(3)-R(5); R(10)*R(3); R(2)^100
 Make sure that you understand these results. Asegúrate de entender esos resultados.

The RSA system works superficially as follows: we will choose a number $n$ for doing computations mod $n$ (later we will see how), and also an exponent $e$. Then we will write a message as an integer $m$, and the encryption process will be $E(m)=m^e \pmod n$. The numbers $n$ and $e$ are your public key; anyone can encrypt a message for you with those numbers.

For decryption you will use another number, your private key, $d$. This number must satisfy

$(m^e)^d \equiv m \pmod n.$

We will talk later about choosing $n$, $e$ and $d$, so that it is very difficult to calculate $d$ from the other two. But first, let us briefly talk about exponentiation, since it is an operation we use in both halves of the process.

 To get an idea of the computations, encrypt by hand the message $m=3$ using $n=5$ and $e=12$. Hint: you may think of doing 11 multiplications, but you can do it with less. Another hint: you need not wait until the end to reduce the result mod 5. Para que entiendas cómo son estos cálculos, encripta a mano el mensaje $m=3$ usando $n=5$ y $e=12$. Pista: se puede hacer con menos de 11 multiplicaciones. Otra pista: no hace falta esperar al final para reducir módulo 5.

 One more: encrypt by hand the message $m=3$ using $n=10$ and $e=128$. Hint: $128=2^7$. Conclusion: powers mod $n$ are not so long to compute as it seems. Una más: encripta a mano el mensaje $m=3$ usando $n=10$ y $e=128$. Pista: $128=2^7$. Conclusión: las potencias módulo $n$ no se tardan tanto en calcular como parece.

In short, we can calculate powers with large exponents in a relatively fast way by doing repeated squaring and some multiplications. For example, a usual value in practice for $e$ is $65537=2^{16}+1$. (How many operations do we need for such a power?)

Note that $d$ is something like an inverse of $e$. Next we will talk about inverses in general, and later about our particular situation of inverting exponents.

## Modular inversion

Can we divide numbers modulo $n$? In fact, in order to divide we only need to find inverses: $\frac{a}{b} = b\cdot\frac{1}{a}$.

Unfortunately, it does not always work because some numbers don't have inverses:

R = IntegerModRing(26) 1/R(3)
1/R(4)

 Find by hand the inverse of 5 modulo 26, then check it with Sage. You can instead write a loop to check all the elements from 1 to 25 (we need not check 0, why?) Calcula a mano la inversa de 5 módulo 26 y luego comprueba con Sage. En vez de eso puedes escribir un bucle que compruebe los números de 1 a 25 (no hace falta comprobar el 0, ¿por qué?)

Which numbers have inverses modulo 26? if $a$ has an inverse $b$ then $ab\equiv1\pmod{26}$. We will see next that this equation is very related to a basic mathematical object: the greatest common divisor or gcd.

Recall the definition of gcd with this example:

a,b = 54,90 da = divisors(a); db = divisors(b) da; db
cd = [i for i in da if db.count(i)>0] cd; max(cd)
gcd(a,b)

In order to compute the gcd of two numbers, it seems that we need to factor them. This is difficult in general (we will talk about that later). But there is a much faster method: the Euclidean algorithm. It is based on the following observation:

$\gcd(a,b) = \gcd(a,b-a)$

By repeated application of this, we can reduce the numbers until it is easy to calculate the gcd directly.

 Compute the gcd below with this method. Then evaluate with Sage to check the result. Calcula el gcd siguiente con este método. Después evaluá la celda para comprobar el resultado.
gcd(195,52)

A faster way: if $b$ is much greater than $a$, instead of subtracting many times, $b-a$, $b-2a$, $b-3a$, ..., you can jump to the final value by doing a division with remainder:

$b = a\cdot q + r \qquad\Rightarrow\qquad \gcd(a,b) = \gcd(a,aq+r) = \gcd(a,r).$

We redo the previous example. In Sage, the remainder is calculated with the symbol %. If you want both the quotient and the remainder you can apply the method .quo_rem.

195.quo_rem(52); 195 % 52
52 % 39
39 % 13

So the gcd is 13 (because it divides 39, and it is obviously the greatest divisor of 13).

Now comes the relation between the gcd and the modular inverse. The relation is given by some extra information that we can extract from the Euclidean algorithm. Let us write the divisions that we did:

$195=52\cdot3+39; \quad 52=39\cdot1+13; \quad 39=13\cdot3.$

In the penultimate relation we can write the gcd (the remainder) as a combination of the other two numbers: $13=52-39\cdot1$.

But in the relation before, the reminder (39) can also be isolated, and substituted into the relation that we just wrote:

$39 = 195-52\cdot3 \qquad\Rightarrow\qquad 13 = 52-39\cdot1 = 52-(195-52\cdot3) = 195\cdot(-1) + 52\cdot4.$

In this way, the gcd is a linear integer combination of the two original numbers. This fact, and the method to calculate it, is called extended Euclidean algorithm, because it gives the gcd but also the two coefficients.

 Compute the gcd of 144 and 25 using the previous algorithm. Then write it as a combination of the two numbers. Calcula el gcd de 144 y 25 con el algoritmo anterior. Después escríbelo como combinación de los dos números.

Sage can calculate the gcd and the coefficients at the same time, with the function xgcd. Note that we can assign the three values to three variables at the same time.

xgcd(195,52)
d,a,b = xgcd(195,52) d == a*195+b*52

And finally we will relate all this story of the gcd with modular inverses! With an example.

Before, you calculated the inverse of 5 mod 26. We compute the gcd between these two numbers with the extended Euclidean algorithm:

xgcd(5,26)

This means that $1=5\cdot(-5)+1\cdot26$. But what happens if you write this equality of integers as an equality mod 26?

$1\equiv 5\cdot(-5)+1\cdot26 \equiv 5\cdot(26-5)+1\cdot0 = 5\cdot21 \pmod{26}$

This means that the inverse mod 26 of 5 is 21 (their product is $105\equiv1 \pmod{26}$).

In summary, $a$ has inverse $\pmod n$ when $gcd(a,n)=1$ (we say that $a$ and $n$ are coprime), and its inverse is the number $s$ in the expression $1=as+nt$ computed with the extended Euclidean algorithm.

 Write a program to check which numbers from 1 to 25 have inverses mod 26 and to calculate them when they exist. Escribe un programa que calcule qué números de 1 a 25 tienen inversa módulo 26 y que las calculen.

In general, not all the numbers from 1 to $n$ have an inverse mod $n$ because not all of them are coprime with $n$. Euler discovered a function that counts how many coprime numbers are there for any $n$; it is usually called $\varphi(n)$ (the Greek letter "phi"). For example:

euler_phi(26)

Euler's phi function is very important for us, because it appears in a theorem that talks precisely about the exponents that we care about:

Euler's theorem: if $gcd(a,n)=1$ then $a^{\varphi(n)}\equiv1 \pmod n$.

 Let $n=15$. Calculate how many numbers from $1$ to $n$ have inverses mod $n$. Then calculate $a^{\varphi(n)} \pmod n$ for every $a$ from 1 to $n$ and check that the results agree with the first calculation. Sea $n=15$. Calcula cuántos números de 1 a $n$ tienen inversa mod $n$. Después calcula $a^{\varphi(n)} \pmod n$ para todos los $a$ entre 1 y $n$ y comprueba que coincide con lo calculado antes.

We have almost all the ingredients now.

## Building the keys

We need:

• A modulus $n$.
• An exponent $e$ for encrypting.
• An exponent $d$ for decrypting, which is related to $e$ as follows: $d\cdot e \equiv 1 \pmod{\varphi(n)}$. This means that $de-1 = \varphi(n)\cdot w$ for some integer $w$.

With the last condition we have, for every message $m$:

$(m^e)^d = m^{e\cdot d} = m^{1+w\varphi(n)} = m\cdot (m^{\varphi(n)})^w \equiv m\cdot 1^w = m \pmod n.$

This means that decryption really recovers the plaintext ($m$) from the ciphertext ($m^e$).

In order to build the keys, we would do then:

1. Choose $n$.
2. Compute $\varphi(n)$.
3. Choose $e$ that is coprime with $\varphi(n)$.
4. Compute $d$ as the inverse of $e$ mod $\varphi(n)$.
5. Publish $n$ and $e$, do not publish $d$.
 Construct a set of keys. Then exchange messages with another person. Construye unas llaves. Después intercambia mensajes con otra persona.

 Think about this: if you reveal $\varphi(n)$ then your messages can be discovered. Piensa en esto: si revelas $\varphi(n)$ se pueden descubrir tus mensajes.

If you followed the procedure, it is very likely that your keys are not secure, because there are many mathematical properties that allow someone with some computing power to find your $d$. Let us see how to make it (more) secure.

 Take someone else's public keys. Try to break their system: see if in less than 60 seconds of computations you can find their private key (you can take as long as you need to think and to write code). Toma las llaves públicas de alguien. Intenta romper su sistema: a ver si con menos de 60 segundos de cálculo puedes encontrar su clave privada (puedes tomarte el tiempo que quieras para pensar y escribir código).

The weakness in the system comes from being able to find $\varphi(n)$ from $n$. This depends, in fact, on the factorization of $n$. So we should make $n$ be difficult to factor. Remember that we said that factoring is a difficult function (multiplication is a one-way function).

It is believed that the safest way of choosing $n$ is:

1. Find two big primes $p$ and $q$. They should have similar number of digits (but not the same). The longer they are, the safer.
2. $n=pq$.
3. In this case, it is known that $\varphi(n)=(p-1)(q-1)$. Note that we can compute this very easily if we know the factors of $n$, but it is difficult if we don't (it is a trapdoor function).

Once we have done this, we find some $e$ coprime with $\varphi(n)$ (just choose at random one that is not very small).

 Choose a random number $n$ of 50 digits (use the function randint). See how long it takes to compute $\varphi(n)$ and also factor $n$. Repeat all this five times. Elige al azar un $n$ de 50 cifras (usa la función randint). Calcula cuánto se tarda en calcular $\varphi(n)$ y también en factorizar $n$. Repite todo el procedimiento cinco veces.

s = 20 p = next_prime(randint(10^s, 10^(s+5))) q = next_prime(randint(10^s, 10^(s+5))) p; q
is_prime(p); is_prime(q)
n = p*q; n
# if n is too big the next line will take too long time factor(n)
# if n is too big the second line will take too long time phi = (p-1)*(q-1) time phi = euler_phi(n) phi
e = 0 while gcd(e, phi) != 1: e = ZZ.random_element(phi) print e, gcd(e, phi)
d = inverse_mod(e,phi); d
d*e % phi
m = ZZ.random_element(n); m
ciphertext = power_mod(m, e, n); ciphertext
plaintext = power_mod(ciphertext, d, n); plaintext; plaintext == m