TTM4135: Applied Cryptography and Network Security
Preface
This compendium will attempt to explain the core concepts of the subject. Feel free to edit, add or remove content if you feel like it would enhance the content.
Note: This compendium has only been written by a few people who are/were fellow students like yourself, and there are bound to be errors and missing content. Please be critical when using this compendium and please make your own contributions to improve the content.
Introduction
This section serves as an introduction to what the course is about, but some important concepts are introduced.
Definitions of fundamental concepts
Let's start of with defining what this is all about.
- Information security
- Minimizing vulnerabilities of information assets.
From this we can define several other concepts.
- Vulnerability
- Any weakness that could be exploited to violate a system or the information it contains.
- Information assets
- Can include data, software and hardware, people and even buildings.
- Threat
- A threat is a potential violation of security.
All of these are so fundamental that you will most likely not be asked about this on the exam, this is just to provide some clarity when explaing more advanced topics.
The CIA triad
Traditional definitions of information security are based on three information security goals:
- Confidentiality
- Preserving authorised disclosure of information.
- Integrity
- Preventing unauthorised (accidental or deliberate) modification or destruction of information.
- Availability
- Ensuring resources are accessible when required by an authroised user.
Passive and Active threats
When concidering threats, there are two main types: passive and active.
- Passive threats
- Passive threats do not alter information in the system. Such attacks may be eavesdropping and traffic analysis.
- Active threats
- Active threats alter information in the system. Such attacks may be masquerade, replay, modification of messages, denial of service.
Security services and mechanisms
- Security service
- A processing or communication service to give a specific kind of protection to system resources.
- Security mechanism
- A method of implementing one or more security services.
Number theory
Cryptography is often done in the digital domain. Mostly the mathematics is discrete mathematics because cryptology deals with finite objects such as alphabets and blocks of characters. Therefore we need to look at modular arithmetic which only deals with a finite number of values.
This section is heavy to read, but nice to know as it is the framework we will use to compute and analyze the different cryptographic schemes.
Factorization
$Z$ denotes the set of all integers,$Z^+$ denotes all positive integers.$a|b$ , meaning$a$ divides$b$ , is true of there exists a$k$ such that$ak = b$ - If
$a|b$ and$a|c$ , then$a | (b+c)$ - If
$p$ is prime and$p|ab$ , then$p|a$ or$p|b$
- If
- any
$p \in Z^+$ ($p$ in$Z^+$ ) is a prime number if only$x|p\Rightarrow x\in \{1,p\}$
Euclidian division
(In some circles known as floor division.)
Euclidian division can be used to find the modular inverse of
Greatest common divisor (GCD)
$d|a$ and$d|b$ ($d$ divides$a$ and$b$ )- if
$c$ divides$a$ and$b$ then$c$ divides$d$ $d$ > 0
(relatively prime, mutually prime, coprime and co-prime are equivalent terms)
The GCD can be computed using the Euclidian algorithm.
Back substitution (extended Euclidian algorithm)
By using back substitution in the euclidian algorithm, we can find integers
The case where
Modulo and Modulus
Example:
mod
is no longer an operator, it rather specifies the modulus for
- Simplified
$a\equiv b\mod{c} \ \leftrightarrow\ a\ \text{mod} \ c=b\ \text{mod} \ c$ - Formally
$a\equiv b\mod{c} \ \leftrightarrow \ a-b=kc$ , where$k\in Z$
Modular arithmetic
Given
$a+c\equiv b+d \mod n$ $ac\equiv bd\mod n$ $ka\equiv kb\mod n$
This means we can always reduce the inputs with modulo
Modular inverse
The inverse of
In cryptosystems we often need to find inverses so that we can decrypt, or undo, certain modular operations. A theorem says that
The inverse can efficiently be found using the extended Euclidian algorithm to solve
Residue class
The set
Groups
A group is a set,
- Closure
$a\cdot b\in G$ for all$a,b\in G$ - Identity
- There exist an element in G,
$1$ , so that$a\cdot 1=1\cdot a=a$ for all$a\in G$ - Inverse
- For all
$a\in G$ there exists an element,$b$ , such that$a\cdot b=1$ for all$a\in G$ - Associative
- For all
$a,b,c\in G$ the following is true:$(a\cdot b)\cdot c=a\cdot (b \cdot c)$
We will be using commutative (or abelian) groups, which also satisfy:
- Commutative:
- For all
$a,b\in G$ the following is true:$a\cdot b=b\cdot a$
We often write
Example:
Cyclic groups
Cyclic groups are made possible when the group operator can produce cycles instead of moving towards infinity. This is the case for modular arithmetic.
- The order of a group
$G$ , denoted$|G|$ , is the number of elements in$G$ - The order of an element
$g\in G$ , often written$|g|$ , is equal to the smallest integer$k\in Z^+$ which makes$g^k = 1$ - A element
$g\in G$ is a generator for$G$ if$|g|=|G|$ - A group is cyclic if it has a generator.
Cyclic groups are important in cryptography since we can store and represent enormous groups with only the generator and the group operator. RSA uses this
$Z^{*}_p$
A complete set of residues modulo any prime
Some useful properties:
- The order
$|Z^{*}_p|$ is$p-1$ $Z^{*}_p$ is cyclic$Z^{*}_p$ has many generators in general, and are easy to find.
$Z^{*}_n$
For any
$Z^{*}_n$ is a group, but no cyclic in general- Finding the order of
$Z^{*}_n$ is difficult in general
Fields
A field is a set,
The following conditions must be satisfied:
$F$ is a commutative group under the + operation, where the identity is 0$F\ \backslash \{0\}$ (denotes set subtraction) is a commutative group under the$\cdot$ operation$F$ is distributive: For all$a,b,c\in F$ , we have that$a\cdot(b+c)=(a\cdot b)+(a\cdot c)$
Finite fields
For secure communications we are usually only interested in fields with a finite number of elements
-
A famous theorem says that finite fields exist of size
$p^n$ for any prime$p$ and positive integer$n$ , and that no finite field exists of other sizes -
The most interesting cases for us are fields of size
$p$ for a prime$p$ and fields of size$2^n$ for some integer$n$
$Z_p$
Can also be written as
- Multiplication and addition are done in modulo
$p$ - Multiplicative group is exactly
$Z^{*}_p$ - Public key encryption and digital signature schemes use
$Z_p$
$Z_2$
Can also be written as
$Z_2$ is the simplest field, containing only two elements:$0$ and$1$ - The + operator is
$\oplus$ (logical XOR) - The only non-zero element is
$1$ , making the multiplicative group trivial:$\{1\}$
$Z_{2^n}$
Can also be written as
Arithmetic in these fields can be considered as polynomial arithmetic where the field elements are polynomials with binary coefficients
- Any n-bit string can be equated tp a polynomial in a natural way.
Example:
$00101101 \leftrightarrow x^5 + x^3 + x^2 + 1$ - The field can be represented in different ways by use of a primitive polynomial
$m(x)$ - Addition and multiplication is defined by polynomial addition and multiplication modulo
$m(x)$ - Polynomial division can be done very efficiently in hardware using shift registers
$Z_{2^8}$
Can also be written as
- The generator polynomial ion AES is chosen as
$m(x)=x^8+x^4+x^3+x+1$ , or 283. - To multiply two strings: Mutliply them as polynomials and take their remainder after dividing by
$m(x)$
Classical Encryption
Glossary:
- Plaintext
- The raw data to be encrypted. Often denoted
$p$ or$m$ (short for message) - Ciphertext
- The encrypted data. Can be decryped back to the plaintext. Often denoted
$c$ - Key
- The secret blob of data required to perform the encryption/decryption. Often denoted
$k$ - Cipher / encryption scheme
- The scheme used for encryption and decryption. Often requires a key.
- Encryption
- The function which takes in a plaintext and produces the corresponding ciphertext. Often denoted
$F_k(p)$ - Decryption
- The inverse of the encryption function, taking in the ciphertext and producing the plaintext. Often denoted
$F^{-1}_k(c)$
Exhaustive key search
The most basic method of attack is exhaustive key search, sometimes called a brute-force attack, in which the adversary tries all possible keys. This attack cannot be prevented so all cryptosystems must have enough keys to make exhaustive search too difficult computationally. 128 bits is the smallest key size which would be acceptable to prevent exhaustive key search today.
Attack classificatons
A cryptosystem which can be practically attacked using only ciphertext, is generally considered highly insecure. The modern standard is that a cryptosystem should be secure against chosen plaintext and chosen ciphertext attacks:
- Ciphertext Only attack
- The attacker has available only the intercepted ciphertext.
- Known Plaintext attack
- The attacker knows a small amount of plaintext and its ciphertext equivalent.
- Chosen Plaintext attack
- The attacker can obtain the ciphertext equivalent of some plaintext which can be selected by the attacker; i.e. the attacker has an “inside encryptor” available.
- Chosen Ciphertext attack
- The attacker can obtain the plaintext equivalent of some ciphertext which can be selected by the attacker; i.e. the attacker has an “inside decryptor” available.
History shows that chosen ciphertext attacks can often be practical to set up for an attacker.
Kerckhoffs' Principle
- Kerckhoffs’ principle
- The attacker has complete knowledge of the cipher (i.e the decryption key is the only thing unknown to the attacker)
This one is asked about a lot. Make sure you know this one.
History has shown that Kerckhoff’s Principle is a reasonable assumption. Using a secret, non-standard algorithm can cause severe problems. This would be an example of security through obscurity.
Statistics of Natural Language
Many of the classical ciphers can be broken by studying the statistics of the messages. If you have a reasonably long message written in English, it is very likely that the most used character will be E, N, T, I and so on. By studying the character frequency of an encrpyted message you can often try to guess which character is supposed to be which (it will of course largely depend on the cipher used, and may not work at all on some ciphers.
Transposition ciphers
These types of ciphers are mosly based on switching the positions of the letters that appear in the message. For a transposition cipher the distributions of plaintext and cihpertext are the same.
If the block size is 10 characters there are
Substitution ciphers
These types of ciphers are mostly based on switching out individual characters by other characters. For example switching E with Z or T with C. You can figure out if a simple substitution cipher has been used if the character frequency looks similar to the standard english character frequency only that other letters are used.
A simple substitution cipher has
Vigenère cipher
If a plaintext comes from a natural language, such as English, the Vigenère cipher can be expected to have a "flat" frequency distribution of characters. Cryptanalysis of the Vigenère cipher often uses autocorrelation in order to identify the period (key length).
Has
Hill Cipher
The Hill cipher is a historical cipher with the encryption equation
Block Ciphers
- Product cipher
- A product cipher is a cryptosystem in which the encryption function is formed by applying (or composing) several sub-encryption functions.
- Iterated cipher
- An encryption process that is divided into
$r$ similar rounds, the subencryption functions are all the same function,$g$ , called the round function. Each key is dervied from the overall master key$K$ . The keys are called round keys or subkeys and are derived from$K$ using a process called the key schedule.
Most modern ciphers are in a special class of product ciphers known as iterated ciphers. There are two types of iterated cihpers:
- Substitution-permutation network
- An iterated cipher. The block length must allow each block to be split into sub-blocks. Substitution and permutation is then performed in a certain fashion.
- Feistel cipher
- An iterated cipher in which the round function swaps the two halves of the block and forms a new right hand half.
Data encryption standard (DES)
- DES is a 16-round Feistel chiper with key length of 56 bits and data block length of 64 bits.
- DES is no longer concidered secure.
Triple-DES
To increase the security of DES, the algorithm may be run multiple times. Two times would be the simplest, but is vulnerable to a meet-in-the-middle attack.
To counter this, three runs are needed. This is often implemented as Encrypt-Decrypt-Encrypt. This allows backwards compatibility with normal DES, by using the same key for all three steps:
While 3DES takes three keys as parameters, using it with only two keys is enough to stop the meet-in-the-middle attack, and often good enough:
Some applications, like PGP and S/MIME, still use three keys with 3DES.
AES
- Symmetric key block cipher
- 128-bit data block; 128-, 192- or 256-bit master key
- Number of round,
$\mathit{NR}$ , is 10, 12 or 14 (for 128-, 192-, 256-bit keys) - Byte-based design
- Structure is essentially a substitution-permutation network
Consists of four stages:
- Substitute bytes
- Uses an S-box to perform a byte-by-byte substitution of the block
- ShiftRows
- A simple permutation
- MixColumns
- A substitution that makes use of arithmetic over
$GF(2^8)$ - AddRoundKey
- A simple bitwise XOR of the current block with a portion of the expanded key
Only the AddRoundKey
stage makes use of the key. The other three stages provides confusion, diffusion and non-linearity, but no security in themselves.
When decrypting with AES, the inverses of the three first stages are used. The AddRoundKey
stage is the same, because
Modes of operation
Different modes of operation has been introduced to balance randomness and ability to encrypt blocks in parallel
Electronic Codebook (ECB) Mode
- Randomised: No
- Padding: Required
- Error propagation: Errors propagate within blocks
- IV: None
- Parallel encryption: Yes
- Parallel decryption: Yes
Cipher Block Chaining (CBC) Mode
- Randomised: Yes
- Padding: Required
- Error propagation: Errors propagate within blocks and into specific bits of next block
- IV: Must be random
- Parallel encryption: No
- Parallel decryption: Yes
Counter (CTR) Mode
- Randomised: Yes
- Padding: Not required
- Error propagation: Errors occur in specific bits of current block
- IV: Nounce must be unique
- Parallel encryption: Yes
- Parallel decryption: Yes
CMC-MAC
A block cipher can be used to create a MAC (Message authentication code, fancy-talk for signature), providing message integrity (not confidentiality). CBC-MAC uses the CBC mode of operation to produce a MAC:
- Start with a public
$IV$ - Do:
$C_t = E_K(P_t \oplus C_{t-1})$ , where$C_0=IV$ - Final
$C_n$ is the MAC
CBC-MAC is unforgeable aslong as the message length is fixed.
CMAC (CBC-based MAC, called Cipher-based MAC)
CMAC is a One-key MAC. It may be used to provide assurance of the authenticity and, hence, the integrity of data. The core of the CMAC algorithm is a variation of CBC-MAC, but with more keys derived from the original key which are XORed to the final result. The final MAC is a cutout of the last computed block.
Length of (C)MAC
NIST allows you to define yourself how secure the MAC should be. It defines:
$I$ - limit on invalid messages detected before changing the key$R$ - acceptable probability that a false message is accepted
Then, the MAC tag should be at least
CCM mode (Counter mode with CBC MAC)
Distinguishes the data sent between two categories:
- Payload
- Must be both authenticated and encrypted
- Associated data (metadata)
- Needs only be authenticated
CCM mode combines CBC-MAC for authentication of all data (payload and associated data) and CTR encryption for the payload.
Authenticated encryption.
Galois/Counter Mode (GCM)
Like CCM, encrypts the payload with CTR mode, but adds integrity in a different way: Galois mode.
CCM mode is not suitable for processing of streaming data. The formatting function requires knowledge of the length of the plaintext. Galois counter mode overcomes this limitation.
Side channels
Due to AES being suceptible to side-channel attacks, older devices without dedicated hardware acceleration are forced to use a slower implementation to perform AES. These sidechannels can derive the key being used. Newer schemes like ChaCha20 solves issues like this.
Psueodorandom numbers and stream ciphers
Randomness
- TRNG
- A true random number generator is a physcial process which outputs each valid string indpenedently with equal probability.
- PRNG
- A psuedo random number generator (PRNG) is a deterministic algorithm which approximates a TRNG.
- TRNGs can be constructed from physical devices and used as seeds for PRNGs
- PRNGs can be constructed from other primitives including block ciphers.
- TRNGs cna be used to make unbreakable encryption via the one time pad.
- PRNGs can be used as practical synchronous stream ciphers.
There is a distictions between statistical PRNGs and cryptographical PRNGs: The former is validated to have a uniform distribution, the latter is also non-predictable and should not leak its seed/state.
More formally, this is defined by two criterias:
- Uniform distribution
- The frequency of occurence of ones and zeros should be approximately equal.
- Independence
- It should not be possible to infer a subsequence from any other.
Statistical PRNGs does not neccesarily cover the last criteria
One-time pad
A stream cipher that provides perfect secrecy (as defined by Shannon). The key is a random sequence of characters, all of them independantly generated. Each character in the key is used one time only. A non periodic binary synchronous stream cipher.
Encryption and decryption could look like
Stream ciphers in general
Expands the idea of one-time pad, but instead of using a insanely long keys, use a cryptographic PRNG where the key is the seed. AES in CTR mode is a stream cipher. Stream ciphers are symmetric in general.
When the key is generated independently of the plaintext, then it is called a Syncronous stream cipher
Linear Feedback Shift Registers (LFSR)
The A5 cipher
- The A5 is a binary synchronous stream cipher applied today in most GSM mobile telephones.
- The A5/1 algorithm uses three linear feedback shift registers (LFSRs) whose output is combined.
- The three LFSRs are irregularly clocked which means that the overall output is non-linear. The effective key length is 54 bits.
RC4
- A word-based stream cipher.
- Simple and efficient for software implementation
- Too weak to be used today.
ChaCha
- Defined together with a MAC called Poly1305
- Uses keys 256-bit in size
- Faster than AES in software.
- Is a add-rotate-xor (ARX) cipher
- Uses
$\oplus$ , addition in mod$2^{32}$ , and rotation operations - Does this in 20 rounds
- Produces 512 bits of keystream
- Uses
Public key cryptography (asymmetric cryptography)
- Uses a pair of different keys: One public key for encryption, one private key for decryption
- Uses a one-way function as its basis of security: a function that is easy to perform one way, but hard to reverse
- It must be almost impossible to use the public key to learn the private key (involves cracking the the one way function)
Public key cryptography has two main advantages compared to shared-key (symmetric) cryptography:
- The key management is simplified: keys do not need to be transported confidentially, you only need to keep the private key safe.
- Digital signatures can be obtained.
Number theory for public key cryptography
Chinese Remainder Theorem
Allows you find
, given that
Is used to significantly speed up RSA decryption, it is also used to prove the correctness of the RSA cipher.
Euler function ($\phi$ )
For a positive integer
Properties:
$\phi(p)=p-1$ , where$p$ is prime$\phi(pq)=(p-1)(q-1)$ , where$p$ and$q$ are distinct primes- Given
$n=p_1^{e_1}\dots\ p_t^{e_t}$ , we have:$\prod_{i=1}^t p_i^{e_i-1}(p_i-1)$
Testing for primality
There exist deterministic polynomial-time primality tests. Although a huge theoretical achievemtn, we use probabilistic methods in practice:
Fermat Test
can test if a number is a prime number:
- If a number satisfies Fermat's equation then we assume that it is prime.
- The Fermat test can fail but it is very unlikely to in practice.
-
The fermat test's failure can be reduced by repeating the test with different base values.
-
If the test concludes the number is a composite, then it is definitely so
- The test can conclude probable prime even if the number is a composite (psuedoprime).
- There are some composite numbers for which the test will always output probable prime (Carmichael numbers).
Miller-Rabin
test works on the same idea as the Fermat test:
- Can be guaranteed to detect composites, just like Fermat
- Can conclude probable prime otherwise
- Most widely used test for generating large prime numbers.
Generating primes:
Given a good primality checker, finding one is trivial: Generate a random odd number and test every odd number around it untill you hit a prime.
The Square-and-Multiply method
This is a method to greatly reduce the computations need to perform exponentiation:
This can be reduced to a few squaring and multiplication operations within the bounds of the modulus
Example:
Then multiply them:
(3 multiplications)
From this we see that the number of squarings operations are decided by the topmost bit in
Two hard problems:
As discussed, public-key cryptography relies on a one-way function, or in other words, a difficult problem.
A function is concidered one-way if
- Integer factorisation
- Given an integer (of
$m$ bits), find its prime factors. - The Discrete logarithm problem (with base 2)
- Given a prime
$p$ (of$m$ bits) and an integer$y$ where$0<y<p$ , find an$x$ such that$y\equiv 2^x\mod p$ - More formally: Given
$y$ in cyclic group$G$ with generator$g$ , find$x$ such that$y=g^x$
- There are no known polynomial-time algorithms to solve either of these two problems.
- They really get hard when increasing the bitsize of the numbers (
$m$ ) - The best known algorithms are sub-exponential: they are slower than any polynomial algorithm but faster than any exponential function.
- These problems can be "solved" if the numbers used has specific properties, making secure number generation a tricky subject. These issues are known as trapdoors
The inverse of these two problems are multiplication and exponentiation respectively, which is why the problems are easy to construct but hard to solve.
RSA
RSA is an example of a public-key cryptosystem and digital signature scheme. Asymmetric cryptography provides some features which cannot be achieved with symmetric key cryptography.
- RSA is based on the integer factorisation problem.
- Implementation:
- Key generation (choice of
$e$ , generating large primes) - Encryption and decryption algorithms (fast exponentiation, using CRT for decryption)
- Formatting data (padding)
- Key generation (choice of
- Security:
- Standardised padding should always be used before encryption
- Factorisation of the modulus is the best known attack against RSA in the case that standardised padding is used.
- Finding the private key from the public key is as hard as factorising the modulus.
- It is an open problem whether there is any way of breaking RSA encryption without factorising the modulus.
Sidechannel attacks
These attacks attempt to leak the secret key by using additional measurable or inferrable information:
- Timing attacks
- Timing of private key operations provides information about the private key
- Power analysis
- Power usage profile of private key operations provides information about the private key
- Fault analysis
- Measures the effect of interfering with the private key operations to obtain information about the private key
- Insufficent randomness
- Poor random number generation makes the one-way problem easier to bypass.
In 2008 it was discovered that OpenSSL on Debian-based systems had massively reduced randomness for RSA key generation. In 2012 a study of over 6 million RSA keys deployed on the Internet found that 4% of the keys were identical, and 0.2% shared one prime factor and provided thus no security. NSA has been attributed of putting a reduced random number generator in the RSA BSAFE library.
Other public key cryptosystems
Mostly based on the Discrete logarithm problem
Diffie-Hellman key exchange
Designed to allow two users, Alice and Bob, to share a secret using only public communications:
- Public knowledge: large prime
$p$ , generator$g$ of$Z_p$ - Alice and Bob each select random values a and b respectively where
$0 < a$ ,$b < p-1$ - Alice sends
$g^a \mod p$ to Bob (over an insecure channel) - Bob sends
$g^b \mod p$ to Alice (over an insecure channel) - Alice and Bob both compute their common secret key as
$Z = g^{ab} \mod p$ - An attacker who can find discrete logarithms can break the protocol, it is unknown whether a better way exists.
That was the ephemeral version. The static version stores the secrets long-term and reuse them.
The Diffie-Hellman key exchange is not authenticated, making it suceptible for MitM attacks. TLS solves this with certificates.
Elgamal cryptosystem
- Turns the Diffie-Hellman protocol into a cryptosystem.
- Alice combines her ephermeral private key with Bob's long-term private key.
Elliptic curves
- Based on the Discrete logarithm problem, but defined on elliptic groups.
- Elliptic curves make the discrete log problem more efficient than RSA
- The best known attacks are exponential (RSA is sub-exponential)
- Elliptic curves are algebraic sturctures formed from cubic equations:
- An example is the set of all
$(x,y)$ pairs which satisfy the equation:$y^2 = x^3 + ax + b \mod p$ , which is a curve over the field$Z_p$ . - Elliptic curves can be defined over any field.
- Once an identity element is added, a binary operation (like multiplication) can be defined on these points. This operation forms a group over the elliptic curve points, often called the elliptic curve group
- An example is the set of all
Elliptic curve groups can be generated at any time, but we often use standard curves. It is desirable for curves to be generated in a way to ensure there are no hidden special properties, which the standard curves have disputed
The best known algorithms for solving the elliptic curve discrete log problem are exponential in the length of the parameters. Consequently elliptic curve implementations use much smaller keys. Compared with RSA the relative advantage of elliptic curve cryptography will increase at higher security levels.
This shows that a 128bit AES key is as easy to brute force as factorization of a 3072bit RSA modulus. Elliptic curves groups of 256bits
Post-quantum cryptography
Quantum computers could significantly reduce the computational complexity of factorization and the discrete logarithm problem. Symmetric key cryptography (AES) is only slightly affected, meaning an increase in key size is all that is needed in a post-quantum world.
NIST is working on standardizing post-quantum cryptosystems, which are based on different problems:
- lattice problems
- coding theory
- solving multi-variable polynomials
Hash functions and MACs
A hash function
$H$ is simple and fast to compute$H$ takes as its input a message$m$ of arbitrary length, and outputs a message digest$H(m)$ of fixed length.$H$ is a one-way function where$m$ should not be inferrable by$H(m)$ through other means, other than brute force.
Hash function properties
More formally, for a hash function to be any good, it must satisfy these three properties:
- Pre-image resistance (one-way)
- Given
$H(m)$ , it should be difficult to find$m$ - Second pre-image resistance
- Given
$m_1$ , it should be difficult to find$m_2$ such that$H(m_1) = H(m_2)$ - Collision resistance
- It should be difficult to find any arbitrary pair of messages (
$(m_{1}, m_{2})$ ) such that$H(m_1) = H(m_2)$
That last property, collision resistance, is guaranteed up to the Birthday Bound of the hash function, derived from the birthday paradox:
For a hash function with a 256-bit output, you will expect to need on average of
If you’re confused about the difference between collision resistance and second pre-image resistance:
Collision resistance is about finding any two messages that produce the same hash, but you don’t care what the hash is as long as two distinct but known messages produce it.
On the other hand, second pre-image resistance is about finding a second message that produces a given hash.
For use as PRNGs, Hash function should be also carefully constructed so that you avoid these entropy-reducing conditions:
- Convergence
- When
$H(H(H(...H(m)...)))$ will, for two arbitrary messages and a sufficient number of iterations, converge on a single hash output. - Cycles
- When
$H_{n}(m) = H_{n + k}(m)$ for some integer$k$ .
Hash functions work much like a digital signature, with the exception of not needing a key. Nice to use as an integrity check if the hash-digest can by other means be authenticated. They are the building blocks of MACs and cryptographic signatures, and can be used to speed up lookup operations.
Storing passwords
Passwords are scary to store in plaintext or encrypted. Can lead to information leakage which is both a security and PR disaster. Therefore it is expected to hash passwords before storing them. You can't then leak a password, since you don't know it, you can only verify it.
To avoid hash-dictionary attacks, passwords are salted before being hashed:
, then both the hash
Interated hash functions
Cryptographic hash functions need to take arbitrary-sized input and produce a fixed size output.
- As we saw with block ciphers, we can process arbitrary-sized data by having a function that processes fixed-sized data and use it repeatedly on blocks.
- An iterated hash function splits the input into blocks of fixed size and operates on these block sequentially using the same function for fixed-size inputs.
- Merkle-Damgård construction: Use a fixed-size compression function applied to multiple block of the message.
- A compression function takes in two n-bit messages
$m_1, m_2$ and produces$m_3=h(m_1, m_2)$ of n-bits
- A compression function takes in two n-bit messages
MD5
Same family as MD2 and MD4, all producing a 128 bit output. All of them are broken, due to collisions. An MD5 can be found in one minute on a PC in 2006.
SHA-0, SHA-1
Based on MDx, but larger and more complex. SHA-1 replaced SHA-0 with only a minor change to the algorithm. Produces a 160bit output. An attack on SHA1 was found in 2017, 100000 times faster than a brute-force search.
SHA2
An umbrella-term of multiple variants, mainly just varying the hash output size and the block size. SHA-224, SHA-256, SHA-512, etc...
SHA3
Doesn’t use a compression function as in Merkle–Damgård construction. Instead it uses a sponge construction. Standardized in 2015, not widely used yet due to no hardware acceleration being readily available.
MAC
MAC are a digital signature, which signs a message. It depends on a secret key and the message. Should be verifiable by using either a shared secret or with public-key cryptography.
A MAC must be unforgeable:
Must not be feasible to produce a message
HMAC
(Hash-based MAC)
- Built from an iterated cryptographic hash function
$H$ , e.g., MD5, SHA-1, SHA-256, ... - Standardized and used in many applications including TLS and IPsec.
- Designed to resist length extension attacks
- Often used as a PRNG for cryptographic protocols
, where
Authenticated encryption.
- Encrypt and MAC
- Encrypt
$m$ , apply MAC to$m$ , send$(m,t)$ - MAC then encrypt
- Apply MAC to
$m$ , then encrypt$m\|t$ , send encrypted blob - Encrypt the MAC
- Encrypt
$m$ , apply MAC to ciphertext$c$ , send$(c,t)$
(
MAC then encrypt seem to be the safest approach. TLS has supported all of the modes at some point. Has since been fixed and expanded to support EAED.
Example: AES Galois Counter Mode (GCM) provides authenticated encryption.
Digital Signatures and Certificates
A digital signature is different from a real-world signature, as it is dependent on the message being signed.
- Digital signatures use public key cryptography to provide the properties of a MAC and more.
- Only the owner of the private key can generate a correct digital signature.
- Digital signatures provide the non-repudiation security service. A judge can decide which party formed the signature.
Digital signature elements
A digital signature scheme has three algorithms:
- Key generation
- Signature generation
- Signature verfication.
Key generation outputs two keys:
- A private signature generation key or simply singning key,
$K_s$ : - A public signature verification key,
$K_v$ .
Digital certificates
- When using a public key to encrypt a message or to verify a digital signature, it is essential to be confident of the correct binding between a public key and its owner.
- Otherwise, an attacker could man-in-the-middle the whole session:
Alice <-tls-> attacker <-tls-> Bob
- Otherwise, an attacker could man-in-the-middle the whole session:
- Normally this is achieved through use of digital certificates which contain the public key and owner identity, and usually other information such as signature algorithm and a validity period.
- The certificate is digitally signed by a trusted third party tasked to be a certificate verifier, normally called a certification authority or CA.
- Certificates play a central role in key management for public key infrastructures.
Attacks
Signatures must be unforgeable, meaning constructing a signature
An attacker may try to break digital signature scheme in multiple ways:
- Key recovery
- The attacker attempts to recover the private signing key from the public verification key and some known signatures.
- Selective forgery
- The attacker attempts to forge a signature for a given message.
- Existential forgery
- The attacker attempts to forge a message with valid signature. The message could be meaningless.
Digital signatures are considered secure if they resist existential forgery under a chosen message attack.
Digital signatures are not resistant to replay-attacks. Therefore, message
RSA
Signature generation is done by hashing the message and encrypting the hash with the signing key. Verification is done by decrypting the signature with the public verification key and comparing it to a recomputed hash of the content. More formally:
Discrete logarithm signatures
relies on the difficulty of the discrete log problem
- Original Elgamal signatures
- Computes the signature directly from the message, the key and a random number
- Digital signature algorithm (DSA)
- A highly optimized version of Elgamal
- restricted to a subgroup of
$Z^{*}_p$ . - Hashes the message with SHA first
- ECDSA: DSA based on elliptic curve groups
- Has shorter keys with the same level of security
- Signature size can be varied with the chosen curve group.
Public key infrastructure (PKI)
PKI is the key management environment for public key information of a public key cryptographic system. Concerns itself with the the lifecycle of long-term cryptographic keys including generation, distribution, storage and their destruction/invalidation.
- It is essential to be confident of relation between a public key and its owner
- Normally achieved with digital certificates containing the public key, owner identity and a validity period.
- These certificates are signed by a trusted party, normally called a certification authority (CA).
In this context the terms used so far changes, and new ones are introduced (OpenSSL/TLS use these):
- Private key
- This is a structure containing both the private key and its corresponding public key, and all the signature keys
- Public key
- This is a structure containing only the public encryption key and the signature verification key. Often used synonymously with *Certificate*
- Certificate
- This is a structure containing the Public key mention above, along with owner identity and a validity period, all signed by a CA. Often used synonymously with *Public key*. These certificate use the X.509 standard, which supports and has lots of extentions.
- Certificate Authority
- Comes in the form of a root CA or a intermediate CA. Their job is to verify the owner and sign their CSR. The signature can be in the form of a chain, where an intermediate CA signed by a trusted root CA can sign certificate on the behalf of the root CA.
- Certificate signing request (CSR)
- Contains a certificate without any signatures. This is the data sendt to the CA for them to verify, make changes to and sign.
Hierarchical PKI
Certificate can as mentioned be signed by intermediate CAs. Say the root CA
Revocation
Sometimes it may be required to declare a certificate invalid even though its validity period is still current.
- Certificate revocation lists (CRL)
- each CA periodically issues a list of revoked certificates which can be downloaded and then checked by clients before using a certificate
- Online certificate status protocol (OCSP)
- a server will maintain a current list of revoked certificates and respond to requests about specific certificates.
A revocation is usually signed by the certificate being revoked or any of the CAs which originaly signed it.
Certificate transparancy (CT)
- CT aims to prevent a certificate being issued without it being publicly known
- Certificates store their assumed key/position in one or more public CT logs, which anyone may check.
- These public logs store only valid certificates supplied by anyone (could be CA or certificate owner)
- Pubic log supplies a proof, checked by client, that a certificate is contained in the log
- Proposed by Google from 2013 and implemented in Chrome
- If a CA certificate is compromised and put to use by an another party, then the logs would produce inconsistencies
Protocols for key establishments
Deals with the multiple ways to:
- Generate keys
- long-term keys. (static keys)
- short-term / ephemeral keys (session keys)
- Distribute keys
- Long-term keys protect the distributions of session keys
- Protect keys
- Destroy keys
Key establishment
In practice session keys are usually symmetric keys used with ciphers such as AES and MACs because of their greater efficiency over public key algorithms.
Long-term keys, usually asymmetric, are used to establish these session keys. If the long-term keys are symmetric, you have to either use key pre-distribution, or use an online server containing the symmetric long-term keys.
Key transfer using asymmetric cryptography
You can send a AES session key with asymmetric encryption, but this does not provide forward secrecy.
- (Perfect) Forward secrecy
- A compromised long-term private key does not reveal previous session keys established using that compromised long-term key.
Therefore you should use Key agreement instead. The Diffie–Hellman protocol is an example of this, where both parties each provide input to the keying material.
Both methods are supported by TLS
Kerberos
Kerberos is an example of session key distribution with an online server, using symetric keys.
Kerberos is a Single sign-on (SSO) solution for which provide access selectively for a number of different online services using individual tickets. It establishes a session key to deliver confidentiality and integrity services for each service access.
It does this through 3 levels:
- Level 1
-
Interaction with authentication server
- obtain a ticket-granting ticket
- Happens once for a session
- Level 2
-
Interaction with ticket-granting server
- obtain a service ticket
- Happens once for each server during the session
- Level 3
-
Interaction with application server
- obtain a service
- Happens once for each time the client requires service during the session
The user only authenticates once at the start of the session.
Kerberos is limited in scalability, and is best suited to corporate environments with shared trust. The kerberos master / trusted authority (TA) is a single point of attack. The security of the whole network depends on it.
Needham–Schroeder protocol
Kerberos is based on the Needham–Schroeder protocol. It is One of the most widely known key establishment protocols
The protocol was vulnerable to replay attacks, but was promptly modified to account for this by introducing the concept of freshness, with three basic mechanisms:
- random challenges (nonces),
- timestamps (a string on the current time), and
- counters (increased for each new message).
Repaired Needham–Schroeder protocol: (using random challenges)
,where:
$S$ - trusted authority$A, B$ - two parties establishing a session$K_{xy}$ - A session key shared between$x$ and$y$ $N_x$ - A one-time use randomly generated nonce. (n-once)$S\rightarrow A : M$ -$S$ sending message$M$ to$A$ $\{X\}_K$ - Message$X$ , encrypted and authenticated with$K$
In general: some TA is granting short-lived session keys for the users and application to use. It is the responsibility of the TA to dissallow unauthenticated users to access the system.
Transport Layer Security (TLS)
Transport Layer Security (TLS) is one of the most widely used security services today. It's a general purpose service implemented as a set of protocols that rely on TCP and based on PKI. It can be implemented as part of the underlying protocol suite, or alternatively embedded in specific packages. TLS is designed to make use of TCP to provide a reliable end-to-end secure service. It is a very complex protocol and has been the subject of many attacks, and subsequent repair. It went through a major structural change rom version 1.2 to version 1.3. TLS consists of three important protocols:
- the Handshake Protocol: The encryption scheme, keys, extensions are exchanged here.
- the Record Layer Protocol: This is the protocol which transfers the encrypted application data.
- the Alert Protocol: Events related to the TLS layer, not a part of the application layer. Mostly warnings and errors.
The Handshake Protocol
The Handshake Protocol is the most complex part of TLS. This protocol allows the server and client to authenticate each other and to negotiate an encryption and MAC algorithm and cryptographic keys to be used to protect the data sent in a TLS record. The Handshake Protocol is used before any application data is transmitted.
The Hanshake Protocols consists of 4 different phases:
- Phase 1
- Establish security capabilities, including protocol version, session ID, cipher suite, compression method, and intitial random numbers.
- Phase 2
- Server may send certificate, key exchange, and request certificate. Server signals end of hello message phase.
- Phase 3
- Client sends certificate if requested. Client sends key exchange. Client may send certificate verification.
- Phase 4
- Change cipher suite and finish handshake protocol.
Commonly, this protocol uses asymetric encryption for the handshake, and it generates a symetric key to be used in the Record Protocol. This was originally done since symetric encryption is less computationally expensive, but has since been expanded to provide forward secrecy as well.
The Record protocol
The record protocol provides confidentiality and message integrity. The record protocol takes an application message to be transmitted, fragments the data into manageable blocks, optionally compresses the data, applies a MAC, encrypts it, adds some headers, and transmits the resulting unit in a TCP segment. The received data is decrypted, verified, decompressed, and reassembled before being delivered to the application. This protocol commonly uses symmetric encryption.
The Alert Protocol
The Alert Protocol is used to convey TLS-related alerts to the peer entity. As with other applications that use TLS, alert messages are compressed and encrypted, as specified by the current state. Each message in this protocol consists of two bytes. The first byte takes the value warning or fatal to convey the severity of the message. If the level is fatal, TLS immediately terminates the connection. Other connections on the same session may continue, but no new connections on this session may be established. The second byte contains a code that indicates the specific alert.
TLS Security
- Different kinds of attacks: implementation errors, poor choice of cryptographic primitives, flaws in protocol.
- Backward compatibility is a problem (downgrade attacks).
- Only a very small subset of the supported ciphers are reguarded as safe, meaning clients and servers must be configured with allowed ciphers.
- Several examples of the principle that “attacks only get better” over time.
- TLS 1.3 simplifies the handshake.
- TLS 1.3 adds new features (e.g. 0-RTT mode) which present new challenges.
Email security
Email security is difficult to get right, especially since it is by default not secure, meaning a recipient replying in plaintext could leak the whole conversation.
DomainKeys Identified Mail (DKIM)
- Done by the email service provider.
- Allows the sending mail domain to sign outgoing mail using RSA signatures (currenty supported signature algorithm)
- Receiving domain can verify origin of mail
- Widely used by prominent email providers including Gmail
- Helps prevent email spoofing and hence reduce spam and phishing.
- Public verification key of sending domain retrieved using domain name system (DNs).
STARTTLS
- Extension to mail protocols SMPT, POP and IMAP run over TLS connections.
- Provides link-by-link security, not end-to-end security
- Opportunistic use of TLS security (encryption) - use it if possible.
- Defined for IMAP and POP3 (RFC 2595) and for SMPT (RFC 3207)
- Widely used by prominent email providers including Gmail and Microsoft Outlook
- Vulnerable to so-called STRIPTLS attacks: attacker interrupts TLS negotiation and connection falls back to plaintext transmission.
PGP
- Clientside system for encryption and signing, can be signed in plaintext, or signed+encrypted.
- Protection of email message contents
- Hybrid encryption - a new random "session key" is generated for each object (message) and the session key is encrypted with the long-term public key of recipient.
- Signing using RSA or DSA signatures.
- Compression using Zip
- Coding using radix-64 (Base64) to ensure that binary blobs can be sent in email body.
- Uses a web-of-trust, where the users verify and sign each other, instead of relying on a trusted certificate authority.
IP Security
- Internet Protocol security (IPsec) is a framework for ensuring secure communications over Internet Protocol (IP) networks.
- It provides similar security services as TLS, but at a lower layer in the communications protocol stack.
-
Security can be added to both IPv4 and IPv6.
-
IPSec are located at the network layer.
-
IPsec Encapsulating Security PayLoad (ESP) protocol can operate either in transport mode or in tunnel mode.
-
IP level security encompasses three functional areas: authentication, confidentiality, and key management.
- The authentication mechanism assures that a received packet was, in fact, transmitted by the part identified as the source in the packet header. In addtition, this mechanism assures that the packet has not been altered in transit.
- The confidentiality facility enables communicating nodes to encrypt messages to prevent eavesdropping by third parties.
- The key management facility is concerned with the secure exchange of keys.
IPsec modes
- Transport mode
- iphost-to-iphost security where protection covers to the payload of IP-data. Encrypts IP payload and any IPv6 extension headers following the ESP header.
- Tunnel mode
- Network/gateway-to-network/gateway security. Protection for the entire IP-packet by adding outer IP-packet between the network gateways.
IPsec architectures
Gateway-to-gateway architecture
- Provides secure network communications between two networks.
- Network traffic is routed through the IPsec connection, protecting it appropriately
- Only protects data between two gateways
- Most often used when connecting two secured networks, such as linking a branch office to headquarters over the internet
- Can be less costly than private wide area network (WAN) circuits.
Host-to-gateway architecture
- Commonly used to provide secure remote access.
- The organisation deploys a virtual private network (VPN) gateway onto their network.
- Each remote access user establishes a VPN connection between the local computer (host) and the gateway.
- VPN gateway may be a dedicated device or part of another network device.
- Most often used when connecting hosts on unsecured network to resources on secured networks.
Host-to-host architecture
- Typically used for special purpose needs, such as system administrators performing remote management of a single server.
- Only model that provides protection for data throughout its transit (end-to-end).
- Resource-intensive to implement and maintain in terms of user and host management.
- All user systems and servers that will participate in VPNs need to have VPN software installed and/or configured.
- Key management is often accomplished through the manual process.
old
Chapter 1
- Data origin authentication
- provides confirmation of the claimed source (origin) of a data unit (message)
- Entity authentication
- provides confirmation of the claimed identity of an entity
- Non-repudiation
- Ensures that you cannot hide the source (origin) of a data unit (message).
- Substitution
- each character (or set of characters) is replaced by a different character (or set of characters)
- Transposition
- the characters in the plaintext are mixed up with each other (permuted)
Chapter 2
- Confusion
- This involves substitution to make the relationship between the key and ciphertext as complex as possible.
- Diffusion
- This involves transformations that dissipate the statistical properties of the plaintext across the ciphertext. (A small change in plaintext should give a completely different ciphertext)
- Group generator
- A group generator is a group element g that in the operation
$g^k \mod p$ represents all the elements that are relatively prime to p. Consider for example the group G$\mathbb{Z}_5$ . G consists of the elements {1, 2, 3, 4}. One or several of these group elements may be a group generator for G. A way to check this is to use$g^k \mod 5$ for the k group elements and see if they represent all the group elements.
Example:
2 is a group generator for
- Finite field
- placeholder
Chapter 3
- Electronic Code Book (ECB) Mode
- The basic mode of a block cipher. Plaintext block
$P$ is encrypted with key$K$ to produce ciphertext block$C_t$ . Ciphertext block$C$ is decrypted with key$K$ to produce plaintext block$P_t$ . - Cipher Block Chaining (CBC) Mode
- A random initialization vector (IV) is chosen and sent together with the ciphertext blocks.
$C_t=E(P_t\oplus C_{t-1},K)$ where$C_0=IV$ $P_t=D(C_t,K)\oplus C_{t-1}$ where$C_0=IV$ - CTR Mode
- CTR is a synchronous stream cipher. The keystream is generated by encrypting successive values of a "counter", initialised using a nonce (randomly chosen value) N:
$O_t=E(T_t,K)$ , where$T_t=N||t$ is the concatenation of the nonce and block number t. Encryption:$C_t=O_t\oplus P_t$ . Decryption:$P_t=O_t\oplus C_t$ . - True Random Number Generator (TRNG)
- is a physical process which outputs each valid string independently with equal probability
- Pseudo Random Number Generator (PRNG)
- is a deterministic algorithm which approximates a TRNG.
- Message Authentication Code (MAC)
- A message authentication code (MAC) is a cryptographic mechanism used for ensuring message integrity. A MAC tag should at least be of length
$\log_2{I/R}$ where$I$ is a limit on how many invalid messages are detected before the key is changed and$R$ is the acceptable probability that a false message is accepted. - Stream Cipher
- Stream ciphers are characterised by the generation of a keystream using the short key and an initialisation value as input. Each element of the keystream is used successively to encrypt one or more ciphertext characters. Stream ciphers are usually symmetric key ciphers: sender and receiver share the same key and can generate the same keystream given the same initialisation value.
- Synchronous stream ciphers
- The keystream is generated independently of the plaintext. Both sender and receiver need to generate the same keystream and synchronise on its usage.
- One Time Pad
- The key is a random sequence of characters, all of them independently generated. Each character in the key is used one time only. The one-time pad provides perfect secrecy.
- Linear Feedback Shift Register
- A LFSR is a common component in design of stream ciphers.
Chapter 4
- Factorisation problem
- Given an integer of m bits, find its prime factors. Factorisation by trial division is an exponential time algorithm and is hopeless for numbers of a few hundred bits. A number of special purpose methods exist, which apply if the integer to be factorised has special properties. The best current general method is known as the number field sieve. The number field sieve is a sub-exponential time algorithm.
- Discrete logarithm problem
- Let g be a generator of
$\mathbb{Z}_p^*$ for a prime p. The discrete log problem over$\mathbb{Z}_p^*$ is: given y in$\mathbb{Z}_p^*$ find x with y =$g^x$ mod p. - Big O-notation
- placeholder
- Fermat test
- if a number p is prime then
$a^{p−1}$ mod p = 1 for all a with gcd(a, p) = 1. If we examine a number n and find that$a^{n−1}$ mod n$\neq$ 1 then we know that n is not prime. - Miller-Rabin test
- placeholder
- RSA Encryption equation
- placeholder
- RSA Decryption equation
- placeholder
- RSA Padding
- placeholder
- Prime number theorem
- placeholder
- Håstad's attack
- placeholder
Ciphers
Symmetric cipher
(secret key cipher) encryption and decryption keys known only to sender and receiver. (DES)
Asymmetric cipher
(public key cipher) each participant has a public key and a private key, may allow both encryption and signatures. (RSA)
Attacks
Ciphertext only attack
the cryptanalyst has available only the intercepted cipher text.
Known plaintext attack
the cryptanalyst knows a small amount of plaintext and its cipher text equivalent
Chosen plaintext attack
The cryptanalyst can obtain the cipher text equivalent of some plaintext which can be selected by the attacker, i.e the attacker has an "inside encryptor” available
Chosen cipher text attack
The cryptanalyst can obtain the plaintext equivalent of some cipher text which can be selected by the attacker, i.e. the attacker has an “inside decryptor” available.
Historical ciphers
- Caesar
- Substitution
- Vigenère. Caesar, but also uses a key in order to choose how many steps to shift the alphabet for each letter
Key expansion
- Input
- 16-byte key.
- Output
- 176 bytes (44 words)
The first four words are used in the initial AddRoundKey step. The next ten word-groups are used in the ten rounds of the cipher.
RSA
Iterated ciphers
- Encryption process divided into r similar rounds
- the sub encryption functions g are the same for all rounds
- Each key
$K_i$ is derived from the overall master key K. The keys$K_i$ are called round keys or subkeys and are derived from K using a process called the key schedule.
Encryption
Given a plaintext block, P, a round function g and round keys
Pseudorandom number generation
Generated numbers should be as random as possible. This is defined by two criteria:
- Uniform distribution
- The frequency of occurence of ones and zeros should be approximately equal.
- Independence
- It should not be possible to infer a subsequence from any other.