Information leakage happens when a system reveals something about its secrets that helps an adversary achieve some goal. We can understand leakage through quantitative information flow (QIF), a robust framework for quantifying how much a system leaks. In this post, we will walk through how we can statically model a system as a channel matrix. We’ll also see how a channel maps a prior probability distribution to a hyper-distribution, which helps an adversary narrow down the set of possible secrets that could have produced an observed output.
First, I’ll give an overview of the concepts, then we’ll work through a concrete example. I’ll spend some time discussing what a hyper-distribution is and how to interpret it. The terminology and example I use are from the paper Quantifying Information Leakage of Deterministic Encryption.
Overview
QIF generally assumes that an adversary knows all possible ‘secrets’ of a system, and that she knows the probability of each of these secrets, given by a prior distribution π. A system, or ‘channel’, takes in a secret input and produces some observable output that may help the adversary achieve her goal.
For example, let’s think about a classic side-channel attack: an adversary could observe that the computer is consuming a lot of power during decryption, and she may be able to determine the size of the encryption key (the secret).
We can statically model all possible inputs and outputs of the system as an information-theoretic channel matrix that gives conditional probabilities p(y | x).
The rows represent all possible secrets and the columns represent all possible outputs. Every entry in the matrix indicates the probability of that output given that input.
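To make this concrete, here is a minimal sketch in Python (the channel and its numbers are made up for illustration, not taken from the paper): a channel matrix stored as a list of rows, one row per secret and one column per output.

```python
from fractions import Fraction as F

# Hypothetical channel with two secrets (rows) and two outputs (columns).
# Entry C[x][y] is p(y | x): the probability of observing output y when the secret is x.
C = [
    [F(3, 4), F(1, 4)],  # secret x1
    [F(1, 2), F(1, 2)],  # secret x2
]

# Each row is a probability distribution over outputs, so it must sum to 1.
assert all(sum(row) == 1 for row in C)
```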
Recall the adversary also knows the probability of each of the secrets: the prior probability distribution. She can multiply each row of the channel matrix by the corresponding prior probability to compute the joint distribution, represented by a joint matrix J that gives joint probabilities p(x, y).
At this point, she can sum the columns of J to calculate the marginal distribution on outputs, which gives the probability of every output. Posterior distributions can be calculated by normalizing the columns of J; each posterior gives the probabilities p(x | y) for a particular output. When we drop the output labels (they don’t really matter), the marginal together with these posteriors forms the hyper-distribution. Given a hyper-distribution, an adversary can observe a particular output and know which secrets could have produced it, and how likely each one is.
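Here is a rough sketch of that pipeline in Python (these are my own helper functions, just implementing the arithmetic described above): scale the channel by the prior to get the joint matrix, sum the columns for the marginal, and normalize the columns for the posteriors.

```python
def joint(prior, C):
    """Joint matrix J: J[x][y] = prior[x] * C[x][y], i.e. p(x, y)."""
    return [[p_x * p for p in row] for p_x, row in zip(prior, C)]

def marginal(J):
    """Marginal distribution on outputs: sum each column of J to get p(y)."""
    return [sum(col) for col in zip(*J)]

def posteriors(J):
    """Posterior distributions: normalize each column of J to get p(x | y).
    Assumes every output has nonzero marginal probability."""
    p_y = marginal(J)
    return [[J[x][y] / p_y[y] for x in range(len(J))] for y in range(len(p_y))]
```

The hyper-distribution is then the marginal paired with this list of posteriors.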
The key thing to notice here is that a channel is a mapping from priors to hyper-distributions.
Example: Keys to the Castle
Let’s say we have a castle that alternates through three different keys to open its gates. Through an odd twist of tradition, the villagers raise a colored flag, chosen probabilistically, according to which key was used to open the gate.
When key #1 is used, they raise either the teal flag or the purple flag, with equal probability. When key #2 is used, they raise the purple flag 1/4 of the time, the blue flag 1/2 of the time, and the red flag 1/4 of the time. They have a similarly odd convention for key #3: they raise the teal flag 1/2 of the time, the purple flag 1/3 of the time, and the blue flag 1/6 of the time.
We can statically represent all keys and flags in a channel matrix. Informally, here’s how we can read the channel: when the secret is key #1, the teal flag is raised half the time and the purple flag the other half of the time, and so on.
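Written out explicitly (reconstructed from the flag probabilities above, in the same Python style as the sketch in the overview), the channel matrix C has one row per key and one column per flag:

```python
from fractions import Fraction as F

#          teal     purple   blue     red
C = [
    [F(1, 2), F(1, 2), F(0),    F(0)   ],  # key #1
    [F(0),    F(1, 4), F(1, 2), F(1, 4)],  # key #2
    [F(1, 2), F(1, 3), F(1, 6), F(0)   ],  # key #3
]
```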
Let’s say the villagers choose key #1 1/4 of the time, key #2 1/2 of the time, and key #3 the other 1/4 of the time. We can use this prior probability distribution π and our channel C to calculate the joint matrix, the marginal probability distribution, and the hyper distribution.
We first scale each row of the channel matrix C by the prior probability distribution π to calculate the joint matrix J showing p(x, y). We then sum the columns of J to calculate the marginal distribution on outputs. Finally, we normalize the columns of J to calculate the posterior distributions, which together with the marginal make up the hyper-distribution.
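Here is that computation as a sketch, reusing the helper functions from the overview and the channel matrix C above (the commented values are what the code produces, written as exact fractions):

```python
prior = [F(1, 4), F(1, 2), F(1, 4)]  # key #1, key #2, key #3

J = joint(prior, C)
# J == [[1/8, 1/8,  0,    0  ],
#       [0,   1/8,  1/4,  1/8],
#       [1/8, 1/12, 1/24, 0  ]]

outer = marginal(J)
# outer == [1/4, 1/3, 7/24, 1/8]   (teal, purple, blue, red)

inners = posteriors(J)
# inners == [[1/2, 0,   1/2],   # after seeing teal
#            [3/8, 3/8, 1/4],   # after seeing purple
#            [0,   6/7, 1/7],   # after seeing blue
#            [0,   1,   0  ]]   # after seeing red
```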
Let’s examine the hyper-distribution. What do these columns mean? What is a hyper-distribution?
A hyper-distribution is a distribution on posterior distributions. In our example, we have a distribution on outputs (1/4, 1/3, 7/24, 1/8), one probability for each of the teal, purple, blue, and red flags. We call this the ‘outer distribution.’ But we also have four posterior distributions over the keys, one per flag: (1/2, 0, 1/2), (3/8, 3/8, 1/4), (0, 6/7, 1/7), and (0, 1, 0). We call these ‘inners’.
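Laid out as a table (this is just a rendering of the numbers above), the outer probabilities sit on top and each inner is a column:

```
outer:   1/4    1/3    7/24   1/8
key #1   1/2    3/8    0      0
key #2   0      3/8    6/7    1
key #3   1/2    1/4    1/7    0
```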
Let’s look at these ‘inners’ and see what they mean for our adversary who is trying to guess the secret key.
These four inners represent all the worlds of knowledge that an adversary could end up in after observing the output (the raised flag). For example, if she sees the teal flag, she will be in the first ‘world’ and she’ll know with certainty that the key could not have been key #2.
Notice that in this last world, which represents the case where she observes the red flag, she knows with certainty that the secret key is key #2. However, this ‘world’ only occurs 1/8 of the time; she’s much more likely to end up in another world. For example, 1/3 of the time she’ll see the purple flag (represented by the second column). Unfortunately for her, when she sees this flag she will be less sure about the secret key: it could be key #1 or key #2 with equal probability, or it could even be key #3.