I’m writing up a short summary for each classic cybersecurity paper that I have to know for my qualification exam. This week, let’s chat about “Reflections on Trusting Trust” by Ken Thompson (1984).
The three page paper comes from Ken Thompson’s Turing Award lecture in 1984. In it, he details a very elegant attack by which a backdoor can be injected into a program through a malicious compiler, leaving no evidence in the actual source code of the compiler or the program.
Moral
The moral is obvious. You can’t trust code that you did not totally create yourself.
Thompson argues that a skillful attacker can install a bug that will be almost impossible to detect. He spends the rest of the speech to say that most unauthorized access to computer systems is vandalism and should be treated as such by the media, the law, and society at large. This drawn out emphasis on the treatment of vandals undercuts the seriousness of the previous attack.
The moral is not that we cannot trust code we did not write ourselves. The moral is that even the code we write could be corrupted. The keyword in Thompson’s paper is “totally.” This turtles-all-the-way-down attack stops only when you use your own vacuums.
Attack Details
The attack relies on three stages.
1. Self-Reproducing Programs
You can create a program that produces itself as output, comments and everything.
Yeo Kheng Meng provides a good demo of self-reproducing programs if you want to see what this looks like in C.
2. Knowledge Perpetuation
We can train compilers to recognize new sequences due to the fact that the C compiler is written in C.
Let’s look at the example Thompson uses. The C compiler knows what ‘\n’ means. The code looks like this:
c = next( ); if(c != '\\') /* If c isn't a backslash, return c */ return(c); c = next( ); /* c was a backslash, what's next? */ if(c == '\\') /* If c is a backslash, return \ */ return('\\') if(c == 'n') /* If c== n, return new line */ return('\n')
Say we want to introduce a new sequence ‘\v’ to represent the vertical tab character. We would like to just add one more line.
c = next( ); if(c != '\\') return(c); c = next( ); if(c == '\\') return('\\') if(c == 'n') return('\n') if(c == 'v') /* New: Check if c = v. If so, return '\v' */ return('\v')
We try to compile this new code with our old binary and we get an error. Our old C binary has no idea what ‘\v’ means. We have to teach it.
We look up that the vertical tab character in ASCII is 11. We modify the code as follows:
c = next( ); if(c != '\\') return(c); c = next( ); if(c == '\\') return('\\') if(c == 'n') return('\n') if(c == 'v') return(11) /* Set \v to equal ASCII 11 */
We compile this code with our old binary. It works! When we do this compilation, it produces a new binary that knows that \v ==11.
Now here’s the fun part. We can install this new binary that knows \v == 11 while our compiler source code can be our first (failed) attempt with the following code.
if(c == 'v') /* Check if c = v. If so, return '\v' */ return('\v')
When this source code is compiled with a binary that knows \v ==11, the resulting binary will still know that \v == 11.
Here’s an graphic of what we just did. The first line is our failed attempted to just return ‘\v’ with the old compiler. The second line is when we teach it to understand that \v ==11. The last line means we can use our old code that just returns ‘\v’ as the portable compiler if we compile it with a binary that has already learned that \v ==11.
3) The Attack: Hidden Compiler Backdoor
We use the previous two stages to install a hidden backdoor to another program in a compiler.
Let’s again follow Thompson’s example and attempt to insert a hidden backdoor in the login.c program. We don’t put it in the login.c program itself since someone could inspect the code and see it.
We could try to insert the bug aimed at login.c in the compiler so that if it saw a certain pattern, it would insert a bug in the program.
if ( program is login.c) compile login with backdoor
But this is visible too. Anyone inspecting the compiler code would be able to detect this.
To avoid detection, Thompson tells us to insert another bug that uses a self-replicating program from step 1. This bug is aimed at the compiler itself. This second bug inserts the first Trojan horse and itself into the compiler.
if ( program is login.c) /* Bug 1 */ compile login with backdoor if ( program is compiler ) /* Bug 2 */ compile and insert above bug and self
This second bug exploits the learning process from step 2 to produce a malicious or poisoned binary. By self-replicating the two bugs, we perpetuate poisoning of the compiler. To avoid detection, we delete the malicious compiler’s source code and set the poisoned binary as the official C compiler.
To perform the attack, we use the poisoned binary to compile a clean compiler and the login.c program. The backdoor is obscured in the binary.
Examining the source code will not reveal the presence of the backdoor.
*See Schneier’s blog post for a mitigation technique developed by David A. Wheeler .