Apr. 4th, 2016

Trusted Platform Modules are fairly unintelligent devices. They can do some crypto, but they don't have any ability to directly monitor the state of the system they're attached to. This is worked around by having each stage of the boot process "measure" state into registers (Platform Configuration Registers, or PCRs) in the TPM by taking the SHA1 of the next boot component and performing an extend operation. Extend works like this:

New PCR value = SHA1(current value||new hash)

ie, the TPM takes the current contents of the PCR (a 20-byte register), concatenates the new SHA1 to the end of that in order to obtain a 40-byte value, takes the SHA1 of this 40-byte value to obtain a 20-byte hash and sets the PCR value to this. This has a couple of interesting properties:
  • You can't directly modify the contents of the PCR. In order to obtain a specific value, you need to perform the same set of writes in the same order. If you replace the trusted bootloader with an untrusted one that runs arbitrary code, you can't rewrite the PCR to cover up that fact
  • The PCR value is predictable and can be reconstructed by replaying the same series of operations
But how do we know what those operations were? We control the bootloader and the kernel and we know what extend operations they performed, so that much is easy. But the firmware itself will have performed some number of operations (the firmware itself is measured, as is the firmware configuration, and certain aspects of the boot process that aren't in our control may also be measured) and we may not be able to reconstruct those from scratch.

Thankfully we have more than just the final PCR data. The firmware provides an interface to log each extend operation, and you can read the event log in /sys/kernel/security/tpm0/binary_bios_measurements. You can pull information out of that log and use it to reconstruct the writes the firmware made. Merge those with the writes you performed and you should be able to reconstruct the final TPM state. Hurrah!

The problem is that a lot of what you want to measure into the TPM may vary between machines or change in response to configuration changes or system updates. If you measure every module that grub loads, and if grub changes the order that it loads modules in, you also need to update your calculations of the end result. Thankfully there's a way around this - rather than making policy decisions based on the final TPM value, just use the final TPM value to ensure that the log is valid. If you extract each hash value from the log and simulate an extend operation, you should end up with the same value as is present in the TPM. If so, you know that the log is valid. At that point you can examine individual log entries without having to care about the order that they occurred in, which makes writing your policy significantly easier.

But there's another source of fragility. Imagine that you're measuring every command executed by grub (as is the case in the CoreOS grub). You want to ensure that no inappropriate commands have been run (such as ones that would allow you to modify the loaded kernel after it's been measured), but you also want to permit certain variations - for instance, you might have a primary root filesystem and a fallback root filesystem, and you're ok with either being passed as a kernel argument. One approach would be to write two lines of policy, but there's an even more flexible approach. If the bootloader logs the entire command into the event log, when replaying the log we can verify that the event description hashes to the value that was passed to the TPM. If it does, rather than testing against an explicit hash value, we can examine the string itself. If the event description matches a regular expression provided by the policy then we're good.

This approach makes it possible to write TPM policies that are resistant to changes in ordering and permit fine-grained definition of acceptable values, and which can cleanly separate out local policy, generated policy values and values that are provided by the firmware. The split between machine-specific policy and OS policy allows for the static machine-specific policy to be merged with OS-provided policy, making remote attestation viable even over automated system upgrades.

We've integrated an implementation of this kind of policy into the TPM support code we'd like to integrate into Kubernetes, and CoreOS will soon be generating known-good hashes at image build time. The combination of these means that people using Distributed Trusted Computing under Tectonic will be able to validate the state of their systems with nothing more than a minimal machine-specific policy description.

The support code for all of this should also start making it into other distributions in the near future (the grub code is already in Fedora 24), so with luck we can define a cross-distribution policy format and make it straightforward to handle this in a consistent way even in hetrogenous operating system environments. Remote attestation is a powerful tool for ensuring that your systems are in a valid state, but the difficulty of policy management has been a significant factor in making it difficult for people to deploy in their data centres. Making it easier for people to shield themselves against low-level boot attacks is a big step forward in improving the security of distributed workloads and makes bare-metal hosting a much more viable proposition.
There's a piece of software called XScreenSaver. It attempts to fill two somewhat disparate roles:
  • Provide a functioning screen lock on systems using the X11 windowing system, a job made incredibly difficult due to a variety of design misfeatures in said windowing system[1]
  • Provide cute graphical output while the screen is locked
XScreenSaver does an excellent job of the second of these[2] and is pretty good at the first, which is to say that it only suffers from a disastrous security flaw once very few years and as such is certainly not appreciably worse than any other piece of software.

Debian ships an operating system that prides itself on stability. The Debian definition of stability is a very specific one - rather than referring to how often the software crashes or misbehaves, it refers to how often the software changes behaviour. Debian is very reluctant to upgrade software that is part of a stable release, to the extent that developers will attempt to backport individual security fixes to the version they shipped rather than upgrading to a release that contains all those security fixes but also adds a new feature. The argument here is that the new release may also introduce new bugs, and Debian's users desire stability (in the "things don't change" sense) more than new features. Backporting security fixes keeps them safe without compromising the reason they're running Debian in the first place.

This all makes plenty of sense at a theoretical level, but reality is sometimes less convenient. The first problem is that security bugs are typically also, well, bugs. They may make your software crash or misbehave in annoying but apparently harmless ways. And when you fix that bug you've also fixed a security bug, but the ability to determine whether a bug is a security bug or not is one that involves deep magic and a fanatical devotion to the cause so given the choice between maybe asking for a CVE and dealing with embargoes and all that crap when perhaps you've actually only fixed a bug that makes the letter "E" appear in places it shouldn't and not one that allows the complete destruction of your intergalactic invasion fleet means people will tend to err on the side of "Eh fuckit" and go drinking instead. So new versions of software will often fix security vulnerabilities without there being any indication that they do so[3], and running old versions probably means you have a bunch of security issues that nobody will ever do anything about.

But that's broadly a technical problem and one we can apply various metrics to, and if somebody wanted to spend enough time performing careful analysis of software we could have actual numbers to figure out whether the better security approach is to upgrade or to backport fixes. Conversations become boring once we introduce too many numbers, so let's ignore that problem and go onto the second, which is far more handwavy and social and so significantly more interesting.

The second problem is that upstream developers remain associated with the software shipped by Debian. Even though Debian includes a tool for reporting bugs against packages included in Debian, some users will ignore that and go straight to the upstream developers. Those upstream developers then have to spend at least 15 or so seconds telling the user that the bug they're seeing has been fixed for some time, and then figure out how to explain that no sorry they can't make Debian include a fixed version because that's not how things work. Worst case, the stable release of Debian ends up including a bug that makes software just basically not work at all and everybody who uses it assumes that the upstream author is brutally incompetent, and they end up quitting the software industry and I don't know running a nightclub or something.

From the Debian side of things, the straightforward solution is to make it more obvious that users should file bugs with Debian and not bother the upstream authors. This doesn't solve the problem of damaged reputation, and nor does it entirely solve the problem of users contacting upstream developers. If a bug is filed with Debian and doesn't get fixed in a timely manner, it's hardly surprising that users will end up going upstream. The Debian bugs list for XScreenSaver does not make terribly attractive reading.

So, coming back to the title for this entry. The most obvious failure of the commons is where a basically malicious actor consumes while giving nothing back, but if an actor with good intentions ends up consuming more than they contribute that may still be a problem. An upstream author releases a piece of software under a free license. Debian distributes this to users. Debian's policies result in the upstream author having to do more work. What does the upstream author get out of this exchange? In an ideal world, plenty. The author's software is made available to more people. A larger set of developers is willing to work on making improvements to the software. In a less ideal world, rather less. The author has to deal with bug mail about already fixed bugs. The author's reputation may be harmed by user exposure to said fixed bugs. The author may get less in the way of useful bug fixes or features because people are running old versions rather than fixing new ones. If the balance tips towards the latter, the author's decision to release their software under a free license has made their life more difficult.

Most discussions about Debian's policies entirely ignore the latter scenario, focusing more on the fact that the author chose to release their software under a free license to begin with. If the author is unwilling to handle the consequences of that, goes the argument, why did they do it in the first place? The unfortunate logical conclusion to that argument is that the author realises that they made a huge mistake and never does so again, and woo uh oops.

The irony here is that one of Debian's foundational documents, the Debian Free Software Guidelines, makes allowances for this. Section 4 allows for distribution of software in Debian even if the author insists that modified versions[4] are renamed. This allows for an author to make a choice - allow themselves to be associated with the Debian version of their work and increase (a) their userbase and (b) their support load, or try to distinguish what Debian ship from their identity. But that document was ratified in 1997 and people haven't really spent much time since then thinking about why it says what it does, and so this tradeoff is rarely considered.

Free software doesn't benefit from distributions antagonising their upstreams, even if said upstream is a cranky nightclub owner. Debian's users are Debian's highest priority, but those users are going to suffer if developers decide that not using free licenses improves their quality of life. Kneejerk reactions around specific instances aren't helpful, but now is probably a good time to start thinking about what value Debian bring to its upstream authors and how that can be increased. Failing to do so doesn't serve users, Debian itself or the free software community as a whole.

[1] The X server has no fundamental concept of a screen lock. This is implemented by an application asking that the X server send all keyboard and mouse input to it rather than to any other application, and then that application creating a window that fills the screen. Due to some hilarious design decisions, opening a pop-up menu in an application prevents any other application from being able to grab input and so it is impossible for the screensaver to activate if you open a menu and then walk away from your computer. This is merely the most obvious problem - there are others that are more subtle and more infuriating. The only fix in this case is to nuke the site from orbit.

[2] There's screenshots here. My favourites are the one that emulate the electrical characteristics of an old CRT in order to present a more realistic depiction of the output of an Apple 2 and the one that includes a complete 6502 emulator.

[3] And obviously new versions of software will often also introduce new security vulnerabilities without there being any indication that they do so, because who would ever put that in their changelog. But the less ethically challenged members of the security community are more likely to be looking at new versions of software than ones released three years ago, so you're probably still tending towards winning overall

[4] There's a perfectly reasonable argument that all packages distributed by Debian are modified in some way

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Google. Member of the Free Software Foundation board of directors. Ex-biologist. @mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer.

Expand Cut Tags

No cut tags