I've been working on TPMs lately. It turns out that they're moderately awful, but what's significantly more awful is basically all the existing documentation. So here's some of what I've learned, presented in the hope that it saves someone else some amount of misery.

What is a TPM?

TPMs are devices that adhere to the Trusted Computing Group's Trusted Platform Module specification. They're typically microcontrollers[1] with a small amount of flash, and attached via either i2c (on embedded devices) or LPC[2] (on PCs). While designed for performing cryptographic tasks, TPMs are not cryptographic accelerators - in almost all situations, carrying out any TPM operations on the CPU instead would be massively faster[3]. So why use a TPM at all?

Keeping secrets with a TPM

TPMs can encrypt and decrypt things. They're not terribly fast at doing so, but they have one significant benefit over doing it on the CPU - they can do it with keys that are tied to the TPM. All TPMs have something called a Storage Root Key (or SRK) that's generated when the TPM is initially configured. You can ask the TPM to generate a new keypair, and it'll do so, encrypt them with the SRK (or another key descended from the SRK) and hand it back to you. Other than the SRK (and another key called the Endorsement Key, which we'll get back to later), these keys aren't actually kept on the TPM - the running OS stores them on disk. If the OS wants to encrypt or decrypt something, it loads the key into the TPM and asks it to perform the desired operation. The TPM decrypts the key and then goes to work on the data. For small quantities of data, the secret can even be stored in the TPM's nvram rather than on disk.

All of this means that the keys are tied to a system, which is great for security. An attacker can't obtain the decrypted keys, even if they have a keylogger and full access to your filesystem. If I encrypt my laptop's drive and then encrypt the decryption key with the TPM, stealing my drive won't help even if you have my passphrase - any other TPM simply doesn't have the keys necessary to give you access.

That's fine for keys which are system specific, but what about keys that I might want to use on multiple systems, or keys that I want to carry on using when I need to replace my hardware? Keys can optionally be flagged as migratable, which makes it possible to export them from the TPM and import them to another TPM. This seems like it defeats most of the benefits, but there's a couple of features that improve security here. The first is that you need the TPM ownership password, which is something that's set during initial TPM setup and then not usually used afterwards. An attacker would need to obtain this somehow. The other is that you can set limits on migration when you initially import the key. In this scenario the TPM will only be willing to export the key by encrypting it with a pre-configured public key. If the private half is kept offline, an attacker is still unable to obtain a decrypted copy of the key.

So I just replace the OS with one that steals the secret, right?

Say my root filesystem is encrypted with a secret that's stored on the TPM. An attacker can replace my kernel with one that grabs that secret once the TPM's released it. How can I avoid that?

TPMs have a series of Platform Configuration Registers (PCRs) that are used to record system state. These all start off programmed to zero, but applications can extend them at runtime by writing a sha1 hash into them. The new hash is concatenated to the existing PCR value and another sha1 calculated, and then this value is stored in the PCR. The firmware hashes itself and various option ROMs and adds those values to some PCRs, and then grabs the bootloader and hashes that. The bootloader then hashes its configuration and the files it reads before executing them.

This chain of trust means that you can verify that no prior system component has been modified. If an attacker modifies the bootloader then the firmware will calculate a different hash value, and there's no way for the attacker to force that back to the original value. Changing the kernel or the initrd will result in the same problem. Other than replacing the very low level firmware code that controls the root of trust, there's no way an attacker can replace any fundamental system components without changing the hash values.

TPMs support using these hash values to decide whether or not to perform a decryption operation. If an attacker replaces the initrd, the PCRs won't match and the TPM will simply refuse to hand over the secret. You can actually see this in use on Windows devices using Bitlocker - if you do anything that would change the PCR state (like booting into recovery mode), the TPM won't hand over the key and Bitlocker has to prompt for a recovery key. Choosing which PCRs to care about is something of a balancing act. Firmware configuration is typically hashed into PCR 1, so changing any firmware configuration options will change it. If PCR 1 is listed as one of the values that must match in order to release the secret, changing any firmware options will prevent the secret from being released. That's probably overkill. On the other hand, PCR 0 will normally contain the firmware hash itself. Including this means that the user will need to recover after updating their firmware, but failing to include it means that an attacker can subvert the system by replacing the firmware.

What about using TPMs for DRM?

In theory you could populate TPMs with DRM keys for media playback, and seal them such that the hardware wouldn't hand them over. In practice this is probably too easily subverted or too user-hostile - changing default boot order in your firmware would result in validation failing, and permitting that would allow fairly straightforward subverted boot processes. You really need a finer grained policy management approach, and that's something that the TPM itself can't support.

This is where Remote Attestation comes in. Rather than keep any secrets on the local TPM, the TPM can assert to a remote site that the system is in a specific state. The remote site can then make a policy determination based on multiple factors and decide whether or not to hand over session decryption keys. The idea here is fairly straightforward. The remote site sends a nonce and a list of PCRs. The TPM generates a blob with the requested PCR values, sticks the nonce on, encrypts it and sends it back to the remote site. The remote site verifies that the reply was encrypted with an actual TPM key, makes sure that the nonce matches and then makes a policy determination based on the PCR state.

But hold on. How does the remote site know that the reply was encrypted with an actual TPM? When TPMs are built, they have something called an Endorsement Key (EK) flashed into them. The idea is that the only way to have a valid EK is to have a TPM, and that the TPM will never release this key to anything else. There's a couple of problems here. The first is that proving you have a valid EK to a remote site involves having a chain of trust between the EK and some globally trusted third party. Most TPMs don't have this - the only ones I know of that do are recent Infineon and STMicro parts. The second is that TPMs only have a single EK, and so any site performing remote attestation can cross-correlate you with any other site. That's a pretty significant privacy concern.

There's a theoretical solution to the privacy issue. TPMs never actually sign PCR quotes with the EK. Instead, TPMs can generate something called an Attestation Identity Key (AIK) and sign it with the EK. The OS can then provide this to a site called a PrivacyCA, which verifies that the AIK is signed by a real EK (and hence a real TPM). When a third party site requests remote attestation, the TPM signs the PCRs with the AIK and the third party site asks the PrivacyCA whether the AIK is real. You can have as many AIKs as you want, so you can provide each service with a different AIK.

As long as the PrivacyCA only keeps track of whether an AIK is valid and not which EK it was signed with, this avoids the privacy concerns - nobody would be able to tell that multiple AIKs came from the same TPM. On the other hand, it makes any PrivacyCA a pretty attractive target. Compromising one would not only allow you to fake up any remote attestation requests, it would let you violate user privacy expectations by seeing that (say) the TPM being used to attest to HolyScriptureVideos.com was also being used to attest to DegradingPornographyInvolvingAnimals.com.

Perhaps unsurprisingly (given the associated liability concerns), there's no public and trusted PrivacyCAs yet, and even if they were (a) many computers are still being sold without TPMs and (b) even those with TPMs often don't have the EK certificate that would be required to make remote attestation possible. So while remote attestation could theoretically be used to impose DRM in a way that would require you to be running a specific OS, practical concerns make it pretty difficult for anyone to deploy that at any point in the near future.

Is this just limited to early OS components?

Nope. The Linux kernel has support for measuring each binary run or each module loaded and extending PCRs accordingly. This makes it possible to ensure that the running binaries haven't been modified on disk. There's not a lot of distribution infrastructure for setting this up, but in theory a distribution could deploy an entirely signed userspace and allow the user to opt into only executing correctly signed binaries. Things get more interesting when you add interpreted scripts to the mix, so there's still plenty of work to do there.

So what can I actually use a TPM for?

Drive encryption is probably the best example (Bitlocker does it on Windows, and there's a LUKS-based implementation for Linux here) - while in theory you could do things like use your TPM as a factor in two-factor authentication or tie your GPG key to it, there's not a lot of existing infrastructure for handling all of that. For the majority of people, the most useful feature of the TPM is probably the random number generator. rngd has support for pulling numbers out of it and stashing them in /dev/random, and it's probably worth doing that unless you have an Ivy Bridge or other CPU with an RNG.

Things get more interesting in more niche cases. Corporations can bind VPN keys to corporate machines, making it possible to impose varying security policies. Intel use the TPM as part of their anti-theft technology on education-oriented devices like the Classmate. And in the cloud, projects like Trusted Computing Pools use remote attestation to verify that compute nodes are in a known good state before scheduling jobs on them.

Is there a threat to freedom?

At the moment, probably not. The lack of any workable general purpose remote attestation makes it difficult for anyone to impose TPM-based restrictions on users, and any local code is obviously under the user's control - got a program that wants to read the PCR state before letting you do something? LD_PRELOAD something that gives it the desired response, or hack it so it ignores failure. It's just far too easy to circumvent.

Summary?

TPMs are useful for some very domain-specific applications, drive encryption and random number generation. The current state of technology doesn't make them useful for practical limitations of end-user freedom.

[1] Ranging from 8-bit things that are better suited to driving washing machines, up to full ARM cores
[2] "Low Pin Count", basically ISA without the slots.
[3] Loading a key and decrypting a 5 byte payload takes 1.5 seconds on my laptop's TPM.
According to the update here, the signing keys are supposed to be replaced by the hardware vendor. If vendors do that, this ends up being uninteresting from a security perspective - you could generate a signed image, but nothing would trust it. It should be easy enough to verify, though. Just download a firmware image from someone using AMI firmware, pull apart the capsule file, decompress everything and check whether the leaked public key is present in the binaries.

The real risk here is that even if most vendors have replaced that key, some may not have done. There's certainly an argument that shipping test keys at all increases the probability that a vendor will accidentally end up using those rather than generating their own, and it's difficult to rule out the possibility that that's happened.
(See here for an update to this)

A hardware vendor apparently had a copy of an AMI private key on a public FTP site. This is concerning, but it's not immediately obvious how dangerous this is for a few reasons. The first is that this is apparently the firmware signing key, not any of the Secure Boot keys. That means it can't be used to sign a UEFI executable or bootloader, so can't be used to sidestep Secure Boot directly. The second is that it's AMI's key, not a board vendor - we don't (yet) know if this key is used to sign any actual shipping firmware images, or whether it's effectively a reference key. And, thirdly, the code apparently dates from early 2012 - even if it was an actual signing key, it may have been replaced before any firmware based on this code shipped.

But there's still the worst case scenario that this key is used to sign most (or all) AMI-based vendor firmware. Can this be used to subvert Secure Boot? Plausibly. The attack would involve producing a new, signed firmware image with Secure Boot either disabled or with an additional key installed, and then to reflash that firmware. Firmware images are very board-specific, so unless you're engaging in a very targeted attack you either need a large repository of firmware for every board you want to attack, or you need to perform in-place modification.

Taking a look at the firmware update tool used for AMI systems, the latter might be possible. It seems that the AMI firmware driver allows you to dump the existing ROM to a file. It'd then be a matter of pulling apart the firmware image, modifying the key database, putting it back together, signing it and flashing it. It looks like doing this does require that the user enter the firmware password if one's set, so the simplest mitigation strategy would be to do that.

So. If this key is used by most vendors shipping AMI-based firmware, and if it's a current (rather than test) key, then it may well be possible for it to be deployed in an automated malware attack that subverts the Secure Boot trust model on systems running AMI-based firmware. The obvious lesson here is that handing out your private keys to third parties that you don't trust is a pretty bad idea, as is including them in source repositories.

(Wow, was this really as long ago as 2004? How little things change)
I gave a presentation at Libreplanet this weekend on the topic of Secure Boot and Restricted Boot. There's a copy of the video here - it should be up on the conference site at some point. It turned out to be excellent timing, in that a group in Spain filed a complaint with the European Commission this morning arguing that Microsoft's imposition of Secure Boot on the x86 client PC market is anticompetitive. I suspect that this is unlikely to succeed (the Commission has already stated that the current implementation appears to conform to EU law), and I fear that it's going to make it harder to fight the real battle we face.

Secure Boot means different things to different people. I think the FSF's definition is a useful one - Secure Boot is any boot validation scheme in which ultimate control is in the hands of the owner of the device, while Restricted Boot is any boot validation scheme in which ultimate control is in the hands of a third party. What Microsoft require for x86 Windows 8 devices falls into the category of Secure Boot - assuming that OEMs conform to Microsoft's requirements, the user must be able to both disable Secure Boot entirely and also leave Secure Boot enabled, but with their own choice of trusted keys and binaries. If the FSF set up a signing service to sign operating systems that met all of their criteria for freeness, Microsoft's requirements would permit an end user to configure their system such that it refused to run non-free software. My system is configured to trust things shipped by Fedora or built locally by me, a decision that I can make because Microsoft require that OEMs support it. Any system that meets Microsoft's requirements is a system that respects the freedom of the computer owner to choose how restrictive their computer's boot policy is.

This isn't to say that it's ideal. The lack of any common UI or key format between hardware vendors makes it difficult for OS vendors to document the steps users must take to assert this freedom. The presence of Microsoft as the only widely trusted key authority leaves people justifiably concerned as to whether Microsoft will be equally aggressive in blacklisting its own products as it will be in blacklisting third party ones. Implementation flaws in a (very) small number of systems have resulted in correctly signed operating systems failing to boot, requiring users to update their firmware before being able to install anything but Windows.

But concentrating on these problems misses the wider point. The x86 market remains one where users are able to run whatever they want, but the x86 market is shrinking. Users are purchasing tablets and other ARM-based ultraportables. Some users are using phones as their primary computing device. In contrast to the x86 market, Microsoft's policies for the ARM market restrict user freedom. Windows Phone and Windows RT devices are required to boot only signed binaries, with no option for the end user to disable the signature validation or install their own keys. While the underlying technology is identical, this differing set of default policies means that Microsoft's ARM implementation is better described as Restricted Boot. The hardware vendors and Microsoft define which software will run on these systems. The owner gets no say.

And, unfortunately, Microsoft aren't alone. Apple, the single biggest vendor in this market, implement effectively identical restrictions. Some Android vendors provide unlockable bootloaders, but others (either through personal preference or at the behest of phone carriers) lock down their platforms. A naive user is likely to end up purchasing a device that will, in the absence of exploited security flaws, refuse to run if any system components are modified. Even in cases where the underlying components are built using free software, there's no guarantee that the user will have the ability to assert any of those freedoms.

Why does this matter? Some of these platforms (notably Windows RT and iOS, but also some Android-based devices) will even refuse to run unsigned applications. Users are unable to write their own software and distribute it to others without agreeing to often onerous restrictions. Users with the misfortune of living in the wrong country may be forbidden from even that opportunity. The vendor may choose to block applications that compete with their own, reducing innovation. The ability to explore and tinker with the components of the system is restricted, making it harder for users to learn how modern operating systems work. If I own a perfectly functional phone that no longer receives vendor updates, I don't even have the option of paying a third party to ensure that I can't be compromised by a malicious website and risk the loss of passwords or financial details. The user is directly harmed by these restrictions.

I won't argue that there are no benefits to curated software ecosystems. I won't even argue against devices shipping with a locked down policy by default. I will strongly argue that the owner of a device should not only have the freedom to choose whether they wish to remain within those locked-down boundaries, but should also have the freedom to impose their own boundaries. There should be no forced choice between freedom and security.

Those who argue against Secure Boot risk depriving us of the freedom to make a personal decision as to who we trust. Those who argue against Secure Boot while ignoring Restricted Boot risk depriving us of even more. The traditional PC market is decreasing in importance. Unless we do anything about it, free software will be limited to a niche group of enthusiasts who've carefully chosen from a small set of devices that respect user freedom. We should have been campaigning against Restricted Boot 10 years ago. Don't delay it even further by fighting against implementations that already respect user freedom.
The problem with Samsung laptops bricking themselves turned out to be down to the UEFI variable store becoming more than 50% full and Samsung's firmware being dreadful, but the trigger was us writing a crash dump to the nvram. I ended up using this feature to help someone get a backtrace from a kernel oops during suspend today, and realised that it's not been terribly well publicised, so.

First, make sure pstore is mounted. If you're on 3.9 then do:

mount -t pstore /sys/fs/pstore /sys/fs/pstore

For earlier kernels you'll need to find somewhere else to stick it. If there's anything in there, delete it - we want to make sure there's enough space to save future dumps. Now reboot twice[1]. Next time you get a system crash that doesn't make it to system logs, mount pstore again and (with luck) there'll be a bunch of files there. For tedious reasons these need to be assembled in reverse order (part 12 comes before part 11, and so on) but you should have a crash log. Report that, delete the files again and marvel at the benefits that technology has brought to your life.

[1] UEFI implementations generally handle variable deletion by flagging the space as reclaimable rather than immediately making it available again. You need to reboot in order for the firmware to garbage collect it. Some firmware seems to require two reboot cycles to do this properly. Thanks, firmware.
It's fairly straightforward to boot a UEFI Secure Boot system using something like Shim or the Linux Foundation's loader, and for distributions using either the LF loader or the generic version of Shim that's pretty much all you need to care about. The physically-present end user has had to explicitly install new keys or hashes, and that means that you no longer need to care about Microsoft's security policies or (assuming there's no exploitable flaws in the bootloader itself) fear any kind of revocation.

But what about if you're a distribution that cares about booting without the user having to install keys? There's several reasons to want that (convenience for naive users, ability to netboot, that kind of thing), but it has the downside that your system can now be used as an attack vector against other operating systems. Do you care about that? It depends how you weigh the risks. First, someone would have to use your system to attack another. Second, Microsoft would have to care enough to revoke your signature. The first hasn't happened yet, so we have no real idea how likely the second is. However, it doesn't seem awfully unlikely that Microsoft would be willing to revoke a distribution signature if that distribution were being used to attack Windows.

How do you avoid that scenario? There's various bits of security work you need to do, but one of them is to require that all your kernel modules be signed. That's easy for the modules in the distribution, since you just sign them all before shipping them. But how about third party modules? There's three main options here:

  1. Don't support third party modules on Secure Boot systems
  2. Have the distribution sign the modules
  3. Have the vendor sign the modules

The first option is easy, but not likely to please users. Or hardware vendors. Not ideal.

The second option is irritating for a bunch of reasons, and a pretty significant one is license-related. If you sign a module, does that mean you're endorsing it in some way? Does signing the nvidia driver mean that you think there's no license concerns? Even ignoring that, how do you decide whose drivers to sign? We can probably assume that companies like AMD and nvidia are fairly reputable, but how about Honest John's Driver Emporium? Verifying someone's identity is astonishingly expensive to do a good job of yourself, and not hugely cheaper if you farm it out to a third party. It's also irritating for the driver vendor, who needs a separate signature for every distribution they support. So, while possible, this isn't an attractive solution.

The third option pushes the responsibility out to other people, and it's always nice to get other people to do work instead of you. The problem then is deciding whose keys you trust. You can push that off to the user, but it's not the friendliest solution. The alternative is to trust any keys that are signed with a trusted key. But what is a trusted key? Having the distribution sign keys just pushes us back to option (2) - you need to verify everyone's identity, and they need a separate signing key for every distribution they support. In an ideal world, there'd be a key that we already trust and which is owned by someone willing to sign things with it.

The good news is that such a key exists. The bad news is that it's owned by Microsoft.

The recent discussion on LKML was about a patchset that allowed the kernel to install new keys if they were inside a PE/COFF binary signed by a trusted key. It's worth emphasising that this patchset doesn't change the set of keys that the kernel trusts - the kernel trusts keys that are installed in your system firmware, so if your system firmware trusts the Microsoft key then your kernel already trusts the Microsoft key. The reasoning here is pretty straightforward. If your firmware trusts things signed by Microsoft, and if a bad person can get things signed by Microsoft, the bad person can already give you a package containing a backdoored bootloader. Letting them sign kernel modules doesn't alter the power they already have over your system. Microsoft will sign PE/COFF binaries, so a vendor would just have to sign up with Microsoft, pay $99 to Symantec to get their ID verified, wrap their key in a PE/COFF binary and then get it signed by Microsoft. The kernel would see that this object was signed by a trusted key and extract and install the key.

Linus is, to put it mildly, unenthusiastic about this idea. It adds some extra complexity to the kernel in the form of a binary parser that would only be used to load keys from userspace, and the kernel already has an interface for that in the form of X509 certificates. The problem we have is that Microsoft won't sign X509 certificates, and there's no way to turn a PE/COFF signature into an X509 signature. Someone would have to re-sign the keys, which starts getting us back to option (2). One way around this would be to have an automated service that accepts PE/COFF objects, verifies that they're signed by Microsoft, extracts the key, re-signs it with a new private key and spits out an X509 certificate. That avoids having to add any new code to the kernel, but it means that there would have to be someone to run that service and it means that their public key would have to be trusted by the kernel by default.

Who would that third party be? The logical choice might be the Linux Foundation, but since we have members of the Linux Foundation Technical Advisory Board saying that they think module signing is unnecessary and that there's no real risk of revocation, it doesn't seem likely that they'll be enthusiastic. A distribution could do it, but there'd be arguments about putting one distribution in a more privileged position than others. So far, nobody's stood up to do this.

A possible outcome is that the distributions who care about signed modules will all just carry this patchset anyway, and the ones who don't won't. That's probably going to be interpreted by many as giving too much responsibility to Microsoft, but it's worth emphasising that these patches change nothing in that respect - if your firmware trusts Microsoft, you already trust Microsoft. If your firmware doesn't trust Microsoft, these patches will not cause your kernel to trust Microsoft. If you've set up your own chain of trust instead, anything signed by Microsoft will be rejected.

What's next? It wouldn't surprise me too much if nothing happens until someone demonstrates how to use a signed Linux system to attack Windows. Microsoft's response to that will probably determine whether anyone ends up caring.
The UEFI bootloader that the Linux Foundation have been working on has just been released. That means we now have two signed bootloaders available - this one and shim.

Does this mean Linux distributions can now support Secure Boot?

They've actually been able to for a while. Ubuntu shipped with Secure Boot support last October, and Fedora shipped with Secure Boot support in January. Both used Shim rather than the Linux Foundation loader, and Shim's also being used by a variety of smaller distributions. The LF loader is a different solution to the same problem.

Is the Linux Foundation the preferred loader for distributions?

Probably not in most cases. One of the primary functional differences between Shim and the LF loader is that the LF loader is based around cryptographic hashes rather than signing keys. This means that the user has to explicitly add a hash to the list of permitted binaries whenever a distribution updates their bootloader or kernel. Doing that involves being physically present at the machine, so it's kind of a pain.

Why use it at all, then?

Being hash based means that you don't need to maintain any signing infrastructure. This means that distributions can support Secure Boot without having to change their build process at all. Shim already supports this use case (and some distributions are using it), but the LF loader has nicer UI for managing it.

Any other reasons?

Actually, yes. Shim implements Secure Boot loading in a less than entirely ideal way - it duplicates the firmware's entire binary loading, validation, relocation and execution code. This is necessary because the UEFI specification doesn't provide any mechanism for adding additional authentication mechanisms. The main downside of this is that the standard UEFI LoadImage() and StartImage() calls don't work under Shim. The LF loader hooks into the low-level security architecture and installs its own handlers, which means the standard UEFI interfaces work. The upshot is that you can use bootloaders like Gummiboot or efilinux without having to modify them to call out to Shim.

Why doesn't Shim do the same?

The UEFI architecture is slightly complicated. The UEFI specification itself defines the upper layers of the firmware, basically covering everything that UEFI applications and operating systems need. It doesn't define the lower layers of a UEFI implementation. Those are contained in the UEFI Platform Initialization spec, and that's what defines the security architecture interfaces that the LF loader hooks into. The problem is that it's completely valid to implement the UEFI specification without implementing the Platform Initialization specification, and if anyone does that then the LF loader will fail.

Can't you try both approaches?

Yes, and that's actually pretty much the plan now. I'm working on integrating the LF loader's UI and security code into Shim with the aim of producing one loader that'll satisfy the full set of use cases, and James is happy with this.

Which should I use?

Depends. If you want to support gummiboot (and aren't willing to patch it to call out to Shim), you'll need to use the LF loader. If you want to use key-based signing setups to avoid forcing re-enrolment on updates, you'll need to use Shim. If you're somewhere in the middle, you can probably use either. Once we've got the code merged, you won't have to make a choice.
I bricked a Samsung laptop today. Unlike most of the reported cases of Samsung laptops refusing to boot, I never booted Linux on it - all experimentation was performed under Windows. It seems that the bug we've been seeing is simultaneously simpler in some ways and more complicated in others than we'd previously realised.

So, some background. The original belief was that the samsung-laptop driver was doing something that caused the system to stop working. This driver was coded to a Samsung specification in order to support certain laptop features that weren't accessible via any standardised mechanism. It works by searching a specific area of memory for a Samsung-specific signature. If it finds it, it follows a pointer to a table that contains various magic values that need to be written in order to trigger some system management code that actually performs the requested change. This is unusual in this day and age, but not unique. The problem is that the magic signature is still present on UEFI systems, but attempting to use the data contained in the table causes problems.

We're not quite sure what those problems are yet. Originally we assumed that the magic values we wrote were causing the problem, so the samsung-laptop driver was patched to disable it on UEFI systems. Unfortunately, this doesn't actually fix the problem - it just avoids the easiest way of triggering it. It turns out that it wasn't the writes that caused the problem, it was what happened next. Performing the writes triggered a hardware error of some description. The Linux kernel caught and logged this. In the old days, people would often never see these logs - the system would then be frozen and it would be impossible to access the hard drive, so they never got written to disk. There's code in the kernel to make this easier on UEFI systems. Whenever a severe error is encountered, the kernel copies recent messages to the UEFI variable storage space. They're then available to userspace after a reboot, allowing more accurate diagnostics of what caused the crash.

That crash dump takes about 10K of UEFI storage space. Microsoft require that Windows 8 systems have at least 64K of storage space available. We only keep one crash dump - if the system crashes again it'll simply overwrite the existing one rather than creating another. This is all completely compatible with the UEFI specification, and Apple actually do something very similar on their hardware. Unfortunately, it turns out that some Samsung laptops will fail to boot if too much of the variable storage space is used. We don't know what "too much" is yet, but writing a bunch of variables from Windows is enough to trigger it. I put some sample code here - it writes out 36 variables each containing a kilobyte of random data. I ran this as an administrator under Windows and then rebooted the system. It never came back.

This is pretty obviously a firmware bug. Writing UEFI variables is expressly permitted by the specification, and there should never be a situation in which an OS can fill the variable store in such a way that the firmware refuses to boot the system. We've seen similar bugs in Intel's reference code in the past, but they were all fixed early last year. For now the safest thing to do is not to use UEFI on any Samsung laptops. Unfortunately, if you're using Windows, that'll require you to reinstall it from scratch.
The recent Linux kernel commits avoid one mechanism by which Samsung laptops can be bricked, but the information we now have indicates that there are other ways of triggering this. It also seems likely that it's possible for a userspace application to cause the same problem under Windows. We're still trying to figure out the full details, but until then you're safest ensuring that you're using BIOS mode on Samsung laptops no matter which operating system you're running.
(Edit: It's been suggested that the title of this could give the wrong impression. "Don't like Secure Boot? That's not a reason to buy a Chromebook" may have been better)

People are, unsurprisingly, upset that Microsoft have imposed UEFI Secure Boot on the x86 market. A situation in which one company gets to determine which software will boot on systems by default is obviously open to abuse. What's more surprising is that many of the people who are upset about this are completely fine with encouraging people to buy Chromebooks.

Out of the box, Chromebooks are even more locked down than Windows 8 machines. The Chromebook firmware validates the kernel, and the kernel verifies the filesystem. Want to run a version of Chrome you've built yourself? Denied. Thankfully, Google have provided a way around this - you can (depending on the machine) either flip a physical switch or perform a special keystroke in the firmware to disable the validation. Doing so deletes all your data in the process, in order to avoid the situation where a physically present attacker wants to steal your data or backdoor your system unnoticed, but after that it'll boot any OS you want. The downside is that you've lost the security that you previously had. If a remote attacker manages to replace your kernel with a backdoored one, the firmware will boot it anyway. Want the same level of security as the stock firmware? You can't. There's no way for you to install your own signing keys, and Google won't sign third party binaries. Chromebooks are either secure and running Google's software, or insecure and running your software.

Much like Chromebooks, Windows 8 certified systems are required to permit the user to disable Secure Boot. In contrast to Chromebooks, Windows 8 certified systems are required to permit the user to install their own keys. And, unlike Google, Microsoft will sign alternative operating systems. Windows 8 certified systems provide greater user freedom than Chromebooks.

Some people don't like Secure Boot because they don't trust Microsoft. If you trust Google more, then a Chromebook is a reasonable choice. But some people don't like Secure Boot because they see it as an attack on user freedom, and those people should be willing to criticise Google's stance. Unlike Microsoft, Chromebooks force the user to choose between security and freedom. Nobody should be forced to make that choice.

(Updated to add that some Chromebooks have a software interface for disabling validation)
Executive summary: Most things work fine.

Things we know are broken:
  • Some Samsung laptops. The samsung-laptop driver is a slightly weird thing. By 2010 (when it first appeared) most vendors had moved over to using some level of firmware abstraction, either using ACPI or WMI. Samsung still seemed to be stuck around a decade earlier - they were providing a region of memory at a known address, and you'd read that address to find a bunch of offsets. Then you'd write magic values based on those offsets to magic system IO ports based on those offsets and something would happen. Those writes were triggering System Management Mode, a special x86 CPU mode where the processor executes code from memory that the OS can't see, without telling the OS that it's doing so. There's nothing especially new in this (SMM first appeared in the 386sl back in 1990), but it also means that you depend on the system vendor not changing the interface without telling you. Turns out that Samsung apparently changed their platform interface when they moved to UEFI, but didn't actually do anything to prevent old drivers from breaking things - performing exactly the same series of accesses on some modern Samsung laptops gives an uncorrectable machine check exception (in the best case) or destroys your firmware (in the worst case). Given that the driver was written to Samsung's specifications, this is pretty obviously Samsung's fault, but that's probably little consolation to anyone who ended up with a dead laptop. Although, given Samsung's track record, this may not be surprising.

    On the bright side, some of the machines that are affected by this predate Secure Boot, so at least it's not a Secure Boot bug.
  • Some Toshibas won't boot Linux. This turns out to be some staggering incompetence on the part of Toshiba (or, more likely, their third-party vendor) - they managed to leave the signing key out of the database that's used to validate binaries, and managed to leave the signature database signing key out of the database that's used to provide whitelist or blacklist updates. The good news is that this is a blatant violation of Microsoft's Windows 8 certification guidelines, and that seems to have encouraged Toshiba to actually fix their BIOS. The bad news is that any of the affected machines that are currently available are still broken, and Toshiba don't seem to be willing to actually give you the firmware update yet.
  • Some Lenovos will only boot Windows or Red Hat Enterprise Linux. I recommend drinking, because as far as I know they haven't actually got around to doing anything useful about this yet.

Not an amazingly positive list, but as far as I know that's about it - other than some Samsungs, one range of Toshibas and one range of Lenovo desktops, Linux should be fine. If you have any other UEFI system that's unable to install Fedora 18, let me know and we'll do our best to work out what's going on.
ITWire ran a story on the 12th of January entitled "Fedora still has problems with secure boot". It ends up discussing two issues with the author's experience with the Fedora installer - that he can't reclaim any free space on existing drives, and that the installer didn't automatically add entries for Windows to the grub menu. These are certainly legitimate issues, and I don't want to suggest that it's reasonable for people to have to manually alter their configuration in order to support dual boot. But they're not issues with Fedora's support for secure boot, despite the enthusiasm with which certain people have jumped on the story. We've received one credible report of a secure boot related problem with Fedora on a couple of Toshiba laptops, which appears (and I want to stress that we're still working on diagnosing it) to be a firmware bug rather than any kind of problem with Fedora.

Mistakes happen in journalism, and sometimes there are differences of opinion. But this story is simply wrong. When asked about it in the comments, the author failed to support his position. When contacted, the editor in chief was willing to add a note saying that I disputed the arguments but was unwilling to remove the incorrect claims. As a result, the internet remains full of links and reposts of an article that unashamedly tells users that the current Linux distribution with the best UEFI hardware support has issues with something it has no issues with.

For reasons that are unclear to me, ITWire seems to have some sort of well regarded status in the Australian technical industry. It seems entirely undeserved.
Unfortunately I'm not going to make it to LCA this year, but there's still plenty of UEFI coverage - Dong Wei, Vice President of the UEFI forum, will be giving an introduction to UEFI, and James Bottomley, who's been working on the Linux Foundation's UEFI bootloader solution, will be talking about Secure Boot.

For those of you on this side of the Pacific (or who just can't get enough of listening to my melodious voice), I'll be keynoting the Southern California Linux Expo next month. This will be a discussion of the more social aspects of our work on Secure Boot over the past year and a half - an epic tail of astonishment, terror, politics, confusion and hard won victories, and the remaining issues that are yet unsolved. Clearly not to be missed. If California's too far for you, then come March I'll be at the North East Linux Fest. I'll be talking about how to make sure that your Linux distribution works with Secure Boot, and, assuming I've got the code finished by then, a little about how to break it.
I've been lucky in life. I've known several people who've killed themselves, but I've been friendly with them rather than friends with them. Their deaths have saddened me, but haven't left giant gaping holes that I have trouble imagining filling. I've also been lucky in that I'm not especially predisposed towards depression, and even luckier that the one serious episode I had ended well. That luck means I'm not qualified to speak authoritatively on the topic, but this seems like a good time to share.

Most of 2002 wasn't great for me. Many of my friends had graduated and moved away. My PhD, which was what I'd spent the past few years of my life working towards, was going horribly wrong. The girl I liked didn't like me. Lots of individual small things that didn't fundamentally matter, but which build up into overall feelings of isolation and failure at life. I was spending an increasing number of days not leaving bed and not talking to people. It wasn't that I couldn't have fun - socialising was still pleasurable, and I didn't actively avoid people, but it was always tempered with the knowledge that this was temporary and I'd be returning to the unhappiness afterwards. I couldn't see any sequence of events that would turn my life around and restore my happiness. This was how life was, and this was how life was going to be. I'd irrevocably fucked up, and this was my future.

Looking back, what strikes me is how reasonable this seemed. I could point at specific things that were making me unhappy, and nobody could argue that it was unreasonable to be unhappy about them. There wasn't any point in speaking to people about it, because what would they be able to do except agree that I was justifiably unhappy? I thought about suicide. Not seriously, because overall life still seemed worthwhile, but I could conceive of a level of further unhappiness that would make it seem like the best choice. I don't think it would ever have occured to me to speak to someone about it first. It seemed like it would be the same argument - I'm justifiably unhappy, I already feel like I'm letting my friends down, what could they do other than tell me that my feelings are wrong or make me feel even more guilty? So, when I saw exhortions for people to speak to someone if they felt suicidal, it seemed like they were talking to people who hadn't thought this through as well as me. It felt like I'd thought this all through carefully and rationally and come to a completely logical decision. If changes in circumstances and further consideration made it seem like suicide was the better choice, that would be because it was the better choice. Maybe other people weren't thinking about this as logically as I was. Maybe they'd have their minds changed by speaking to a friend or a professional. I wouldn't. Of course, with hindsight I was rationalising the way I already felt rather than making entirely rational decisions. I could have rationalised myself to death even though there were (in my case) straightforward ways to make my life better.

In any case, I've no idea how close I ever got to that point. Things were at their worst in August - by September I had a new job and new house, and things just got better from there. In the end, the friends I was convinced could do nothing for me ended up giving me the opportunity to find gainful employment and made sure I had somewhere to live. Without them, things might have been different. As it was, I spent less than nine months depressed and it was still the most hellish experience of my life. The thought of returning to that state is terrifying. I was lucky. I might not be again.

There's no terribly good moral here. If I'd known more about depression beforehand, I might have been able to identify what was happening to me and seek professional help. Other than that, I didn't learn anything about how to avoid or deal with depression. The experience didn't make me a better person. I've no advice for people in the same situation. The only thing I gained from it all was the realisation that if I'd felt any worse and knew that this was what I faced for the rest of my life, death might not have been the worst choice I had.

Depression is a huge social problem and we deal with it badly. We refuse to talk about it, and when we do talk about it we mostly limit ourselves to platitudes about how things will get better or placing the blame on depressed people for not wanting to talk to those around them. Sometimes it doesn't get better. Sometimes talking to those around you will make things worse. People need to be aware of what the effects of depression are and get better at identifying it in others, rather than assuming that they'll be able to ask for help themselves. Society as a whole needs to be better at ensuring that professional support is there for people who need it. And, unless we can make massive improvements in our understanding of the causes of depression and effective mechanisms for countering it, we need to accept that it will cost us friends. Let's redirect the anger we feel at their choice to avoid a lifetime of misery into anger at the society that still hasn't done everything it can to help them.

The Microsoft Surface is a fairly attractive bit of tablet hardware, and as a result people have shown interest in running Linux on it. The immediate problem is that (like many ARM devices) it has a locked-down firmware that will only run signed binaries - unlike many other ARM devices, this is implemented using an existing standard (UEFI Secure Boot). Microsoft provide a signing service for UEFI binaries, so it's tempting to think that getting around this restriction would be as simple as taking an existing Linux bootloader, signing it and then booting. Unfortunately Microsoft's signing service signs binaries using a different key (the "Microsoft Windows UEFI Driver Publisher" key) to the one used to sign Windows, and the Surface doesn't carry that key. Booting Linux on these devices would involve finding a flaw in the firmware and using that to run arbitrary code.

Could this also be a problem on x86? In theory - Microsoft don't require that vendors carry the driver publisher key, and so a system could be Windows 8 certified and still not carry it. It's unlikely to occur in practice, though, since any third party expansion hardware will then fail on that device. As a result, anything with PCIe or Expresscard slots is effectively certain to have this key. If anyone finds any counterexamples, please let me know.
Some people have complained that they can't install Linux on new machines because their firmware won't let them choose a boot device. This is part of Windows 8's fast boot support - the keyboard may not be initialised until after the OS has started. To deal with this, insert the install media and reboot your computer. Wait for Windows 8 to start. While holding down shift, click on the power icon and click on restart. When the menu appears, click "Use a device" and then click your install media. If it's listed more than once, choose the one that says "UEFI" in front of it. Your system will now restart and boot off the install media.
It's after Christmas, and some number of people doubtless ended up with Windows 8 PCs and may want to install Linux on them. If you'd like to do that without fiddling with firmware settings, here are your options.
  • Ubuntu 12.10
    The 64-bit version of Ubuntu 12.10 ships with an older version of Shim that's been signed by Microsoft. It should boot out of the box on most systems, but it doesn't have some of the most recent EFI patches that improve compatibility on some machines. Grab it here.
  • Fedora 18
    Fedora 18 isn't quite released yet, but the latest 64-bit test builds include a Microsoft signed copy of the current version of Shim, including the MOK functionality described here. Fedora 18 has some additional EFI support patches that have just been merged into mainline, which should improve compatibility on some machines - especially ones with Radeon graphics. It also has improved support for booting on Macs. You can get it here, but do bear in mind that it's a test release.
  • Sabayon
    According to the wiki, Sabayon now supports UEFI Secure Boot out of the box. I don't know if the current CD images do, though. My understanding is that it's based on the Microsoft signed Shim I discussed here, and you'll have to manually install the key once you've booted the install media. Straightforward enough.
  • Other distributions
    Suse will be using a version of Shim signed by Microsoft, but I don't think it's in any pre-release versions yet. Debian have just merged UEFI support into their installer, but don't have any UEFI Secure Boot support at the moment. I'm not sure what other distributions are planning on doing, but let me know and I'll update the list.
  • The Linux Foundation loader
    The Linux Foundation have still to obtain a signed copy of their bootloader. There's no especially compelling reason to use it - the use case it supports is where you have users who can follow instructions sufficiently to press "y" but not to choose to enrol a key. The most interesting feature it has is the ability to use the MOK database via the usual UEFI LoadImage and StartImage calls, which means bootloaders like gummiboot work. Unfortunately it implements this by hooking into low-level functionality that's not actually required to be present, so relying on this may be somewhat dubious.
I'm pleased to say that a usable version of shim is now available for download. As I discussed here, this is intended for distributions that want to support secure boot but don't want to deal with Microsoft. To use it, rename shim.efi to bootx64.efi and put it in /EFI/BOOT on your UEFI install media. Drop MokManager.efi in there as well. Finally, make sure your bootloader binary is called grubx64.efi and put it in the same directory.

Now generate a certificate and put the public half as a binary DER file somewhere on your install media. On boot, the end-user will be prompted with a 10-second countdown and a menu. Choose "Enroll key from disk" and then browse the filesystem to select the key and follow the enrolment prompts. Any bootloader signed with that key will then be trusted by shim, so you probably want to make sure that your grubx64.efi image is signed with it.

If you want, you're then free to impose any level of additional signing restrictions - it's entirely possible to use this signing as the basis of a complete chain of trust, including kernel lockdowns and signed module loading. However, since the end-user has explicitly indicated that they trust your code, you're under no obligation to do so. You should make it clear to your users what level of trust they'll be able to place in their system after installing your key, if only to allow them to make an informed decision about whether they want to or not.

This binary does not contain any built-in distribution certificates. It does contain a certificate that was generated at build time and used to sign MokManager - you'll need to accept my assurance that the private key was deleted immediately after the build was completed. Other than that, it will only trust any keys that are either present in the system db or installed by the end user.

A couple of final notes: As of 17:00 EST today, I am officially (rather than merely effectively) no longer employed by Red Hat, and this binary is being provided by me rather than them, so don't ask them questions about it. Special thanks to everyone at Suse who came up with the MOK concept and did most of the implementation work - without them, this would have been impossible. Thanks also to Peter Jones for his work on debugging and writing a signing tool, and everyone else at Red Hat who contributed valuable review feedback.
A (well, now former) coworker let me know about a problem he was having with a Lenovo Thinkcentre M92p. It booted Fedora UEFI install media fine, but after an apparently successful installation refused to boot. UEFI installs on Windows worked perfectly. Secure Boot was quickly ruled out, but this could still have been a number of things. The most interesting observation was that the Fedora boot option didn't appear in the firmware boot menu at all, but Windows did. We spent a little while comparing the variable contents, gradually ruling out potential issues - Linux was writing an entry that had an extra 6 bytes in a structure, for instance[1], and a sufficiently paranoid firmware implementation may have been tripping up on that. Fixing that didn't help, though. Finally we tried just taking the Windows entry and changing the descriptive string. And it broke.

Every UEFI boot entry has a descriptive string. This is used by the firmware when it's presenting a menu to users - instead of "Hard drive 0" and "USB drive 3", the firmware can list "Windows Boot Manager" and "Fedora Linux". There's no reason at all for the firmware to be parsing these strings. But the evidence seemed pretty strong - given two identical boot entries, one saying "Windows Boot Manager" and one not, only the first would work. At this point I downloaded a copy of the firmware and started poking at it. Turns out that yes, actually, there is a function that compares the descriptive string against "Windows Boot Manager" and appears to return an error if it doesn't match. What's stranger is that it also checks for "Red Hat Enterprise Linux" and lets that one work as well.

This is, obviously, bizarre. A vendor appears to have actually written additional code to check whether an OS claims to be Windows before it'll let it boot. Someone then presumably tested booting RHEL on it and discovered that it didn't work. Rather than take out that check, they then addded another check to let RHEL boot as well. We haven't yet verified whether this is an absolute string match or whether a prefix of "Red Hat Enterprise Linux" is sufficient, and further examination of the code may reveal further workarounds. For now, if you want to run Fedora[2] on these systems you're probably best off changing the firmware to perform a legacy boot.

[1] src/include/efi.h: uint8_t padding[6]; /* Emperically needed */, says the efibootmgr source code. Unhelpful.
[2] Or Ubuntu, or Suse, or…
Everything here applies to this as well. Thanks, VMware. In Microsoft's case it turned out that nothing was using that value and it could be replaced without breaking things. With luck that's true here as well.

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Nebula. Ex-biologist. @mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer.

Expand Cut Tags

No cut tags