Matthew Garrett ([personal profile] mjg59) wrote 2021-06-01 07:01 pm

Producing a trustworthy x86-based Linux appliance

Let's say you're building some form of appliance on top of general purpose x86 hardware. You want to be able to verify the software it's running hasn't been tampered with. What's the best approach with existing technology?

Let's split this into two separate problems. The first is to do as much as we can to ensure that the software can't be modified without our consent[1]. This requires that each component in the boot chain verify that the next component is legitimate. We call the first component in this chain the root of trust, and in the x86 world this is the system firmware[2]. This firmware is responsible for verifying the bootloader, and the easiest way to do this on x86 is to use UEFI Secure Boot. In this setup the firmware contains a set of trusted signing certificates and will only boot executables with a chain of trust to one of these certificates. Switching the system into setup mode from the firmware menu will allow you to remove the existing keys and install new ones.

(Note: You shouldn't use the trusted certificate directly for signing bootloaders - instead, the trusted certificate should be used to sign another certificate and the key for that certificate used to sign your bootloader. This way, if you ever need to revoke the signing certificate, you can simply sign a new one with the trusted parent and push out a revocation update instead of having to provision new keys)

But what do you want to sign? In the general purpose Linux world, we use an intermediate bootloader called Shim to bridge from the Microsoft signing authority to a distribution one. Shim then verifies the signature on grub, and grub in turn verifies the signature on the kernel. This is a large body of code that exists because of the use cases that general purpose distributions need to support - primarily, booting on arbitrary off-the-shelf hardware, and allowing arbitrary and complicated boot setups. This is unnecessary in the appliance case, where the hardware target can be well defined, where there's no need for interoperability with the Microsoft signing authority, and where the boot configuration can be extremely static.

We can skip all of this complexity using systemd-boot's unified Linux image support. This has the format described here, but the short version is that it's simply a kernel and initramfs linked into a small EFI executable that will run them. Instructions for generating such an image are here, and if you follow them you'll end up with a single static image that can be directly executed by the firmware. Signing this avoids dealing with a whole host of problems associated with relying on shim and grub, but note that you'll be embedding the initramfs as well. Again, this should be fine for appliance use-cases, but you'll need your build system to support building the initramfs at image creation time rather than relying on it being generated on the host.

At this point we have a single image that can be verified by the firmware and will get us to the point of a running kernel and initramfs. Unless you've got enough RAM that you can put your entire workload in the initramfs, you're going to want a filesystem as well, and you're going to want to verify that that filesystem hasn't been tampered with. The easiest approach to this is to use dm-verity, a device-mapper layer that uses a hash tree to verify that the filesystem contents haven't been modified. The kernel needs to know what the root hash is, so this can either be embedded into your initramfs image or into the kernel command line. Either way, it'll end up in the signed boot image, so nobody will be able to tamper with it.
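The hash tree idea is worth making concrete. This is a toy sketch of the principle behind dm-verity, not the real on-disk format (which also includes a superblock, salt, and per-level layout): block hashes are combined pairwise up to a single root hash, so changing any block changes the root, and a kernel that trusts only the root hash can detect any modification.

```python
import hashlib

BLOCK_SIZE = 4096

def block_hashes(data):
    """Hash each fixed-size block of the image."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [hashlib.sha256(b).digest() for b in blocks]

def root_hash(data):
    """Combine block hashes pairwise until a single root hash remains."""
    level = block_hashes(data)
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

image = b"appliance root filesystem contents " * 1000
expected = root_hash(image)

# Flipping a single bit anywhere in the image changes the root hash,
# which is what lets a kernel holding a trusted root hash detect tampering.
tampered = bytearray(image)
tampered[0] ^= 0xFF
assert root_hash(bytes(tampered)) != expected
```

In the real implementation verification happens lazily, per block, as reads occur, so a tampered block is only detected when something tries to read it.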

It's important to note that a dm-verity partition is read-only - the kernel doesn't have the cryptographic secret that would be required to generate a new hash tree if the partition is modified. So if you require the ability to write data or logs anywhere, you'll need to add a new partition for that. If this partition is unencrypted, an attacker with access to the device will be able to put whatever they want on there. You should treat any data you read from there as untrusted, and ensure that it's validated before use (ie, don't just feed it to a random parser written in C and expect that everything's going to be ok). On the other hand, if it's encrypted, remember that you can't just put the encryption key in the boot image - an attacker with access to the device is going to be able to dump that and extract it. You'll probably want to use a TPM-sealed encryption secret, which will be discussed later on.
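The sealing property can be illustrated with a toy model. A real TPM enforces this in hardware via a policy session bound to PCR values; here the binding is just simulated by deriving the wrapping key from the expected PCR value, so only a machine whose PCR matches recovers the secret.

```python
import hashlib

def _stream(key, n):
    """Derive n bytes of keystream from a key (simple counter-mode hash)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def seal(secret, expected_pcr):
    """Bind a secret to an expected PCR value (toy model, not TPM semantics)."""
    key = hashlib.sha256(b"sealing-key" + expected_pcr).digest()
    return bytes(a ^ b for a, b in zip(secret, _stream(key, len(secret))))

def unseal(blob, current_pcr):
    """Recover the secret - but only if the PCR matches the sealed-to value."""
    key = hashlib.sha256(b"sealing-key" + current_pcr).digest()
    return bytes(a ^ b for a, b in zip(blob, _stream(key, len(blob))))

good_pcr = hashlib.sha256(b"expected boot chain").digest()
bad_pcr = hashlib.sha256(b"attacker boot chain").digest()
secret = b"disk-encryption-key"

blob = seal(secret, good_pcr)
assert unseal(blob, good_pcr) == secret   # intact boot chain: key released
assert unseal(blob, bad_pcr) != secret    # tampered boot chain: garbage
```

An attacker who modifies the boot chain changes the PCR values, and with them loses access to the disk encryption key.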

At this point everything in the boot process is cryptographically verified, and so should be difficult to tamper with. Unfortunately this isn't really sufficient - on x86 systems there's typically no verification of the integrity of the secure boot database. An attacker with physical access to the system could attach a programmer directly to the firmware flash and rewrite the secure boot database to include keys they control. They could then replace the boot image with one that they've signed, and the machine would happily boot code that the attacker controlled. We need to be able to demonstrate that the system booted using the correct secure boot keys, and the only way we can do that is to use the TPM.

I wrote an introduction to TPMs a while back. The important thing to know here is that the TPM contains a set of Platform Configuration Registers that are large enough to contain a cryptographic hash. During boot, each component of the boot process will generate a "measurement" of other security critical components, including the next component to be booted. These measurements are a representation of the data in question - they may simply be a hash of the object being measured, or the hash of a structure containing various pieces of metadata. Each measurement is passed to the TPM, along with the PCR it should be measured into. The TPM takes the new measurement, appends it to the existing value, and then stores the hash of this concatenated data in the PCR. This means that the final PCR value depends not only on the measurement, but also on every previous measurement. Without breaking the hash algorithm, there's no way to set the PCR to an arbitrary value. The hash values and some associated data are stored in a log that's kept in system RAM, which we'll come back to later.

Different PCRs store different pieces of information, but the one that's most interesting to us is PCR 7. Its use is documented in the TCG PC Client Platform Firmware Profile (section 3.3.4.8), but the short version is that the firmware will measure the secure boot keys that are used to boot the system. If the secure boot keys are altered (such as by an attacker flashing new ones), the PCR 7 value will change.

What can we do with this? There are a couple of choices. For devices that are online, we can perform remote attestation, a process where the device can provide a signed copy of the PCR values to another system. If the system also provides a copy of the TPM event log, the individual events in the log can be replayed in the same way the TPM used them to calculate the PCR values, and the result compared to the actual PCR values. If they match, that implies that the log values are correct, and we can then analyse individual log entries to make assumptions about system state. If a device has been tampered with, the PCR 7 values and associated log entries won't match the expected values, and we can detect the tampering.
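The verifier's side of this is a straightforward replay. A simplified sketch (real logs follow the TCG event log formats and carry more metadata per event, but the structure of the check is the same): re-run the extend operation over the logged digests and compare the result against the quoted PCR value.

```python
import hashlib

def replay(event_log, pcr_size=32):
    """Recompute a PCR value from an event log by replaying each extend."""
    pcr = b"\x00" * pcr_size
    for event in event_log:
        pcr = hashlib.sha256(pcr + event["digest"]).digest()
    return pcr

# Illustrative PCR 7 log entries (digests are made up for the example).
log = [
    {"type": "EV_EFI_VARIABLE_DRIVER_CONFIG",
     "digest": hashlib.sha256(b"SecureBoot=1").digest()},
    {"type": "EV_EFI_VARIABLE_DRIVER_CONFIG",
     "digest": hashlib.sha256(b"db contents").digest()},
]

quoted_pcr7 = replay(log)  # stands in for the TPM-signed quote

# A log consistent with the quote replays to the same value...
assert replay(log) == quoted_pcr7

# ...while a log whose entries were altered (say, an attacker's db) does not.
bad_log = [log[0],
           {"type": "EV_EFI_VARIABLE_DRIVER_CONFIG",
            "digest": hashlib.sha256(b"attacker db").digest()}]
assert replay(bad_log) != quoted_pcr7
```

Because the quote is signed by the TPM, an attacker can tamper with the log or with the boot chain, but can't make the replayed log and the quoted PCR values agree afterwards.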

If a device is offline, or if there's a need to permit local verification of the device state, we still have options. First, we can perform remote attestation to a local device. I demonstrated doing this over Bluetooth at LCA back in 2020. Alternatively, we can take advantage of other TPM features. TPMs can be configured to store secrets or keys in a way that renders them inaccessible unless a chosen set of PCRs have specific values. This is used in tpm2-totp, which uses a secret stored in the TPM to generate a TOTP value. If the same secret is enrolled in any standard TOTP app, the value generated by the machine can be compared to the value in the app. If they match, the PCR values the secret was sealed to are unmodified. If they don't, or if no numbers are generated at all, that demonstrates that PCR 7 is no longer the same value, and that the system has been tampered with.
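The comparison works because TOTP is deterministic: both sides derive a short code from the shared secret and the current time step, so matching codes imply matching secrets. Here's the standard construction (RFC 6238 over RFC 4226's HOTP, with the default HMAC-SHA1 and 30-second steps) - in tpm2-totp the machine's copy of the secret only becomes usable when the PCRs it was sealed to are intact.

```python
import hashlib
import hmac
import struct

def hotp(secret, counter, digits=6):
    """RFC 4226 HOTP: HMAC the counter, then dynamically truncate."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of last byte picks the window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret, unix_time, step=30, digits=6):
    """RFC 6238 TOTP: HOTP over the current time step."""
    return hotp(secret, int(unix_time) // step, digits)

# RFC 6238 test vector (ASCII secret, T=59s falls in time step 1).
assert totp(b"12345678901234567890", 59) == "287082"
```

A verifier running the same function over the same enrolled secret gets the same six digits, with no network connection to the machine required.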

Unfortunately, TOTP requires that both sides have possession of the same secret. This is fine when a user is making that association themselves, but works less well if you need some way to ship the secret on a machine and then separately ship the secret to a user. If the user can simply download the secret via some API, so can an attacker. If an attacker has the secret, they can modify the secure boot database and re-seal the secret to the new PCR 7 value. That means having to add some form of authentication, along with a strong binding of machine serial number to a user (in order to avoid someone with valid credentials simply downloading all the secrets).

Instead, we probably want some mechanism that uses asymmetric cryptography. A keypair can be generated on the TPM, which will refuse to release an unencrypted copy of the private key. The public key, however, can be exported and stored. If it's acceptable for a verification app to connect to the internet then the public key can simply be obtained that way - if not, a certificate can be issued to the key, and this exposed to the verifier via a QR code. The app then verifies that the certificate is signed by the vendor, and if so extracts the public key from that. The private key can have an associated policy that only permits its use when PCR 7 has an appropriate value, so the app then generates a nonce and asks the user to type that into the device. The device generates a signature over that nonce and displays that as a QR code. The app verifies the signature matches, and can then assert that PCR 7 has the expected value.

Once we can assert that PCR 7 has the expected value, we can assert that the system booted something signed by us and thus infer that the rest of the boot chain is also secure. But this is still dependent on the TPM obtaining trustworthy information, and unfortunately the bus that the TPM sits on isn't really terribly secure (TPM Genie is an example of an interposer for i2c-connected TPMs, but there's no reason an LPC one can't be constructed to attack the sort usually used on PCs). TPMs do support encrypted communication channels, but bootstrapping those isn't straightforward without firmware support. The easiest way around this is to make use of a firmware-based TPM, where the TPM is implemented in software running on an ancillary controller. Intel's solution is part of their Platform Trust Technology and runs on the Management Engine, AMD run it on the Platform Security Processor. In both cases it's not terribly feasible to intercept the communications, so we avoid this attack. The downside is that we're then placing more trust in components that are running much more code than a TPM would and which have a correspondingly larger attack surface. Which is preferable is going to depend on your threat model.

Most of this should be achievable using Yocto, which now has support for dm-verity built in. It's almost certainly going to be easier using this than trying to base on top of a general purpose distribution. I'd love to see this become a largely push-button "press button, receive secure image" process, so I might have a go at that if I have some free time in the near future.

[1] Obviously technologies that can be used to ensure nobody other than me is able to modify the software on devices I own can also be used to ensure that nobody other than the manufacturer is able to modify the software on devices that they sell to third parties. There's no real technological solution to this problem, but we shouldn't allow the fact that a technology can be used in ways that are hostile to user freedom to cause us to reject that technology outright.
[2] This is slightly complicated due to the interactions with the Management Engine (on Intel) or the Platform Security Processor (on AMD). Here's a good writeup on the Intel side of things.

(Anonymous) 2021-06-02 02:13 pm (UTC)(link)

This article has been mentioned on Hacker News --- https://news.ycombinator.com/item?id=27365057.

At the moment, the top-rated comment is one I really very much agree with:

"While I find this post and the ideas presented very interesting on the technical level, work in that direction ("remote attestation", making devices "tamper-proof") tends to give me a dystopian vibe - foreshadowing a world where there's no hardware left you can hack, build and flash your own firmware onto: Complete tivoization, to re-use lingo from when the GPLv3 was drafted. With that, really neutering all the benefits Free Software provides. What good is having all the source code in the world if I can never put my (or anyone else's) modifications to it into effect?"

tpm2-totp

(Anonymous) 2021-06-02 02:45 pm (UTC)(link)
Just a side note: tpm2-totp is not by Trammell Hudson but by Jonas and Andreas... ;-)

(Anonymous) 2021-06-02 04:00 pm (UTC)(link)
1) A CPU without obvious & network accessible backdoors that were sold to governments and who knows who else.
2) Using an OS with reproducible builds
3) Using verified boot

Seems like enough. But no one seems willing to talk straight on #1. Conversations that strategize on this topic seem purposefully derailed all over the web. I realize it would be extremely hard to reverse engineer a CPU to verifiably disable its black box OS running inside. But there are many approaches including non-stop pushing AMD to release their source / release verifiable disabling tool.

Building the initramfs at image creation time

(Anonymous) 2021-06-02 09:24 pm (UTC)(link)

"you'll need your build system to support building the initramfs at image creation time"

This is usually pretty easy, because the kernel build system itself can build an initramfs directly into the generated kernel image for you. See CONFIG_INITRAMFS_SOURCE. I build all my kernels this way, because if the initramfs is built into the kernel, and it contains enough rescue gear to recover a broken system, I never need to worry about my initramfs getting out of sync with the kernel or the root filesystem in such a way that I can't use some kernel built this way (perhaps quite an old one) to fix the system.

(I also use CONFIG_EFI_STUB so I don't have to worry about the nightmare of unfathomable complexity which is grub. The more complexity involved in booting, the more likely booting will fail, and I never want booting to fail -- this is also why secure boot is something I will never use on my own machines because its whole purpose is to force booting to fail, and any evil maid who can trigger a legitimate failure has also broken into my house and I have bigger problems. Other people will, of course, have different threat models and use cases, and for some of them secure boot might make sense.)

(Anonymous) 2021-06-04 02:43 am (UTC)(link)
What stops a freshly flashed firmware from lying to the tpm? Or are the first writes done from ROM?

[personal profile] bens_dad 2021-06-04 01:47 pm (UTC)(link)
Does UEFI Secure Boot still allow booting from DVD or memory stick ?
If so, (how) does it indicate that such a boot has previously happened and that security may have been compromised ?

Or does booting from external media make all internal media read-only, so that I can still, say, scan for viruses (though not disable/remove malware) from the security of a known good system ?

(Anonymous) 2021-06-08 06:28 am (UTC)(link)
I'll single out 2018-2019-2020-2021 Asus Vivobook 15s as my first example.

Out of the box, Secure Boot is enabled.

They have Microsoft's Secure Boot signing key, as well as Canonical's.

Booting Ubuntu's grub from a flash drive starts up normally, and no writes are blocked post kernel-boot, unless you've tinkered with the squashfs, which merely has a hash value 'protecting' it from inadvertent corruption in the download process. A message will be printed during boot if it doesn't match, suggesting you might "encounter errors" - and only if the bootsplash is suppressed via the kernel command line.

Each of these will result in different PCR values; so if the disk is encrypted, the keys to decrypt it will probably fail to be unsealed. This is generally not a problem in my case, as I'm prepping the machine's components for resale, not sniffing for secrets.

There is no indication via EFI Event logs that a boot from alternative media has been attempted or succeeded, that I have been able to note.

This also does not require the device to be a USB Mass Storage Class hard disk / flash disk, so long as the ISO image has the signed grub in the right place, it will boot from optical media, or virtual optical media.

As another example, I commonly breach secure-boot enabled linux appliances based on supermicro motherboards and LSI Logic storage controllers, clearing out vendor firmware back to clean supermicro firmwares in the process.

Again, since I'm not after the data itself, I am generally able to operate unimpeded; convince the firmware to launch freedos via hook or crook (usually these appliances have grub bootloaders and the menu can be rewritten to launch memdisk with a harddisk image hooked at :80.) and run the good ol' AMI firmware flash tools, which will put the ME into recovery mode and stuff some EFI update capsules down its throat.

A quick ATA Secure Erase cycle later; and the off-lease equipment is successfully debranded and ready to go back out in the wild.