[personal profile] mjg59
Some time back I wrote up a description of my proposed (and implemented) solution for making hibernation work under Linux even within the bounds of the integrity model. It's been a while, so here's an update.

The first is that localities just aren't an option. It turns out that they're optional in the spec, and TPMs are entirely permitted to say they don't support them. The only time they're likely to work is on platforms that support DRTM implementations like TXT. Most consumer hardware doesn't fall into that category, so we don't get to use that solution. Unfortunate, but, well.

The second is that I'd ignored an attack vector. If the kernel is configured to restrict access to PCR 23, then yes, an attacker is never able to modify PCR 23 to be in the same state it would be if hibernation were occurring and the key certification data will fail to validate. Unfortunately, an attacker could simply boot into an older kernel that didn't implement the PCR 23 restriction, and could fake things up there (yes, this is getting a bit convoluted, but the entire point here is to make this impossible rather than just awkward). Once PCR 23 was in the correct state, they would then be able to write out a new swap image, boot into a new kernel that supported the secure hibernation solution, and have that resume successfully in the (incorrect) belief that the image was written out in a secure environment.

This felt like an awkward problem to fix. We need to be able to distinguish between the kernel having modified the PCRs and userland having modified the PCRs, and we need to be able to do this without modifying any kernels that have already been released[1]. The normal approach to determining whether an event occurred in a specific phase of the boot process is to "cap" the PCR - extend it with a known value that indicates a transition between stages of the boot process. Any events that occur before the cap event must have occurred in the previous stage of boot, and since the final PCR value depends on the order of measurements and not just the contents of those measurements, if a PCR is capped before userland runs, userland can't fake the same PCR value afterwards. If Linux capped a PCR before userland started running, we'd be able to place a measurement there before the cap occurred and then prove that that extension occurred before userland had the opportunity to interfere. We could simply place a statement that the kernel supported the PCR 23 restrictions there, and we'd be fine.
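To make the ordering argument concrete, here's a minimal sketch of the extend operation in Python. It models a PCR as a SHA-256 value that starts at all zeroes, where each extend replaces the value with the hash of the old value concatenated with the digest of the event. The `extend` and `replay` helpers and the event strings are made up for illustration, and nothing here talks to a real TPM.

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR bank


def extend(pcr: bytes, event: bytes) -> bytes:
    """Model of a TPM extend: new value = H(old value || H(event))."""
    return hashlib.sha256(pcr + hashlib.sha256(event).digest()).digest()


def replay(events) -> bytes:
    """Start from an all-zero PCR and apply the events in order."""
    pcr = bytes(PCR_SIZE)
    for event in events:
        pcr = extend(pcr, event)
    return pcr


# Hypothetical event contents, chosen only for this illustration.
CLAIM = b"kernel restricts PCR 23"
CAP = b"kernel-to-userland transition"

before_cap = replay([CLAIM, CAP])  # claim measured by the kernel, then capped
after_cap = replay([CAP, CLAIM])   # same claim extended only after the cap

print(before_cap.hex())
print(after_cap.hex())
print("order matters:", before_cap != after_cap)
```

The two final values differ even though the same two events were measured. Once the cap has been extended, there's no known way to pick further extensions that land on the `before_cap` value without breaking the hash.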

Unfortunately Linux doesn't currently do this, and adding support for doing so doesn't fix the problem - if an attacker boots a kernel that doesn't cap a PCR, they can just cap it themselves from userland. So, we're faced with the same problem: booting an older kernel allows the system to be placed in an identical state to the current kernel, and a fake hibernation image can be written out. Solving this required a PCR that was being modified after kernel code was running, but before userland was started, even with existing kernels.
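Here's a rough sketch of why a kernel-applied cap doesn't help on its own, using a toy model of a PCR (SHA-256, starting at zero, each extend hashing the old value with the event digest). The `boot` helper and the event strings are hypothetical. The point is only that a PCR records what was extended and in what order, not who did the extending.

```python
import hashlib


def extend(pcr: bytes, event: bytes) -> bytes:
    """Model of a TPM extend: new value = H(old value || H(event))."""
    return hashlib.sha256(pcr + hashlib.sha256(event).digest()).digest()


def boot(kernel_events, userland_events) -> bytes:
    """Apply the kernel's extensions, then userland's, to a zeroed PCR."""
    pcr = bytes(32)
    for event in list(kernel_events) + list(userland_events):
        pcr = extend(pcr, event)
    return pcr


# Hypothetical event contents, chosen only for this illustration.
CLAIM = b"kernel restricts PCR 23"
CAP = b"kernel-to-userland transition"

# A new kernel measures its claim and caps the PCR before starting userland.
new_kernel = boot(kernel_events=[CLAIM, CAP], userland_events=[])

# An old kernel extends nothing, so an attacker does both steps from userland.
old_kernel_attacker = boot(kernel_events=[], userland_events=[CLAIM, CAP])

print("indistinguishable:", new_kernel == old_kernel_attacker)
```

Both boots end with the same PCR value, so anything sealed to or attested against that value can't tell them apart.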

Thankfully, there is one! PCR 5 is defined as containing measurements related to boot management configuration and data. One of the measurements it contains is the result of the UEFI ExitBootServices() call. ExitBootServices() is called at the transition from the UEFI boot environment to the running OS, and the kernel contains code that executes before it. So, if we measure an assertion regarding whether or not we support restricted access to PCR 23 into PCR 5 before we call ExitBootServices(), this will prevent userspace from spoofing us (because userspace will only be able to extend PCR 5 after the firmware extended PCR 5 in response to ExitBootServices() being called). Obviously this depends on the firmware actually performing the PCR 5 extension when ExitBootServices() is called, but if firmware's out of spec then I don't think there's any real expectation of it being secure enough for any of this to buy you anything anyway.
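As a sketch of why the firmware's own extension makes the difference, the toy model below has the firmware extend the PCR at ExitBootServices() on every boot, between whatever the kernel measured and whatever userland extends later. The `boot` helper, the `CLAIM` string and the `EBS` stand-in are assumptions made for illustration. A real PCR 5 would also contain earlier firmware measurements, which are left out here.

```python
import hashlib


def extend(pcr: bytes, event: bytes) -> bytes:
    """Model of a TPM extend: new value = H(old value || H(event))."""
    return hashlib.sha256(pcr + hashlib.sha256(event).digest()).digest()


# Stand-in for the firmware's ExitBootServices() measurement (assumed content).
EBS = b"ExitBootServices"
# Hypothetical assertion measured by a kernel that restricts PCR 23.
CLAIM = b"kernel restricts PCR 23"


def boot(pre_ebs_events, userland_events) -> bytes:
    """Kernel events, then the firmware's EBS event, then userland events."""
    pcr = bytes(32)
    for event in pre_ebs_events:
        pcr = extend(pcr, event)
    pcr = extend(pcr, EBS)  # done by firmware, outside the OS's control
    for event in userland_events:
        pcr = extend(pcr, event)
    return pcr


# New kernel: the claim is measured before ExitBootServices() is called.
expected = boot(pre_ebs_events=[CLAIM], userland_events=[])

# Old kernel: the attacker can only extend after the firmware already has.
late_claim = boot(pre_ebs_events=[], userland_events=[CLAIM])
replayed = boot(pre_ebs_events=[], userland_events=[CLAIM, EBS])

print("late claim matches:", late_claim == expected)
print("replayed sequence matches:", replayed == expected)
```

Neither attempt matches, because the firmware's extension is already baked into the value that the attacker has to start from.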

My current tree is here, but there's a couple of things I want to do before submitting it, including ensuring that the key material is wiped from RAM after use (otherwise it could potentially be scraped out and used to generate another image afterwards) and, uh, actually making sure this works (I no longer have the machine I was previously using for testing, and switching my other dev machine over to TPM 2 firmware is proving troublesome, so I need to pull another machine out of the stack and reimage it).

[1] The linear nature of time makes feature development much more frustrating

Date: 2021-12-31 12:06 pm (UTC)
From: [personal profile] bens_dad
[1] The linear nature of time makes feature development much more frustrating

I don't envy you if you are trying to protect against time-machines.

([personal profile] fanf did think about protection against attacks by ntp time-servers, but that should be easier.)

encrypted and authenticated

Date: 2022-01-01 01:06 am (UTC)
From: (Anonymous)
Does the use of TPM to seal a key adequately address the authenticity of the hibernation image? I'm aware of prior effort https://lkml.org/lkml/2019/7/10/601 in which it was going to be authenticated by HMAC. It was suggested elsewhere https://lkml.org/lkml/2019/1/8/1280 to use AES-GCM. -cmurf

Date: 2022-01-01 02:44 am (UTC)
From: (Anonymous)
What stops an attacker from simply booting a modified kernel that does what old and new kernels don't (and then kexec-ing the new kernel they're trying to fool)?
From: [personal profile] iplayzed
Hi Matthew!

I love the idea of lockdown, so I tried implementing it in my workflow, but there is a huge oversight in usability I think.

I think that, at least until a properly signed and TPM-unlocked hibernation image is correctly used by the system, there should be an additional option for the lockdown kernel parameter, something like

lockdown=confidentiality/integrity,allowhibernate

which would enable all the other lockdown features while still allowing hibernation.

Or maybe specify all features in a blacklist/whitelist manner, so that for specific use cases the administrator can decide whether to block, say, ACPI error injection but enable PCIe resizable BAR for gaming, you get it. ;)

Right now, in my understanding, you either use lockdown to its fullest (in integrity or confidentiality mode) and don't hibernate, or you don't use it at all. This is a waste of features if you need hibernation, because by using an encrypted swap you can reasonably defend against multiple exploits, even if a fully compromised root still has some way to overwrite it (https://mjg59.dreamwidth.org/55845.html?thread=2138917#cmt2138917).
I completely agree with the commenter that this should be something the sysadmin/OEM decides, not something the kernel sets in stone.

In my humble opinion a better way would be to add a mode where all features of lockdown are enabled, at the very least if the swap partition itself is encrypted.
If you look at the manpage, there are several runtime interface restrictions.

For instance my security model is that I use custom secure boot keys. I tie the swap encryption to this among others. This means that only my trusted signed binaries can be launched on my system, so for me at the very least unless my root is compromised as-is, no one can overwrite the swap partition in a way that my EFI binary would decrypt it, as only I have the private key which I used to set up secure boot. As I take a good amount of precautions to not root-execute willy-nilly stuff, I can confidently say that not being able to use other features of lockdown is a waste of hardening possibilities.

In this way (I am picking some at random from the features of lockdown) I can protect myself from exploits that write to /dev/mem or the ACPI tables, where the attacker doesn't get full root access. And if they did get it, overwriting the hibernation image would be the very least of my concerns, right? :)

Thus a huge number of attack vectors are blocked, even if the swap overwrite problem is still possible in the case of full root access. My point is that this would be a great compromise, as the most prevalent form of swap (and so kernel image) overwrite would be an evil maid attack by a third party, on a laptop for instance.

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Aurora. Ex-biologist. [personal profile] mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer. Also on Mastodon.
