[personal profile] mjg59
Part of our work to make it possible to use UEFI Secure Boot on Linux has been to improve our EFI variable support code. Right now this has a hardcoded assumption that variables are 1024 bytes or smaller, which was true in pre-1.0 versions of the EFI spec. Modern implementations allow the maximum variable size to be determined by the hardware, and with implementations using large key sizes and hashes 1024 bytes isn't really going to cut it. My first attempt at this was a little ugly but also fell foul of the fact that sysfs only allows writes of up to the size of a page - so 4KB on most of the platforms we're caring about. So I've now reimplemented it as a filesystem[1], which is trickier but avoids this problem nicely.

Things were almost working fine - I could read variables of arbitrary sizes, and I could write to existing variables. I was just finishing hooking up new variable creation, but in the process accidentally set the contents of the Boot0002 variable to 0xffffffff 0xffffffff 0x00000000. Boot* variables provide the UEFI firmware with the different configured boot devices on the system - they can point either at a raw device or at a bootloader on a device, and they can do so using various different namespaces. They have a defined format, as documented in chapter 9 of the UEFI spec. At boot time the boot manager reads the variables and attempts to boot from them in a configured order as found in the BootOrder variable.

Now, obviously, 0xffffffff 0x00000000 is unlikely to conform to the specification. And when I rebooted the machine, it gave me a flashing cursor and did nothing. Fair enough - I should be able to choose another boot path from the boot manager. Except the boot manager behaves identically, and I get a flashing cursor and nothing else.

I reported this to the EDK2 development list, and Andrew Fish (who invented EFI back in the 90s) pointed me at the code that's probably responsible. It's in the BDS (Boot Device Selection) library that's part of the UEFI reference implementation from Intel, and you can find it here. The relevant function is BdsLibVariableToOption, which is as follows (with irrelevant bits elided):
BdsLibVariableToOption (
  IN OUT LIST_ENTRY                   *BdsCommonOptionList,
  IN  CHAR16                          *VariableName
  )
{
  UINT16                    FilePathSize;
  UINT8                     *Variable;
  UINT8                     *TempPtr;
  UINTN                     VariableSize;
  VOID                      *LoadOptions;
  UINT32                    LoadOptionsSize;
  CHAR16                    *Description;

  //
  // Read the variable. We will never free this data.
  //
  Variable = BdsLibGetVariableAndSize (
              VariableName,
              &gEfiGlobalVariableGuid,
              &VariableSize
              );
  if (Variable == NULL) {
    return NULL;
  }
So so far so good - we read the variable from flash and put it in Variable, Variable is now 0xffffffff 0xffffffff 0x00000000. If it hadn't existed we'd have skipped over and continued. VariableSize is 12.
  //
  // Get the option attribute
  //
  TempPtr   =  Variable;
  Attribute =  *(UINT32 *) Variable;
  TempPtr   += sizeof (UINT32);
Attribute is now 0xffffffff and TempPtr points to Variable + 4.
  //
  // Get the option's device path size
  //
  FilePathSize =  *(UINT16 *) TempPtr;
  TempPtr      += sizeof (UINT16);
FilePathSize is 0xffff, TempPtr points to Variable + 6.
  //
  // Get the option's description string size
  //
  TempPtr     += StrSize ((CHAR16 *) TempPtr);
TempPtr points to 0xffff 0x0000, so StrSize (which is basically strlen) will be 4. TempPtr now points to Variable + 10.
  //
  // Get the option's device path
  //
  DevicePath =  (EFI_DEVICE_PATH_PROTOCOL *) TempPtr;
  TempPtr    += FilePathSize;
TempPtr now points to Variable + 65545 (FilePathSize is 0xffff).
  LoadOptions     = TempPtr;
  LoadOptionsSize = (UINT32) (VariableSize - (UINTN) (TempPtr - Variable));
LoadOptionsSize is now 12 - (Variable + 65545 - Variable), or 12 - 65545, or -65533. But it's cast to an unsigned 32 bit integer, so it's actually 4294901763.
  Option->LoadOptions = AllocateZeroPool (LoadOptionsSize);
  ASSERT(Option->LoadOptions != NULL);
We attempt to allocate just under 4GB of RAM. This probably fails - if it does the boot manager exits. This probably means game over. But if it somehow succeeds:
CopyMem (Option->LoadOptions, LoadOptions, LoadOptionsSize);
we then proceed to read almost 4GB of content from uninitialised addresses, and since Variable was probably allocated below 4GB that almost certainly includes all of your PCI space (which is typically still below 4GB) and bits of your hardware probably generate very unhappy signals on the bus and you lose anyway.

So now I have a machine that won't boot, and the clear CMOS jumper doesn't clear the flash contents so I have no idea how to recover from it. And because this code is present in the Intel reference implementation, doing the same thing on most other UEFI machines would probably have the same outcome. Thankfully, it's not something people are likely to do by accident - using any of the standard interfaces will always generate a valid entry, so you could only trigger this when modifying variables by hand. But now I need to set up another test machine.

[1] All code in Linux will evolve until the point where it's implemented as a filesystem.

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Aurora. Ex-biologist. [personal profile] mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer. Also on Mastodon.

Page Summary

Expand Cut Tags

No cut tags