[personal profile] mjg59
Part of our work to make it possible to use UEFI Secure Boot on Linux has been to improve our EFI variable support code. Right now this has a hardcoded assumption that variables are 1024 bytes or smaller, which was true in pre-1.0 versions of the EFI spec. Modern implementations allow the maximum variable size to be determined by the hardware, and with implementations using large key sizes and hashes, 1024 bytes isn't really going to cut it. My first attempt at this was a little ugly, and it also fell foul of the fact that sysfs only allows writes of up to the size of a page - so 4KB on most of the platforms we care about. So I've now reimplemented it as a filesystem[1], which is trickier but avoids this problem nicely.

Things were almost working fine - I could read variables of arbitrary sizes, and I could write to existing variables. I was just finishing hooking up new variable creation, but in the process accidentally set the contents of the Boot0002 variable to 0xffffffff 0xffffffff 0x00000000. Boot* variables provide the UEFI firmware with the different configured boot devices on the system - they can point either at a raw device or at a bootloader on a device, and they can do so using various different namespaces. They have a defined format, as documented in chapter 9 of the UEFI spec. At boot time the boot manager reads the variables and attempts to boot from them in a configured order as found in the BootOrder variable.
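For reference, the defined format is what the UEFI spec calls EFI_LOAD_OPTION: a 32-bit attributes field, a 16-bit device path list length, a NUL-terminated UCS-2 description, the device path list itself, and then any optional data. Here's a minimal sketch of how the two fixed-size fields read against the 12 bytes above. It uses plain stdint types rather than the EDK2 typedefs and assumes a little-endian layout; the helper and constant names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* The 12 bytes written to Boot0002: 0xffffffff 0xffffffff 0x00000000. */
const uint8_t bad_boot0002[12] = {
    0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff,
    0x00, 0x00, 0x00, 0x00,
};

/* Offsets of the fixed-size fields at the start of a load option. */
enum {
    LOAD_OPTION_ATTRIBUTES_OFFSET    = 0, /* UINT32 Attributes */
    LOAD_OPTION_FILE_PATH_LEN_OFFSET = 4, /* UINT16 FilePathListLength */
    LOAD_OPTION_DESCRIPTION_OFFSET   = 6  /* CHAR16 Description[], NUL-terminated */
};

/* Hypothetical helpers: read little-endian fields byte by byte, so no
 * alignment assumptions are needed. */
uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

Reading the attributes at offset 0 gives 0xffffffff and the device path length at offset 4 gives 0xffff, which is where the trouble starts.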

Now, obviously, 0xffffffff 0xffffffff 0x00000000 is unlikely to conform to the specification. And when I rebooted the machine, it gave me a flashing cursor and did nothing. Fair enough - I should be able to choose another boot path from the boot manager. Except the boot manager behaves identically, and I get a flashing cursor and nothing else.

I reported this to the EDK2 development list, and Andrew Fish (who invented EFI back in the 90s) pointed me at the code that's probably responsible. It's in the BDS (Boot Device Selection) library that's part of the UEFI reference implementation from Intel, and you can find it here. The relevant function is BdsLibVariableToOption, which is as follows (with irrelevant bits elided):
BdsLibVariableToOption (
  IN OUT LIST_ENTRY                   *BdsCommonOptionList,
  IN  CHAR16                          *VariableName
  )
{
  UINT16                    FilePathSize;
  UINT8                     *Variable;
  UINT8                     *TempPtr;
  UINTN                     VariableSize;
  VOID                      *LoadOptions;
  UINT32                    LoadOptionsSize;
  CHAR16                    *Description;

  //
  // Read the variable. We will never free this data.
  //
  Variable = BdsLibGetVariableAndSize (
              VariableName,
              &gEfiGlobalVariableGuid,
              &VariableSize
              );
  if (Variable == NULL) {
    return NULL;
  }
So far so good - we read the variable from flash and put it in Variable, which now contains 0xffffffff 0xffffffff 0x00000000. If it hadn't existed we'd have returned NULL and the caller would have skipped over it and continued. VariableSize is 12.
  //
  // Get the option attribute
  //
  TempPtr   =  Variable;
  Attribute =  *(UINT32 *) Variable;
  TempPtr   += sizeof (UINT32);
Attribute is now 0xffffffff and TempPtr points to Variable + 4.
  //
  // Get the option's device path size
  //
  FilePathSize =  *(UINT16 *) TempPtr;
  TempPtr      += sizeof (UINT16);
FilePathSize is 0xffff, TempPtr points to Variable + 6.
  //
  // Get the option's description string size
  //
  TempPtr     += StrSize ((CHAR16 *) TempPtr);
TempPtr points to 0xffff 0x0000, so StrSize (which is basically strlen for UCS-2 strings, except that it counts bytes and includes the terminating NUL) will be 4. TempPtr now points to Variable + 10.
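To make that 4 concrete, here's a rough stand-in for StrSize in plain C. This is a sketch of the behaviour, not the EDK2 implementation, and ucs2_str_size is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a NUL-terminated UCS-2 string, including the
 * terminating NUL. Illustrative stand-in for StrSize, not the EDK2 code. */
size_t ucs2_str_size(const uint16_t *s) {
    size_t chars = 0;
    while (s[chars] != 0) {
        chars++;            /* nothing here knows how big the buffer is */
    }
    return (chars + 1) * sizeof(uint16_t);
}
```

For the two code units 0xffff 0x0000 this returns 4: one character plus the terminator, at two bytes each. Note that the function only takes a pointer, so the scan is not bounded by VariableSize. It happens to find a NUL inside the variable here, but it would keep walking if it didn't.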
  //
  // Get the option's device path
  //
  DevicePath =  (EFI_DEVICE_PATH_PROTOCOL *) TempPtr;
  TempPtr    += FilePathSize;
TempPtr now points to Variable + 65545 (FilePathSize is 0xffff).
  LoadOptions     = TempPtr;
  LoadOptionsSize = (UINT32) (VariableSize - (UINTN) (TempPtr - Variable));
LoadOptionsSize is now 12 - (Variable + 65545 - Variable), or 12 - 65545, or -65533. But it's cast to an unsigned 32 bit integer, so it's actually 4294901763.
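The arithmetic is easy to check in isolation. A small sketch, assuming a 64-bit UINTN (modelled here as uint64_t); load_options_size is a made-up name that mirrors the expression above.

```c
#include <stdint.h>

/* Mirrors (UINT32) (VariableSize - (UINTN) (TempPtr - Variable)) using
 * stdint types. Both operands are unsigned, so the subtraction wraps
 * around instead of going negative, and the cast keeps the low 32 bits. */
uint32_t load_options_size(uint64_t variable_size, uint64_t consumed) {
    return (uint32_t)(variable_size - consumed);
}
```

The bytes consumed are 4 (attributes) + 2 (path size) + 4 (description) + 65535 (claimed device path) = 65545, and load_options_size(12, 65545) comes out as 4294901763, which is 2^32 - 65533.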
  Option->LoadOptions = AllocateZeroPool (LoadOptionsSize);
  ASSERT(Option->LoadOptions != NULL);
We attempt to allocate just under 4GB of RAM. This probably fails - if it does the boot manager exits. This probably means game over. But if it somehow succeeds:
  CopyMem (Option->LoadOptions, LoadOptions, LoadOptionsSize);
we then proceed to read almost 4GB of content from uninitialised addresses, and since Variable was probably allocated below 4GB that almost certainly includes all of your PCI space (which is typically still below 4GB) and bits of your hardware probably generate very unhappy signals on the bus and you lose anyway.
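What's missing throughout is any check of the parsed fields against VariableSize. As a sketch of the kind of validation that would reject this variable before anything gets allocated or copied (this is my illustration in plain C, not the reference implementation's code or any fix to it, and load_option_is_sane is a hypothetical name):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator: returns true only if every field of the load
 * option lies within the `size` bytes actually read from the variable. */
bool load_option_is_sane(const uint8_t *var, size_t size) {
    const size_t header = sizeof(uint32_t) + sizeof(uint16_t);
    if (var == NULL || size < header) {
        return false;
    }

    /* FilePathListLength is the little-endian 16-bit field at offset 4. */
    size_t file_path_len = (size_t)var[4] | ((size_t)var[5] << 8);

    /* Walk the UCS-2 description without ever reading past the end. */
    size_t off = header;
    for (;;) {
        if (size - off < 2) {
            return false;   /* ran out of data before finding the NUL */
        }
        bool nul = (var[off] == 0 && var[off + 1] == 0);
        off += 2;
        if (nul) {
            break;
        }
    }

    /* The device path must fit in what is left. Comparing against
     * size - off (off <= size here) avoids any wraparound. */
    return file_path_len <= size - off;
}
```

With the 12 bytes above, the description ends at offset 10, leaving 2 bytes, and a claimed device path length of 0xffff fails the final comparison. Whatever is left after the device path would then be the load options, and its size can no longer go negative.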

So now I have a machine that won't boot, and the clear CMOS jumper doesn't clear the flash contents so I have no idea how to recover from it. And because this code is present in the Intel reference implementation, doing the same thing on most other UEFI machines would probably have the same outcome. Thankfully, it's not something people are likely to do by accident - using any of the standard interfaces will always generate a valid entry, so you could only trigger this when modifying variables by hand. But now I need to set up another test machine.

[1] All code in Linux will evolve until the point where it's implemented as a filesystem.

Hotflashing

Date: 2012-01-06 09:28 pm (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawkG0O2USa6XI5q10ytNBCdrZskHrY2IlHk
If you have a second (compatible) machine, you could try hotflashing the BIOS chip (see http://www.overclock.net/t/102206/how-to-hotflash-your-bios-chip).

flashrom

Date: 2012-01-06 11:38 pm (UTC)
From: [personal profile] lewurm
http://flashrom.org/Flashrom is your friend (and a second flash chip for your mainboard from ebay ;-))

hopefully, your board is supported.

Re: flashrom

Date: 2012-01-08 09:56 pm (UTC)
From: (Anonymous)
i am biased because i am one of the developers but i agree :)

matthew: you did not mention which board was used so i cant really say what the best option is.
it is probably an 8-legged SOIC or DIP chip. if the bios chip is socketed then rescuing is pretty easy. you can either hook up an external programmer like the bus pirate already mentioned (although that one is pretty slow for this sort of work) or try to hot flash it in another good board. if it is soldered you can either unsolder it (not that easy without practice) or try in-system programming with an external flasher. this does not work with all boards though, see also http://flashrom.org/ISP

after making the connection you can either try to find the variable in the image and change it there before flashing it back, or just use a complete image from another board or the vendor page.

if you need further help please send a mail to our list or visit #flashrom on freenode.

Re: flashrom

Date: 2012-01-30 09:45 pm (UTC)
From: [identity profile] osdevnotes.blogspot.com
I have "emotional attachment" to Tiano as well, but it's stuff like this that makes me mentally palm-face every time anyone mentions UEFI. Internal errors should not cause unrecoverable state, and most code is written without thinking about the global scope.

Daniel

Date: 2012-01-07 05:46 am (UTC)
From: (Anonymous)
"[1] All code in Linux will evolve until the point where it's implemented as a filesystem."

Ah yes the UNIX philosophy: everything as a filesystem.

Try a BusPirate

Date: 2012-01-07 10:44 am (UTC)
From: (Anonymous)
You could try using a BusPirate to reflash your chip.

See http://dangerousprototypes.com/tag/bios/ for some examples.

Shortcircuiting?

Date: 2012-01-07 06:10 pm (UTC)
From: (Anonymous)
Could you boot if you short circuit pins on the flash chip during booting? That recipe helped on a bricked WRT54G ages ago.

Reproducibility

Date: 2012-01-09 11:27 pm (UTC)
From: [identity profile] int-ua.blogspot.com
Are these variables writeable by the OS? Couldn't such a method be used to brick motherboards intentionally?

Re: Reproducibility

Date: 2012-01-18 04:37 pm (UTC)
From: (Anonymous)
it's already possible to do this; the danger with this example is that it'll "work" (not error) on any UEFI board, probably succeed in stopping booting on most of them, result in the same unrecoverable brick-ness in most of _THEM_, and for the small proportion of motherboards where recovery _is_ possible, most people won't know how to

BIOS

Date: 2012-01-10 05:03 pm (UTC)
From: (Anonymous)
Does your system setup allow you to switch to BIOS boot mode to recover?

fixed?

Date: 2012-01-14 05:52 pm (UTC)
From: [personal profile] the_ridikulus_rat
I have been following your uefi related posts and playing around with the tianocore duetpkg uefi firmware. I noticed this commit http://tianocore.git.sourceforge.net/git/gitweb.cgi?p=tianocore/edk2;a=commitdiff;h=b16cc38bf3d74e7b781022aad96f28b3f8507fdd;hp=f79fa76e9ccebb8cb680455e034f63035ae44412 . Does this fix the issue you are talking about? I am not a c programmer (I only know bash), so I can't follow the source very well.

uefi variable support

Date: 2012-01-14 05:57 pm (UTC)
From: [personal profile] the_ridikulus_rat
Regarding your recent patches for uefi variable support http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/4105f2e6a51059e1/b50f974892ee0d2a?show_docid=b50f974892ee0d2a&pli=1 , is there any public git repo where you store these changes? The userland tools that use these that I can think of are efibootmgr http://linux.dell.com/cgi-bin/gitweb/gitweb.cgi?p=efibootmgr.git;a=summary , uefivars http://uefivars.git.sourceforge.net/git/gitweb.cgi?p=uefivars/uefivars;a=summary and Ubuntu's Firmware Test Suite "fwts uefidump" command http://kernel.ubuntu.com/git?p=cking/fwts/.git

Why China again???

Date: 2012-01-18 09:12 am (UTC)
From: (Anonymous)
Again, somebody talks about China as the most evil. In fact, it would be quite hard for a Chinese citizen to do credit card hacks, since the bank system is quite locked down (banks are all government owned). China may be evil, but it's far from being the country where you see the most credit card thefts. You'd be looking more at Eastern Europe, Brazil, or Russia here.

In the same way, I always see people talking about the great firewall of China. While this is an evil thing, some firewalls are a lot harder to get around, like in Iran (remember: ssh is forbidden there), North Korea (do I need to explain?) or Burma.

So next time, please take care with such examples.

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Aurora. Ex-biologist. [personal profile] mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer. Also on Mastodon.
