Firmware bugs considered enraging
Jan. 6th, 2012 02:39 pm
Part of our work to make it possible to use UEFI Secure Boot on Linux has been to improve our EFI variable support code. Right now this has a hardcoded assumption that variables are 1024 bytes or smaller, which was true in pre-1.0 versions of the EFI spec. Modern implementations allow the maximum variable size to be determined by the hardware, and with implementations using large key sizes and hashes, 1024 bytes isn't really going to cut it. My first attempt at this was a little ugly, and it also fell foul of the fact that sysfs only allows writes of up to the size of a page - so 4KB on most of the platforms we care about. So I've now reimplemented it as a filesystem[1], which is trickier but avoids this problem nicely.
Things were almost working fine - I could read variables of arbitrary sizes, and I could write to existing variables. I was just finishing hooking up new variable creation, but in the process accidentally set the contents of the Boot0002 variable to 0xffffffff 0xffffffff 0x00000000. Boot* variables provide the UEFI firmware with the different configured boot devices on the system - they can point either at a raw device or at a bootloader on a device, and they can do so using various different namespaces. They have a defined format, as documented in chapter 9 of the UEFI spec. At boot time the boot manager reads the variables and attempts to boot from them in a configured order as found in the BootOrder variable.
Now, obviously, 0xffffffff 0xffffffff 0x00000000 is unlikely to conform to the specification. And when I rebooted the machine, it gave me a flashing cursor and did nothing. Fair enough - I should be able to choose another boot path from the boot manager. Except the boot manager behaves identically, and I get a flashing cursor and nothing else.
I reported this to the EDK2 development list, and Andrew Fish (who invented EFI back in the 90s) pointed me at the code that's probably responsible. It's in the BDS (Boot Device Selection) library that's part of the UEFI reference implementation from Intel, and you can find it here. The relevant function is BdsLibVariableToOption, which is as follows (with irrelevant bits elided):

    BdsLibVariableToOption (
      IN OUT LIST_ENTRY  *BdsCommonOptionList,
      IN     CHAR16      *VariableName
      )
    {
      UINT16  FilePathSize;
      UINT8   *Variable;
      UINT8   *TempPtr;
      UINTN   VariableSize;
      VOID    *LoadOptions;
      UINT32  LoadOptionsSize;
      CHAR16  *Description;

      //
      // Read the variable. We will never free this data.
      //
      Variable = BdsLibGetVariableAndSize (
                   VariableName,
                   &gEfiGlobalVariableGuid,
                   &VariableSize
                   );
      if (Variable == NULL) {
        return NULL;
      }

So far so good - we read the variable from flash and put it in Variable, which is now 0xffffffff 0xffffffff 0x00000000. If it hadn't existed we'd have skipped over and continued. VariableSize is 12.

      //
      // Get the option attribute
      //
      TempPtr = Variable;
      Attribute = *(UINT32 *) Variable;
      TempPtr += sizeof (UINT32);

Attribute is now 0xffffffff and TempPtr points to Variable + 4.

      //
      // Get the option's device path size
      //
      FilePathSize = *(UINT16 *) TempPtr;
      TempPtr += sizeof (UINT16);

FilePathSize is 0xffff, TempPtr points to Variable + 6.

      //
      // Get the option's description string size
      //
      TempPtr += StrSize ((CHAR16 *) TempPtr);

TempPtr points to 0xffff 0x0000, so StrSize (which is basically strlen, except that it returns a size in bytes and counts the terminating null character) will be 4. TempPtr now points to Variable + 10.
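To make the "4" concrete, here is a minimal sketch of that size calculation. `str_size_ucs2` is a hypothetical stand-in written for this example, not the EDK2 function itself; it only assumes that the string is a run of 16-bit code units ending in 0x0000 and that the terminator counts towards the size.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for StrSize(): size in bytes of a UCS-2 string,
 * including its 0x0000 terminator. Works on raw bytes so it does not
 * depend on alignment or host byte order.
 */
static size_t str_size_ucs2(const uint8_t *p)
{
    size_t n = 0;

    /* Step over one 16-bit code unit at a time until both bytes are zero. */
    while (p[n] != 0 || p[n + 1] != 0)
        n += 2;

    /* Count the terminator as well. */
    return n + 2;
}
```

Called on the bytes at Variable + 6 (0xff 0xff 0x00 0x00), this steps over one code unit, finds the terminator, and reports 4. Like the original, it trusts the data to contain a terminator somewhere.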

      //
      // Get the option's device path
      //
      DevicePath = (EFI_DEVICE_PATH_PROTOCOL *) TempPtr;
      TempPtr += FilePathSize;

TempPtr now points to Variable + 65545 (FilePathSize is 0xffff).

      LoadOptions = TempPtr;
      LoadOptionsSize = (UINT32) (VariableSize - (UINTN) (TempPtr - Variable));

LoadOptionsSize is now 12 - (Variable + 65545 - Variable), or 12 - 65545, or -65533. But it's cast to an unsigned 32 bit integer, so it's actually 4294901763.
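The wraparound is easy to reproduce outside firmware. The following is a small sketch, assuming a 64-bit UINTN and using plain offsets in place of the pointers so that nothing is dereferenced; `load_options_size` is a hypothetical helper, not part of the BDS library.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the cast in the excerpt above.
 * variable_size plays the role of VariableSize and consumed the role of
 * (TempPtr - Variable), both modelled as 64-bit unsigned values.
 */
static uint32_t load_options_size(uint64_t variable_size, uint64_t consumed)
{
    /* Unsigned subtraction wraps around; the cast keeps the low 32 bits. */
    return (uint32_t)(variable_size - consumed);
}
```

With `variable_size` 12 and `consumed` 4 + 2 + 4 + 0xffff = 65545, the result is 4294901763, which is 2^32 - 65533. A well-formed variable where `consumed` is no larger than `variable_size` gives the expected small remainder.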

      Option->LoadOptions = AllocateZeroPool (LoadOptionsSize);
      ASSERT(Option->LoadOptions != NULL);

We attempt to allocate just under 4GB of RAM. This probably fails - if it does the boot manager exits. This probably means game over. But if it somehow succeeds:

      CopyMem (Option->LoadOptions, LoadOptions, LoadOptionsSize);

we then proceed to read almost 4GB of content from uninitialised addresses, and since Variable was probably allocated below 4GB that almost certainly includes all of your PCI space (which is typically still below 4GB) and bits of your hardware probably generate very unhappy signals on the bus and you lose anyway.
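For contrast, here is a rough sketch of the same walk with each step checked against the variable's actual size. This is not the EDK2 code or any fix that was applied to it; `struct load_option_view` and `parse_load_option` are names invented for this example, and it assumes a little-endian host and the field order used in the walkthrough above (attributes, device path size, description, device path, load options).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed view of a Boot* variable; not an EDK2 type. */
struct load_option_view {
    uint32_t attributes;
    uint16_t file_path_size;
    size_t   description_off;    /* offset of the UCS-2 description */
    size_t   file_path_off;      /* offset of the device path */
    size_t   load_options_off;   /* offset of the trailing load options */
    size_t   load_options_size;
};

/* Returns 0 on success, -1 if any field would run past the end of buf. */
static int parse_load_option(const uint8_t *buf, size_t len,
                             struct load_option_view *out)
{
    size_t off = 0;

    /* Fixed header: 4-byte attributes plus 2-byte device path size. */
    if (len < sizeof(uint32_t) + sizeof(uint16_t))
        return -1;
    memcpy(&out->attributes, buf + off, sizeof(uint32_t));
    off += sizeof(uint32_t);
    memcpy(&out->file_path_size, buf + off, sizeof(uint16_t));
    off += sizeof(uint16_t);

    /* Description: look for the 0x0000 terminator without leaving buf. */
    out->description_off = off;
    for (;;) {
        uint16_t ch;

        if (len - off < sizeof ch)
            return -1;           /* ran out of data before a terminator */
        memcpy(&ch, buf + off, sizeof ch);
        off += sizeof ch;
        if (ch == 0)
            break;
    }

    /* The device path has to fit in whatever is left. */
    if (out->file_path_size > len - off)
        return -1;
    out->file_path_off = off;
    off += out->file_path_size;

    /* off <= len holds here, so this subtraction cannot wrap. */
    out->load_options_off = off;
    out->load_options_size = len - off;
    return 0;
}
```

Fed the 12-byte variable from this post, the sketch gets as far as the device path, sees that 0xffff bytes cannot fit in the 2 bytes remaining, and returns -1 instead of computing a size.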
So now I have a machine that won't boot, and the clear CMOS jumper doesn't clear the flash contents so I have no idea how to recover from it. And because this code is present in the Intel reference implementation, doing the same thing on most other UEFI machines would probably have the same outcome. Thankfully, it's not something people are likely to do by accident - using any of the standard interfaces will always generate a valid entry, so you could only trigger this when modifying variables by hand. But now I need to set up another test machine.
[1] All code in Linux will evolve until the point where it's implemented as a filesystem.
Hotflashing
Date: 2012-01-06 09:28 pm (UTC)
flashrom
Date: 2012-01-06 11:38 pm (UTC)
hopefully, your board is supported.
Re: flashrom
Date: 2012-01-08 09:56 pm (UTC)
matthew: you did not mention which board was used, so i can't really say what the best option is.
it is probably an 8-legged SOIC or DIP chip. if the bios chip is socketed then rescuing is pretty easy. you can either hook up an external programmer like the bus pirate already mentioned (although that one is pretty slow for this kind of work) or try to hot flash it in another good board. if it is soldered you can either unsolder it (not that easy without practice) or try in-system programming with an external flasher. this does not work with all boards though; see also http://flashrom.org/ISP
after making the connection you can either try to find the variable in the image and change it there before flashing it back, or just use a complete image from another board or the vendor page.
if you need further help please send a mail to our list or visit #flashrom on freenode.
Re: flashrom
Date: 2012-01-30 09:45 pm (UTC)
Daniel
Date: 2012-01-07 05:46 am (UTC)
Ah yes the UNIX philosophy: everything as a filesystem.
Try a BusPirate
Date: 2012-01-07 10:44 am (UTC)
See http://dangerousprototypes.com/tag/bios/ for some examples.
Shortcircuiting?
Date: 2012-01-07 06:10 pm (UTC)
Re: Shortcircuiting?
Date: 2012-01-07 06:34 pm (UTC)
Reproducibility
Date: 2012-01-09 11:27 pm (UTC)
Re: Reproducibility
Date: 2012-01-18 04:37 pm (UTC)
BIOS
Date: 2012-01-10 05:03 pm (UTC)
Re: BIOS
Date: 2012-01-10 05:09 pm (UTC)
fixed?
Date: 2012-01-14 05:52 pm (UTC)
Re: fixed?
Date: 2012-01-14 08:05 pm (UTC)
uefi variable support
Date: 2012-01-14 05:57 pm (UTC)
Why China again???
Date: 2012-01-18 09:12 am (UTC)
The same way, I always see people talking about the great firewall of China. While this is an evil thing, some firewalls are a lot harder to get around, like in Iran (remember: ssh is forbidden there), North Korea (do I need to explain?) or Burma.
So next time, please take care with such examples.