[personal profile] mjg59
Anyone who's been following anything I've written lately may be under the impression that I dislike EFI. They'd be entirely correct. It's an awful thing and I've lost far too much of my life to it. It complicates the process of booting for no real benefit to the OS. The only real advantage we've seen so far is that we can configure boot devices in a vaguely vendor-neutral manner without having to care about BIOS drive numbers. Woo.

But there is something else EFI gives us. We finally have more than 256 bytes of nvram available to us as standard. Enough nvram, in fact, for us to reasonably store crash output. Progress!

This isn't a novel concept. The UEFI spec provides for a specially segregated area of nvram for hardware error reports. This is lovely and not overly helpful for us, because they're supposed to be in a well-defined format that doesn't leave much scope for "I found a null pointer where I would really have preferred there not be one" followed by a pile of text, especially if the firmware's supposed to do something with it. Also, the record format has lots of metadata that I really don't care about. Apple have also been using EFI for this, creating a special variable that stores the crash data and letting them get away with just telling the user to turn their computer off and then turn it back on again.

EFI's not the only way this could be done, either. ACPI specifies something called the ERST, or Error Record Serialization Table. The OS can stick errors in here and then they can be retrieved later. Excellent! Except that, for now, ERST is usually only present on high-end servers. But when ERST support was added to Linux, a generic interface called pstore went in as well.

Pstore's very simple. It's a virtual filesystem that has platform-specific plugins. The platform driver (such as ERST) registers with pstore and the ERST errors then get exposed as files in pstore. Deleting the files removes the records. pstore also registers with kmsg_dump, so when an oops happens the kernel output gets dumped back into a series of records. I'd been playing with pstore but really wanted something a little more convenient than an 8-socket server to test it with, so ended up writing a pstore backend that uses EFI variables. And now whenever I crash the kernel, pstore gives me a backtrace without me having to take photographs of the screen. Progress.
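
To make the "records are just files" part concrete, here's a rough userspace sketch of reading the records back and then deleting them. It isn't part of the patchset. The mount point is passed in as a parameter because where pstore is mounted is an assumption about your system, and the helper names (read_records, clear_records) are made up for illustration.

```python
import os
import tempfile


def read_records(mount_point):
    """Return {filename: contents} for every regular file under mount_point.

    mount_point is wherever pstore is mounted on your system (an assumption;
    nothing here checks that it really is a pstore filesystem).
    """
    records = {}
    for name in sorted(os.listdir(mount_point)):
        path = os.path.join(mount_point, name)
        if os.path.isfile(path):
            # Crash output may contain odd bytes, so don't fail on decoding.
            with open(path, "r", errors="replace") as f:
                records[name] = f.read()
    return records


def clear_records(mount_point):
    """Unlink every record file and return the names that were removed.

    On a real pstore mount, deleting a file is what removes the record
    from the backing store, so only do this once the data has been saved.
    """
    removed = []
    for name in sorted(os.listdir(mount_point)):
        path = os.path.join(mount_point, name)
        if os.path.isfile(path):
            os.unlink(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Demonstrate against a scratch directory rather than a real pstore mount.
    with tempfile.TemporaryDirectory() as scratch:
        # "dmesg-example-1" is a hypothetical record name used for the demo.
        with open(os.path.join(scratch, "dmesg-example-1"), "w") as f:
            f.write("Oops: example backtrace\n")
        for name, text in read_records(scratch).items():
            print(name, text, sep=": ", end="")
        clear_records(scratch)
```

Pointing the same two functions at the real pstore mount after a crash would read out the saved oops and then free up the nvram for the next one.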

Patches are here. I should probably apologise to Seiji Aguchi, who was working on the same problem and posted a preliminary patch for some feedback last month. I replied to the thread without ever reading the patch and then promptly forgot about it, leading to me writing it all from scratch last week. Oops.

(There's an easter egg in the patchset. First person to find it doesn't win a prize. Sorry.)

Date: 2011-06-07 09:47 pm (UTC)
From: [identity profile] anholt.livejournal.com
The easter egg does not appear to involve cowsaying the previous boot's oops log on next boot. I lost interest.

Date: 2011-06-07 09:52 pm (UTC)
From: [identity profile] notting.id.fedoraproject.org
Well, it's not quite the sad mac, but it will do.


Date: 2011-06-08 06:45 am (UTC)
From: (Anonymous)
Does the numeric value of LINUX_EFI_CRASH_GUID have some significance? Is that the easter egg?

Not getting it.

Date: 2011-06-08 07:54 pm (UTC)
From: (Anonymous)
I still honestly don't understand why (U)EFI was created. Something minimal like coreboot would surely have been better. Maybe coreboot didn't exist in its actual design when Intel started developing EFI (IIRC it was about stuffing Linux in the Flashrom back then), but surely they knew about openfirmware?

Was there anything so horribly wrong with OF that they needed to recreate it?

Re: Not getting it.

Date: 2011-06-08 08:57 pm (UTC)
From: [identity profile] ajaxxx.livejournal.com
Was there anything so horribly wrong with OF that they needed to recreate it?

I actually had an Intel EFI architect tell me - to my face - that they invented EFI because there wasn't anything that already existed to reuse. Later, in the same talk, he said he'd seen EFI implementations that wrap OpenFirmware.

The conclusion I drew from this was that the thing wrong with OpenFirmware was that Intel hadn't written it; either that someone needed to justify their existence, or that they have some deeper cultural NIH reason why they had to not use OF. So, either malice or incompetence, and I don't really care which. It's a complete clusterfuck.

Re: Not getting it.

Date: 2011-06-09 06:51 am (UTC)
From: [identity profile] yuhong.wordpress.com
FYI, see this:

easter egg

Date: 2011-09-20 07:28 pm (UTC)
From: (Anonymous)
Is it 😿☠ ??


Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Google. Ex-biologist. @mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer.
