One of the earlier examples of students using MIT computer resources to lay the groundwork for a later commercial endeavour, Zork was originally written in a LISP derivative called MDL. This was later turned into the Zork Implementation Language, a domain-specific language that was compiled to target the Z-machine rather than a specific piece of hardware. Combined with machine-specific Z-machine interpreters, this allowed rapid porting of games to a wide range of platforms - the only thing that needed to be rewritten was the interpreter, and that could be reused for any future games running on the same hardware.

Infocom were eventually acquired and killed off, but fan interest in their games continued. New Z-machine interpreters were written in order to allow their games (including Zork) to be run on platforms that Infocom had never targeted. One of the best known is Frotz. This has the advantage of (a) being portable and (b) including a "dumb" UI that makes no assumptions about the availability of any vaguely useful functionality. Like, say, a Curses library.

So, Frotz seemed like the natural choice when this happened. But despite having a set of functionality that makes it look much more like an OS than a boot environment, UEFI doesn't actually expose a standard C library. The EFI Application Development Kit solves this particular design decision. Porting Frotz ended up involving far more fixing up of Frotz bugs that tripped up -Werror than anything else. One note, though - make sure you include DevShell in the list of required packages at build time, otherwise file i/o will mysteriously fail.

The tying of file i/o to the shell protocol unfortunately means that Frotz can't be directly launched by the firmware. The Boot to Zork images therefore contain a UEFI shell in the standard boot location (\EFI\BOOT\BOOTX64.EFI) which is executed when the firmware attempts to boot the device. The shell then looks for a file called "startup.nsh" in the root directory of the boot device and executes it. Unfortunately this doesn't actually set the shell equivalent of the current device, and so just launching Frotz from startup.nsh fails when Frotz can't open the Zork data file. The solution for this is simple, if ugly - the script walks through the list of devices, looking for one that contains ZORK1.DAT in the root directory. It then changes to that device and launches Frotz. If Frotz exits, it then resets the system.
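
The device walk is easier to see in code. What follows is a minimal Python sketch of the same logic, not the actual startup.nsh: the function name and the idea of passing in a list of candidate root directories are assumptions made for illustration, and only the ZORK1.DAT file name comes from the description above.

```python
from pathlib import Path
from typing import Iterable, Optional

DATA_FILE = "ZORK1.DAT"  # file name taken from the text above


def find_game_device(roots: Iterable[Path], data_file: str = DATA_FILE) -> Optional[Path]:
    """Return the first candidate root directory that contains data_file.

    `roots` stands in for the shell's list of mapped devices; this is a
    hypothetical helper, not part of Frotz or the UEFI shell.
    """
    for root in roots:
        # Only the root directory of each device is checked, as in the script.
        if (root / data_file).is_file():
            return root
    # No device holds the data file, so there is nothing to launch.
    return None
```

The real script would then change to the device that was found and launch Frotz from there, resetting the system once Frotz exits.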

This could be avoided by doing some more work and turning Frotz into a more UEFI-native application. Teaching Frotz to make native UEFI calls would avoid the requirement for the shell protocols, and the firmware provides a mechanism to obtain the path of the currently running executable which would avoid the need to explicitly locate the device. But I'm lazy and this was an "I'm spending the day on a plane" project initially inspired by a Sazerac-fuelled conversation during the UEFI plugfest, not a demonstration of UEFI best practices.

UEFI Boot To Zork and the source code to the modified Frotz can be downloaded here.

Eric (a fellow Fedora board member) has a post describing his vision for what Fedora as an end goal should look like. It's essentially an assertion that since we have no idea who our users are or what they want, we should offer them everything on an equal footing.

Shockingly enough, I disagree.

At the most basic level, the output of different Special Interest Groups is not all equal. We've had issues over the past few releases where various spins have shipped in a broken state, because the SIG responsible for producing them doesn't have the resources to actually test them. We're potentially going to end up shipping F20 with old Bluetooth code because the smaller desktops aren't able to port to the new API in time[1]. Promoting these equally implies that they're equal, and doing so when we know it isn't the case is a disservice to our users.

But it's not just about our users. Before I joined the Fedora project, I'd worked on both Debian and Ubuntu. Debian is broadly similar to the current state of Fedora - no strong idea about what is actually being produced, and a desire among many developers to cater to every user's requirements. Ubuntu's pretty much the direct opposite, with a strongly defined goal and a willingness to sacrifice some use cases in order to achieve that goal.

This leads to an interestingly different social dynamic. Ubuntu contributors know what they're working on. If a change furthers the well-defined aim of the project, that change happens. Moving from Ubuntu to Fedora was a shock to me - there were several rough edges in Fedora that simply couldn't be smoothed out because fixing them for one use case would compromise another use case, and nobody could decide which was more important[2]. It's basically unthinkable that such a situation could arise in Ubuntu, not just because there was a self-appointed dictator but because there was an explicit goal and people could prioritise based on that[3].

Bluntly, if you have a well-defined goal, people are more likely to either work towards that goal or go and do something else. If you don't, people will just do whatever they want. The risk of defining that goal is that you'll lose some of your existing contributors, but the benefit is that the existing contributors will be more likely to work together rather than heading off in several different directions.

But perhaps more importantly, having a goal can attract people. Ubuntu's Bug #1 was a solid statement of intent. Being freer than Microsoft wasn't enough. Ubuntu had to be better than Microsoft products on every axis, and joining Ubuntu meant that you were going to be part of that. Now it's been closed and Ubuntu's wandered off into convergence land, and signing up to spend your free time on producing something to help someone sell phones is much less compelling than doing it to produce a product you can give to your friends.

Fedora should be the obvious replacement, but it's not because it's unclear to a casual observer what Fedora actually is. The website proudly leads with a description of Fedora as a fast, stable and powerful operating system, but it's obvious that many of the community don't think of Fedora that way - instead it's a playground to produce a range of niche derivatives, with little consideration as to whether contributing to Fedora in that way benefits the project as a whole. Codifying that would actively harm our ability to produce a compelling product, and in turn reduce our ability to attract new contributors even further.

Which is why I think the current proposal to produce three first-class products is exciting. Offering several different desktops on the download page is confusing. Offering distinct desktop, server and cloud products isn't. It makes it clear to our users what we care about, and in turn that makes it easier for users to be excited about contributing to Fedora. Let's not make the mistake of trying to be all things to all people.

[1] Although clearly in this case the absence of a stable ABI in BlueZ despite it having had a dbus interface for the best part of a decade is a pretty fundamental problem.
[2] See the multi-year argument over default firewall rules and the resulting lack of working SMB browsing or mDNS resolving.
[3] To be fair, one of the reasons I was happy to jump ship was because of the increasingly autocratic way Ubuntu was being run. By the end of my involvement, significant technical decisions were being made in internal IRC channels - despite being on the project's Technical Board, I had no idea how or why some significant technical changes were being made. I don't think this is a fundamental outcome of having a well-defined goal, though. A goal defined by the community (or their elected representatives) should function just as well.
Update: While this isn't strictly fixed (Mir will still perform a VT switch without waiting to ensure that XMir has released the input devices), the time window for console input events to end up in XMir is now small enough that it's probably not a security problem any more.

It'd be easy to assume that in a Mir-based world, the Mir server receives input events and hands them over to Mir clients. In fact, as I described here, XMir uses standard Xorg input drivers and so receives all input events directly. This led to issues like the duplicate mouse pointer seen in earlier versions of XMir - as well as the pointer being drawn by XMir, Mir was drawing its own pointer.

But there are also some more subtle issues. Mir recently gained a fairly simple implementation of VT switching, simply listening for input events where a function key is hit while the ctrl and alt modifiers are set[1]. It then performs the appropriate ioctl on /dev/console and the kernel switches the VT. The problem here is that Mir doesn't tell XMir that this has happened, and so XMir still has all of its input devices open and still pays attention to any input events.
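
To make the hotkey handling concrete, here's a rough Python sketch of the kind of check described above. It is not Mir's code: the function name, the string representation of keys and modifiers, and the F1-F12 range are all assumptions for illustration. The actual switch would be the ioctl on /dev/console (on Linux, typically VT_ACTIVATE), which is deliberately left out.

```python
import re
from typing import Optional, Set

# Matches "f1".."f12"; the upper bound is an assumption for this sketch.
_FUNCTION_KEY = re.compile(r"f([0-9]{1,2})")


def vt_for_keypress(modifiers: Set[str], key: str) -> Optional[int]:
    """Return the VT number a ctrl+alt+Fn press maps to, or None.

    Hypothetical helper: it only decides whether a key event should
    trigger a VT switch, and does not perform the switch itself.
    """
    # Both modifiers must be held; extra modifiers are tolerated.
    if not {"ctrl", "alt"} <= modifiers:
        return None
    match = _FUNCTION_KEY.fullmatch(key.lower())
    if match is None:
        return None
    number = int(match.group(1))
    return number if 1 <= number <= 12 else None
```

Nothing in this sketch notifies anyone else that the switch is about to happen, which is exactly the gap: XMir is never told, so it keeps reading the keyboard.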

This is pretty easy to demonstrate. Open a terminal or text editor under XMir and make sure it has focus. Hit ctrl+alt+f1 and log in. Hit ctrl+alt+f7 to switch back. Your username and password will be sitting in the window.

This is Launchpad bug 1192843, filed on the 20th of June. A month and a half later, Mir was added to the main Ubuntu repositories. Towards the bottom, there's a note saying "XMir always listening to keyboard, passwords may appear in other X sessions". This is pretty misleading, since "other X sessions" implies that it's only going to happen if you run multiple X sessions. Regardless, it's a known bug that can potentially leak user passwords.

So it's kind of odd that that's the only mention of it, hidden in a disused toilet behind a "Doesn't work on VESA" sign. If you follow the link to installation instructions you get this page which doesn't mention the problem at all. Now, to be fair, it doesn't mention any of the other problems with Mir either, but the other problems merely result in things not working rather than your password ending up in IRC.

This being developmental software isn't an excuse. There's been plenty of Canonical-led publicity about Mir and people are inevitably going to test it out. The lack of clear and explicit warnings is utterly inexcusable, and these packages shouldn't have landed in the archive until the issue was fixed. This is brutally irresponsible behaviour on the part of Canonical.

So, if you ever switch to a text VT, either make sure you're not running XMir at the moment or make sure that you never leave any kind of network client focused when you switch away from X. And you might want to check IRC and IM logs to make sure you haven't made a mistake already.

[1] One lesser-known feature of X is that the VT switching events are actually configured in the keymap. ctrl+alt+f1 defaults to switching to VT1, but you can remap any key combination to any VT switch event. Except, of course, this is broken in XMir because Mir catches the keystroke and handles it anyway.
If you're tempted to add a platform-specific quirk to a Linux driver, pause and do the following:

  1. Check whether the platform works correctly with the generic Windows driver for the hardware in question. If it requires a platform-specific driver rather than the generic one, adding a quirk is probably ok.
  2. If the generic Windows driver works, check whether there's any evidence of platform-specific code in the Windows driver. This will typically be in the .inf file, but occasionally you'll want to run strings against the Windows driver and see whether any functions or strings match the platform in question. If there's evidence of special-casing in the generic Windows driver, adding a quirk is probably ok.
  3. If the generic Windows driver works and doesn't appear to have any platform-specific special casing, don't add a quirk. You'll plausibly fix the machine you care about, but you won't fix any others that have the same behaviour. Even worse, if someone does eventually fix the problem properly, there's a risk that your special-casing will now break your system.
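
The three steps above boil down to a two-input decision. As a minimal sketch (the function and parameter names are made up for illustration), it can be written as:

```python
def quirk_is_probably_ok(works_with_generic_windows_driver: bool,
                         windows_driver_special_cases_platform: bool) -> bool:
    """Encode the checklist above as a single decision.

    Both inputs are things you have to find out by hand (testing the
    generic Windows driver, reading its .inf file, running strings on it).
    """
    # Step 1: the platform needs its own Windows driver, so a quirk is fine.
    if not works_with_generic_windows_driver:
        return True
    # Steps 2 and 3: with a working generic driver, a quirk is only
    # justified if Windows special-cases the platform too.
    return windows_driver_special_cases_platform
```

The only combination that returns False is the one in step 3, which in practice is the common case.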

The moral to this story is: if you think adding a quirk is the right solution, you're almost certainly wrong.
Shipping a UEFI-bootable Linux distribution is a touch awkward[1], with the main sticking point being the necessity to produce boot media with multiple El Torito images. An El Torito image is either an image of a floppy disk or a small hard drive, embedded into the ISO. This allows BIOS systems to look for an El Torito image, hook some interrupts and then boot without the BIOS having to care about the fact that the image is embedded in an ISO-9660 filesystem. UEFI systems will look for an El Torito image with a special tag - if they find it they'll mount it as a FAT filesystem and look for a bootloader, and if not they'll fall back to BIOS compatibility.

So, if you want a CD that'll boot on both BIOS and UEFI systems, you need two El Torito images - one for BIOS, one for UEFI. Unfortunately, various BIOSes deal exceptionally badly with CD images that contain more than one El Torito image. The most common failure case is to print a menu asking you to choose an option without labelling the options, but some will just fail outright. Thankfully, this is pretty much exclusively limited to 32-bit systems.

Things get irritatingly more complicated due to a quirk of UEFI. UEFI is based on executing code in native mode. That means that 32-bit UEFI systems can't execute 64-bit code in firmware, even if the CPU is capable of it. A 64-bit OS can only boot on 32-bit UEFI if it has very ugly compatibility hacks, including having to rewrite structures and register state every time it makes a UEFI call. The only OS I'm aware of that implements this is Mac OS X. Having looked into what it'd take to implement it in Linux, I decided that hammering rusty nails through my feet would be a preferable use of time. Thankfully, I went drinking instead.

So distributions have a choice. They can either produce UEFI-bootable CD images for 32-bit x86 and risk failures on actual 32-bit systems, or they can ignore UEFI entirely on 32-bit and succeed in booting on all the hardware that people actually own[2]. Unsurprisingly, they tend to choose the second option.

So, if you're building an x86 hardware platform, don't ship with 32-bit UEFI. If you're stuck with a 32-bit CPU then just ship BIOS. If you have a 64-bit CPU then ship a 64-bit UEFI. If you ship with 32-bit UEFI, no significant existing Linux distribution will support you, and you'll face an uphill battle to convince them to do so.

32-bit UEFI. Just say what on earth were you thinking, please, no, can't you find a solution that doesn't involve me getting tetanus jabs.

(If you're worried about the extra memory consumption of 64-bit OSes, just encourage 32-bit userspace on a 64-bit kernel. Or just boot via BIOS)

[1] See the number that still don't manage it despite having had several years to adapt
[2] Until recently, the only vendor to ship 32-bit UEFI firmware on volume hardware was Apple. This was fine on their 32-bit systems, but on their 64-bit systems with 32-bit UEFI booting a 64-bit version of Windows would result in boot failure. Apple rectified this by stating that 64-bit Windows wasn't supported on platforms with 32-bit UEFI, which is a neat trick if you can manage it. 32-bit Windows (and Linux) was fine because it didn't include a UEFI boot image and so didn't trigger the bug.
Jon Masters, Chief ARM Architect at Red Hat, recently posted a description of his expectations for baseline arm64 servers. The quick summary is that systems should implement UEFI and ACPI, and any more traditional ARM boot mechanisms should be ignored. This is an interesting departure from the status quo in the ARM world, and it's worth thinking about the benefits and drawbacks of this approach.

It's very easy to build a generic kernel for most x86 systems, since the PC platform is fairly well defined even if not terribly well specified. Where system hardware does vary, it's almost always exposed on an enumerable bus (such as PCI or USB) which allows the OS to bind appropriate drivers. Things are different in the ARM world. Even once you're past the point of different SoC vendors requiring different kernel setup code and drivers, you still have to cope with the fact that system vendors can wire these SoCs up very differently. Hardware is often attached via GPIO lines without any means to enumerate them. The end result is that you've traditionally needed a different kernel for every ARM board. This is viable if you're selling the OS and hardware as a single product, but less viable if there's any desire to run a generic OS on the hardware.

The solution that's been adopted for this in the Linux world is called Device Tree. Device Tree actually has significant history, having been used as the device descriptor format in Open Firmware. Since there was already support for it in the Linux kernel, adapting it for use in ARM devices was straightforward. Device Tree aware devices can pass a descriptor blob to the kernel at startup[1], and devices without that knowledge can have a blob built into the kernel.

So, if this problem is already solved, why the push to move to UEFI and ACPI? This push didn't actually originate in the Linux world - Microsoft mandate that Windows RT devices implement UEFI and ACPI, and were they to launch a Windows ARM server product would probably carry that over. That makes sense for Microsoft, since recent versions of Windows have been x86 only and so have grown all kinds of support for ACPI and UEFI. Supporting Device Tree would require Microsoft to rewrite large parts of Windows, whereas mandating UEFI and ACPI allowed them to reuse most of their existing Windows boot and driver code. As a result, largely at Microsoft's behest, ACPI 5 has grown a range of additional features for describing things like GPIO pinouts and I2C connections. Whatever your weird device layout, you can probably express it via ACPI.

This argument works less well for Linux. Linux already supports Device Tree, whereas it currently doesn't support ACPI or UEFI on ARM[2]. Hardware vendors are already used to working with Device Tree. Moving to UEFI and ACPI has the potential to uncover a range of exciting new kernel issues and vendor bugs. It's not obviously an engineering win.

So how about users? There's an argument that since server vendors are now mostly shipping ACPI and UEFI systems, having ARM support these technologies makes it easier for customers to replace x86 systems with ARM systems. This really doesn't fly for ACPI, which is entirely invisible to the user. There are no standard ACPI entry points for system configuration, and the bits of ACPI that are generically useful (such as configuring system wakeup times) are already abstracted away to a standard interface by the kernel. It's somewhat more compelling for UEFI. UEFI supports a platform-independent bytecode language (EFI Byte Code, or EBC), which means that customers can write their own system management utilities, build them for EBC and then deploy them to their servers without caring about whether they're x86 or ARM. Want a bootloader that'll hit an internal HTTP server in order to determine which system image to deploy, and which works on both x86 and ARM? Straightforward.

Arnd Bergmann has an interesting counterargument. In a nutshell, ARM servers aren't currently aiming for the same market as x86 servers, and as a result customers are unlikely to gain any significant benefit from shared functionality between the two.

So if there's no real benefit to users, and if there's no benefit to kernel developers, what's the point? The main one that springs to mind is that there is a benefit to distributions. Moving to UEFI means that there's a standard mechanism for distributions to interact with the firmware and configure the bootloader. The traditional ARM approach has been for vendors to ship their own version of u-boot. If that's in flash then it's not much of a problem[3], but if it's on disk then you have to ship a range of different bootloaders and know which one to install (and let's not even talk about initial bootstrapping).

This seems like the most compelling argument. UEFI provides a genuine benefit for distributions, and long term it probably provides some benefit to customers. The question is whether that benefit is worth the flux. The same distribution benefit could be gained by simply mandating a minimum set of u-boot functionality, which would seem much more straightforward. The customer benefit is currently unclear.

In the end it'll probably be a market decision. If Red Hat produce an ARM product that has these requirements, and if Suse produce an ARM product that will work with u-boot and Device Tree, it'll be up to vendors to decide whether the additional work to support UEFI/ACPI is worth it in order to be able to sell to customers who want Red Hat. I expect that large vendors like HP and Dell will probably do it, but the smaller ones may not. The customer demand issue is also going to be unclear until we learn whether using UEFI is something that customers actually care about, rather than a theoretical benefit.

Overall, I'm on the fence as to whether a UEFI requirement is going to stick, and I suspect that the ACPI requirement is tilting at windmills. There's nothing stopping vendors from providing a Device Tree blob from UEFI, and I can't think of any benefits they gain from using ACPI instead. Vendor interest in the generic parts of the ACPI spec has been tepid even in the x86 world (the vast majority of ACPI spec updates come from Microsoft and Intel, not any of the system vendors), and I don't see that changing with the introduction of a range of ARM vendors who are already happy with Device Tree.

We'll see. Linux is going to need to gain the support for UEFI and ACPI on ARM in any case, since there's already hardware shipping in that configuration. But with ARM vendors still getting to grips with Device Tree, forcing them to go through another change in how they do things is going to be hard work. Red Hat may be successful in enforcing these requirements at the cost of some vendor unhappiness, or Red Hat may find that their product doesn't boot on most of the available hardware. It's an aggressive gamble, and while it'll be interesting to see how it plays out, I'm not that optimistic.

[1] The blob could be pulled from the firmware, but it's not uncommon for it to be built into u-boot instead. This does mean that you have a device-specific u-boot even if you have a generic kernel, but that's typically true anyway.
[2] Patches have been posted for ARM UEFI support. They're not mergeable in their current form, but they should be in the near future. ACPI support is in development.
[3] Although not all u-boots are created equal - some vendors ship versions that will only boot off FAT, some vendors ship versions that will only boot off ext2. Having to special case this stuff in your installer is a pain.
Mir is Canonical's new display server. It fulfils a broadly similar role to Wayland and Android's Surfaceflinger, in that it takes final responsibility for getting pixels onto the screen. XMir is an X server that runs on top of Mir. It permits applications that know how to speak the X protocol but don't know how to speak to Mir (ie, approximately all of them at present) to run in a Mir-based environment.

For Ubuntu 13.10, Canonical are proposing to use Mir by default. This doesn't mean that most applications will be using Mir, though - instead, the default session will run XMir as a full-screen client and a normal X environment will be run on that. This lets Canonical deploy Mir without forcing anyone to update their applications, allowing them to take a gradual approach. By 14.10, Canonical expect the default Unity session to be a Mir client rather than an X client. In theory it will then be possible to run an Ubuntu system without any X applications at all, leaving XMir to do nothing other than run legacy applications.

Of course, this requires a certain amount of replumbing. X would normally be responsible for doing things like setting up the screen and pushing the pixels out to the hardware, but this is now handled by Mir instead. Where a native X server would allocate a framebuffer in video memory and render into it, XMir asks Mir for a window corresponding to the size of the screen and renders into that and then simply asks Mir to display it. This step is actually more interesting than it sounds.

Unless you're willing to throw lots of CPU at them, unaccelerated graphics are slow. Even if you are, you're going to end up consuming more power for the same performance, so XMir would be impractical if it didn't provide access to accelerated hardware graphics functions. It makes use of the existing Xorg accelerated X drivers to do this, which is as simple as telling the drivers to render into the window that XMir requested from Mir rather than into the video framebuffer directly. In other words, when displaying through XMir, you're using exactly the same display driver stack as you would be if you were using Xorg. In theory you'd expect identical performance - in practice there's a 10-20% performance hit right now, but that's being actively worked on. Fullscreen 3D apps will also currently take a hit due to there being no support for skipping compositing, which is being fixed. XMir should certainly be capable of performing around as well as native X, but there's no reason for it to be any faster.

The output drivers aren't the only part of the stack. XMir still needs some way to get input events. You might naively expect that Mir would forward input events to XMir, but the current state of affairs is that XMir loads the existing X input drivers which in turn open the input devices themselves. This explains why XMir currently shows two cursors[1] - the first cursor is the Mir cursor that's being driven by the input events Mir is receiving, while the second is the cursor being drawn by XMir in response to the input events that it's receiving[2].

The final main piece of duplicated functionality is output management. In the case where X is its own display server, it can simply ask the kernel to program the outputs. XMir can't do that, so would have to ask Mir to do it instead. Sadly, that functionality's not implemented yet. Running xrandr on XMir gives:
Screen 0: minimum 320 x 320, current 1366 x 768, maximum 8192 x 8192
XMIR-1 connected primary 1366x768+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   XMIR mode of death[3]   60.0*+
which translates as "I have a screen that can be any size from 320x320 to 8192x8192, and have one output of unknown physical size that can only display 1366x768 at 60Hz". Other than the 1366x768, the result will be identical no matter how many outputs you have - XMir currently has no support for configuring multiple displays, and will always report 60Hz. It also has no support for placing monitors into any kind of DPMS state.
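
If you want to pull those numbers out programmatically, the first line of the output is regular enough to parse with a regular expression. This is a small Python sketch that only handles the "Screen" line in the exact form shown above; the function name and the shape of the returned dictionary are my own choices, not anything xrandr defines.

```python
import re
from typing import Dict, Optional, Tuple, Union

# Pattern for the first line of xrandr output, as quoted above.
SCREEN_RE = re.compile(
    r"Screen (\d+): minimum (\d+) x (\d+), "
    r"current (\d+) x (\d+), maximum (\d+) x (\d+)"
)

ScreenInfo = Dict[str, Union[int, Tuple[int, int]]]


def parse_screen_line(line: str) -> Optional[ScreenInfo]:
    """Parse an xrandr "Screen N: ..." line into sizes, or return None."""
    match = SCREEN_RE.match(line)
    if match is None:
        return None
    screen, min_w, min_h, cur_w, cur_h, max_w, max_h = map(int, match.groups())
    return {
        "screen": screen,
        "minimum": (min_w, min_h),
        "current": (cur_w, cur_h),
        "maximum": (max_w, max_h),
    }


SAMPLE = "Screen 0: minimum 320 x 320, current 1366 x 768, maximum 8192 x 8192"
info = parse_screen_line(SAMPLE)
```

For the sample line, `info["current"]` is `(1366, 768)`, which under XMir is the only part of the output that reflects the real hardware.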

Obviously, this is still software under active development. There are three months from now until Ubuntu is due to release with this functionality enabled by default, and the only significant missing features are xrandr and DPMS support. As long as Mir exposes interfaces for controlling those, they shouldn't be a problem to implement. To most users, the transition from native Xorg to XMir should be completely invisible.

Which is kind of the problem. What benefits does the user gain from using XMir on Mir rather than native Xorg?

Features? XMir is a total of around 1000 lines of code on top of Xorg. It implements no functionality that isn't present in Xorg.

Performance? XMir effectively is Xorg - it's the same code running the same drivers. It's not going to be any faster. At the moment it's actually slower, but that's probably fixable.

Security? Again, XMir is effectively Xorg. It still runs as root. It has the same privileges. There's no security benefit to using XMir instead of Xorg.

Like I said, the transition should be completely invisible - no drawbacks, but no benefits. What it does do is get Mir deployed on a larger number of systems, which means Canonical can both get wider testing of the basic Mir functionality and argue that the number of deployed Mir systems means hardware vendors should support it. Users won't see any of the benefits until Unity transitions to being a native Mir client, which is slated for 14.10 next year.

As a PR move, it's pretty great. The public perception is that Mir has gone from not existing to being sufficient to run a full desktop environment in very little time, and it's natural to compare it to Wayland which has been in development for longer and still isn't running a fully featured desktop session. The problem with that perception is that Mir is doing very little at the moment. It's setting an output mode (by asking the kernel to), allocating a window for XMir to draw into and then pushing that window onto the screen whenever XMir tells it that it's changed. This isn't advanced display server functionality, it's the bare minimum that a display server could possibly do[4]. Which is a shame, because Mir is significantly more functional than that, it's just not being used to do anything we couldn't already do.

In summary: XMir on Mir in Ubuntu provides no user benefits and isn't a compelling technology demo. Mir itself will permit a range of additional features, but isn't slated to be running a user session itself until 14.10. The only obvious benefit to Canonical in shipping XMir on Mir is to gain additional testing, which makes using it in 14.04, a supposedly stable and long term release, a somewhat surprising choice.

[1] Although this is in the process of being fixed
[2] This also explains why, on systems with a trackpoint and a touchpad, only the trackpoint moves the Mir cursor while both move the XMir cursor. The X touchpad driver has put the touchpad in absolute mode, which means it's now reporting events that Mir currently ignores.
[3] "XMIR mode of death" is hardcoded into XMir. I'm really hoping it's a Regular Show reference.
[4] So why hasn't anyone done this with Wayland? Mostly because, as described above, there's no technical benefit in doing so. Wayland does have an X server for compatibility purposes, and if you wanted you could run it as a full screen client and run an entire session underneath it. But you'd gain nothing by doing so. The Wayland X server is intended for running individual clients rather than an entire X session. Run an X client under Wayland and it'll pop up in its own individual window and be managed by your Wayland session's window manager, just like it would under X. XMir currently has no support for this "rootless" mode - right now if you want to run X apps under Mir, you'll need to launch an entire X session with its own window manager.
Recent Intel-based systems often implement something called Intel Rapid Start Technology. Like many things with the word "Technology" in the name, there's a large part of this that's marketing. The relatively small amount of technical documentation available implies that it's tied to your motherboard chipset and CPU, but as far as I can tell it's entirely implemented in firmware and could work just as well on, say, a Cyrix on a circa 1996 SIS-based motherboard if someone wrote the BIOS code[1]. But since nobody has, we're stuck with the vendors who've met Intel's requirements and licensed the code.

The concept of IRST is pretty simple. There's a firmware mechanism for setting a sleep timeout. If you suspend your computer and this timeout expires, it'll resume. However, instead of handing control back to the OS, the firmware just copies the entire contents of RAM to a special partition and turns the computer off. Next time you hit the power button, the firmware dumps the partition contents back into RAM and resumes as if nothing had changed. This takes a few seconds longer than resume from S3 but is far faster than resume from hibernation since it starts the moment the system gets power.

At a more technical level, it's a little more complicated. The first thing to know about this feature is that it's entirely invisible unless your hard drive is set up correctly. There needs to be a partition that's at least the size of your system's physical RAM. For GPT systems, this needs to have a type GUID of D3BFE2DE-3DAF-11DF-BA40-E3A556D89593. For MBR systems, you need a partition type of 0x84[2]. If the firmware doesn't find an appropriate partition then the OS will get no indication that the firmware supports it. Boo.

(The second thing is that it seems like it really does have to be on an SSD, and if you try to do this on spinning media your firmware will ignore it anyway)

If all the prerequisites are in place, an ACPI device with an HID of INT3392 will exist. It has four methods associated with it: GFFS, SFFS, GFTV and SFTV. GFFS returns an integer representing the events that will cause the system to wake up from S3 and suspend to SSD. The system will wake after the timeout expires if bit 0 is set, and will wake when the battery becomes critically low if bit 1 is set. The other bits appear to be unused at the moment[3]. SFFS sets the wakeup events, using the same bit values as GFFS. GFTV returns an integer containing the current wakeup timeout in minutes. SFTV sets it. Values above 1440 (ie, 24 hours) seem to be considered invalid - if I set them the value instead ends up as 10 and the timeout flag gets cleared from the wakeup events field.
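As a rough illustration of the bit layout described above, here's a minimal Python sketch that decodes and encodes the wakeup events value and sanity-checks a timeout. The constant and function names are made up for this example, and the 1440 minute limit is just the behaviour observed on the test machine rather than anything documented.

```python
# Bit assignments as described in the text. The names are illustrative only;
# they are not taken from any header or from the kernel patch.
WAKE_ON_TIMER = 1 << 0             # wake from S3 when the timeout expires
WAKE_ON_CRITICAL_BATTERY = 1 << 1  # wake from S3 when the battery is critically low
MAX_TIMEOUT_MINUTES = 1440         # larger values appeared to be rejected


def decode_wakeup_events(value):
    """Turn a GFFS-style integer into a list of event names."""
    events = []
    if value & WAKE_ON_TIMER:
        events.append("timer")
    if value & WAKE_ON_CRITICAL_BATTERY:
        events.append("critical-battery")
    return events


def encode_wakeup_events(timer=False, critical_battery=False):
    """Build the integer that would be handed to SFFS."""
    value = 0
    if timer:
        value |= WAKE_ON_TIMER
    if critical_battery:
        value |= WAKE_ON_CRITICAL_BATTERY
    return value


def timeout_is_plausible(minutes):
    """True if the value is within the range the firmware seemed to accept."""
    return 0 <= minutes <= MAX_TIMEOUT_MINUTES
```

Since the remaining bits seem to be ignored, the decoder simply skips anything above bit 1.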

I've submitted a patch that adds a sysfs interface for setting these values, and unless anyone objects it'll probably end up in 3.11. There's still the remaining question of how userspace should make use of these, and also how installers should behave when it comes to systems that support IRST. As previously mentioned, there's no obvious indication to the OS that the feature is supported unless the appropriate partition already exists. The easiest way to deal with this is for installers to default to retaining any partitions with the magic IDs, but I'm still looking into whether it's possible to get the firmware to cough up some more information so it can be created automatically even if the drive's entirely blank.

And now, having got this working on a test machine, I just need to split my Thinkpad's swap partition in half and make sure it works here as well. Woo.

[1] Note: I am not going to do this.
[2] Conveniently, the same as the partition type that APM systems used for suspend to disk back when dubstep hadn't been invented yet
[3] At least, if you attempt to set them they get ignored.
One of the goals of our work at Nebula is making it as easy as possible for someone to set up a private cloud. In an ideal world that would involve being able to just plug in the controller hardware, wire up a rack of servers and turn everything on, but right now there are cases where the default firmware configuration on the servers doesn't match our desired configuration. Plugging a console into individual servers just to set some BIOS options is (based on personal experience) about as much fun as writing a doctoral thesis on the experience of watching paint dry, so it seemed worth trying to find a way to avoid people having to deal with that. Thankfully, it turns out that the industry has come to a similar set of conclusions. Recent Dell hardware lets you use WS-MAN, which makes it easy to do things like enable security features as long as you have an authenticated connection to the iDRAC management system. This actually works out wonderfully - the first time a Dell node sends a PXE request to the Cloud Controller, it can push back some configuration changes, reboot the system and then boot it in a known good configuration. Thanks, Dell![1].

HP's slightly trickier. As far as I've been able to work out, the remotely-available programmatic interfaces only provide configuration interfaces to the iLO device itself, along with a range of chassis-level monitoring. Instead, HP provide a couple of utilities that allow the OS to change values. The first of these is called conrep. Conrep is able to modify BIOS settings, but needs an XML file which tells it which configuration values are at which addresses. That's a bit of a pain. Thankfully it's being replaced with a new tool called hprcu which is able to directly query the firmware in order to figure out which configuration options are available and which values have to be stored where in order to set them. You run hprcu once and it spits out a file containing your existing settings. You modify that, feed it back into hprcu, reboot and you're set.

Sounds pretty ideal. What's the problem? The first is that hprcu is currently only shipped as a 32-bit binary, and right now we don't deploy any 32-bit support code on our nodes. It'd be irritating to have to do so just to configure the firmware. The second is that it works by doing raw port IO, and that's not going to be possible from userspace once we've moved to a UEFI Secure Boot setup.

So, obviously, I've been working on reimplementing it. The first step was to figure out what it was actually doing. My first thought was to strace it to see if it was using the kernel IPMI interface. strace showed no accesses to /dev/ipmi*. My next thought was that it could be using some sort of shared memory segment with the iLO hardware, so I used MMIOTrace to dump the memory accesses it made. Turned out there weren't many, and certainly not enough to do anything interesting. Then I went back to the strace logs and saw that it was calling iopl() a bunch, and then I got sad.

iopl() allows a userspace application to change its io privilege levels. The Linux manpage for the iopl() call is amazingly helpful, informing us with a straight face that "This call is necessary to allow 8514-compatible X servers to run under Linux." What it actually does is grant userspace access to the full range of io ports.

So, what's an io port? The x86 architecture (and some others) provide two ways to communicate with hardware. The one that's mostly used these days is to map the hardware into the same address space as memory (memory-mapped io), but x86 has an entirely separate address range that can be used. This is significantly more limited, as only 65,536 addresses are available for the entire system. Further, you can only read or write a maximum of 4 bytes in a single transaction. However, while slower and less generally useful than memory-mapped io, port io has the advantage that it's much simpler to implement. As a result, a lot of the original PC hardware was intended to be accessed via port io, and a bunch of that's still present. Want to reprogram your real-time clock? Port io. Want to read from the legacy keyboard controller? Port io. For low-bandwidth transactions, it's a completely reasonable way to implement hardware.

Applications[2] perform port io by executing in and out instructions. Since these instructions allow you to do things like, say, hit the keyboard controller directly[3], userspace is generally forbidden from calling them. Sufficiently privileged applications can call iopl() to raise their privileges and gain access to the full set of io ports. But, since in and out are CPU-level instructions, the actual port io accesses won't show up in strace. So I had to take another approach.

The correct way of doing this would be to LD_PRELOAD something that intercepted iopl(), didn't call it, let the application perform an in or out instruction, caught the fault, looked at the stack frame, called iopl(), performed the access, dumped debug details, dropped iopl() again and restored state. Because that seemed like a bunch of work, I took advantage of the fact that hprcu had separate in and out functions and used gdb. I told gdb to break on every call to in or out, dump the register state and then continue. Then I hacked up a script to parse the register dumps and tell me where the reads and writes were going. Astonishingly, it worked. And then I read the results and got really sad.

PCI setup is hard, if you're a BIOS. You've got PCI devices with large memory windows and you've got to arrange them all somehow and look you've only got 640K of RAM you can access while you're doing this and come on, seriously? So PC-type systems define a port IO interface to performing PCI configuration. Each PCI device has an address made up of a bus, a device and a function. You take a 32-bit integer, set the top bit to indicate that you want PCI access, put the bus in bits 16-23, the device in bits 11-15, the function in bits 8-10 and the configuration register on that device you want to access in bits 0-7. Then you write that to io port 0xcf8. Reads or writes to io port 0xcfc then read or write from that configuration register on that device. This lets the BIOS tell each PCI device where it's going to live in mmio address space without having to actually perform any mmio. Good work, BIOS developers.
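The address layout above is easy to get wrong by a bit or two, so here's a small Python sketch of the arithmetic. It only builds and decodes the 32-bit value that would be written to 0xcf8 and performs no port io itself. The function names are invented for the example, and it doesn't enforce any register alignment that real hardware may expect.

```python
def pci_config_address(bus, device, function, register):
    """Build the 32-bit value written to io port 0xcf8."""
    if not 0 <= bus <= 0xFF:
        raise ValueError("bus must fit in 8 bits")
    if not 0 <= device <= 0x1F:
        raise ValueError("device must fit in 5 bits")
    if not 0 <= function <= 0x7:
        raise ValueError("function must fit in 3 bits")
    if not 0 <= register <= 0xFF:
        raise ValueError("register must fit in 8 bits")
    enable = 1 << 31  # top bit: "I want PCI configuration access"
    return enable | (bus << 16) | (device << 11) | (function << 8) | register


def decode_pci_config_address(value):
    """Split a 0xcf8 value back into (bus, device, function, register)."""
    return (
        (value >> 16) & 0xFF,
        (value >> 11) & 0x1F,
        (value >> 8) & 0x7,
        value & 0xFF,
    )
```

For example, device 0x1f function 3 on bus 0, register 0x40, comes out as 0x8000fb40.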

This works fine in the BIOS, because nothing else is running while the BIOS is. It even works fine in the kernel, which uses this approach on older hardware that doesn't provide a memory-mapped mechanism to access PCI configuration registers. But it works really badly if you're doing it in userspace, because it shares a problem with all other indexed accesses. You write an address to 0xcf8. You write a value to 0xcfc. What happens if the kernel writes to 0xcf8 before you write to 0xcfc? Your access goes to some other register on some other device and suddenly you've just mapped a PCI device over the top of RAM and well I sure hope you saved all your work.

So, I wasn't thrilled to discover that hprcu was communicating with the iLO by using 0xcf8/0xcfc port io accesses. Not only was it going to stop working once we started deploying UEFI Secure Boot, it had the potential to cause really annoyingly hard to debug problems. Working out how to reimplement it became something of a priority.

By looking at the XML files it generated, and by following the port IO accesses it was performing, I could figure out pretty much how it worked. Some configuration values are stored in the real-time clock CMOS. These were just done in the standard way - write the address to port 0x70, read or write the value to port 0x71[4]. Other configuration values were stored in NVRAM, with accesses going via PCI. This was a little trickier to figure out, because sometimes different NVRAM addresses went to the same PCI address. I finally figured it out - there's a 48 byte window into NVRAM via the PCI configuration registers, and another register which chooses which set of NVRAM is visible. So, take the NVRAM address, divide by 48, write that value to register 0xa6, take the modulus, add it to 0x80 and write the desired value to that address.
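The NVRAM windowing is just a divide and a modulus. A minimal sketch in Python, with names invented for the example, and with the caveat that the constants are only what the traces suggested on the hardware I was looking at:

```python
NVRAM_WINDOW_SIZE = 48      # bytes of NVRAM visible at any one time
NVRAM_BANK_REGISTER = 0xA6  # PCI config register that selects the visible window
NVRAM_WINDOW_BASE = 0x80    # PCI config register where the window starts


def nvram_to_pci_registers(nvram_address):
    """Return (bank, register): write bank to 0xa6, then access register."""
    bank, offset = divmod(nvram_address, NVRAM_WINDOW_SIZE)
    return bank, NVRAM_WINDOW_BASE + offset
```

This also shows why different NVRAM addresses appeared to hit the same PCI address: NVRAM addresses 4 and 52 both end up at register 0x84, just with a different value in the bank register.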

So now I knew enough to be able to take an existing XML file and deploy it. Definitely progress, and I could even add support to the kernel. But I still wanted to know how hprcu generated that XML file in the first place. Running strings over the binary showed a bunch of debug output that it never actually printed, but immediately after the help text it also printed SsLlFf:HhAaTtOoDd. I've lost enough years of my life to this kind of thing to be able to identify that as a getopt format string, and there was that tantalising D option at the end that went entirely unmentioned in the help text. Re-running hprcu with the -D argument gave me huge piles of debug output, including references to a DMI entry and an $RBS table.
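For anyone who hasn't stared at one of these before, a getopt string lists the accepted option letters, with a trailing colon marking options that take an argument. The Python sketch below picks the string apart and feeds it to Python's own getopt module. It assumes hprcu follows the usual getopt conventions, which the binary's behaviour suggests but which I haven't otherwise verified, and describe_optstring is a helper made up for this example.

```python
import getopt

OPTSTRING = "SsLlFf:HhAaTtOoDd"


def describe_optstring(optstring):
    """Map each option letter to True if it takes an argument (trailing ':')."""
    options = {}
    i = 0
    while i < len(optstring):
        letter = optstring[i]
        takes_arg = i + 1 < len(optstring) and optstring[i + 1] == ":"
        options[letter] = takes_arg
        i += 2 if takes_arg else 1
    return options


options = describe_optstring(OPTSTRING)
# Parse a command line containing only the undocumented flag.
parsed, leftover = getopt.getopt(["-D"], OPTSTRING)
```

With this string, only f takes an argument, and -D parses as a plain flag.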

DMI is a standard for exposing system information, and one aspect of it is providing data from the firmware to the operating system. The OS can look for a defined signature in a fixed area of memory and then pull out a bunch of data about the hardware, including things like the vendor name, model, serial number, BIOS version and more. Some of these information tables are standardised and some are vendor-specific. hprcu was looking for a vendor-specific table and then scanning it for a known signature. That table turned out to contain a set of signatures and addresses. The address corresponding to the signature hprcu was looking for contained a bunch of data starting with "$RBS". It also contained a huge quantity of ASCII that looked awfully like the strings in the XML file that hprcu had written. Success!

So, the rest of today was spent on working out the format of this table. This was only marginally less tedious than setting up BIOS settings on 20 servers by hand, but I've made enough progress to be able to figure out how to write a kernel-level driver for this. That's about half-done now - the parsing code is all implemented, I just need to add the sysfs glue and I am nowhere near drunk enough for that right now[5], so for now you're just getting a format description.

To start with, look for a DMI table with a type of 0xe5. This contains several entries of 4 bytes of signature, 8 bytes of address and 4 bytes of length. Find the entry with a signature of "$CRQ" and map the corresponding address. The first four bytes of the mapped region should be "$RBS". The first record is 21 bytes into the mapped region. Each record starts with a single byte representing the type and two bytes representing the record length. Records of type 1 contain an ASCII string that starts 8 bytes into the record, and have a further type number 6 bytes into the record that tells you what type of string it is. 0x05 indicates that it's a "Feature string", which defines the overall classification of the following features. 0x06 is an "Option string", which provides a human readable explanation of what this configuration value corresponds to. 0x60 is an "Optional warning string", which tells the user which further configuration may be required before the configuration change takes effect.
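To make the layout above a bit more concrete, here's a rough Python sketch that parses the signature/address/length entries and walks the records in an already-mapped "$RBS" region. Several things here are assumptions rather than anything I've confirmed: that multi-byte fields are little-endian, that the record length covers the whole record including its 3 byte header, and that strings either end at a NUL or run to the end of the record. It works on bytes objects, so getting hold of the DMI payload and mapping the physical address are left out entirely, and all the names are invented for the example.

```python
import struct

# 4 bytes of signature, 8 bytes of address, 4 bytes of length.
# Little-endian is an assumption.
ENTRY = struct.Struct("<4sQI")

FIRST_RECORD_OFFSET = 21
STRING_KINDS = {0x05: "feature", 0x06: "option", 0x60: "warning"}


def parse_entries(payload):
    """Return {signature: (address, length)} from the 0xe5 table's entries."""
    entries = {}
    for offset in range(0, len(payload) - ENTRY.size + 1, ENTRY.size):
        signature, address, length = ENTRY.unpack_from(payload, offset)
        entries[signature] = (address, length)
    return entries


def walk_records(region):
    """Yield (type, raw_record) for each record in a mapped $RBS region."""
    if region[:4] != b"$RBS":
        raise ValueError("missing $RBS signature")
    offset = FIRST_RECORD_OFFSET
    while offset + 3 <= len(region):
        rec_type = region[offset]
        (rec_len,) = struct.unpack_from("<H", region, offset + 1)
        # Stop on a nonsensical length rather than looping forever.
        if rec_len < 3 or offset + rec_len > len(region):
            break
        yield rec_type, region[offset:offset + rec_len]
        offset += rec_len


def string_record(record):
    """Decode a type 1 record into (kind, text)."""
    kind = STRING_KINDS.get(record[6], "unknown")
    text = record[8:].split(b"\0", 1)[0].decode("ascii", errors="replace")
    return kind, text
```

A caller would look up b"$CRQ" in the result of parse_entries(), map that address, and hand the mapped bytes to walk_records().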

All other types appear to be configuration options. Bytes 3 to 6 provide the little-endian name of the configuration option. Bytes 7 and 8 provide a unique numerical identifier for the configuration option. Byte 13 indicates the type. 0x03 is stored in CMOS. 0x04 is stored in NVRAM. 0x05 appears to be some different kind of CMOS store. I haven't figured out the others. There then follows a set of choices. These vary in length depending on the type - 0x03 are 14 bytes long, 0x04 are 6 bytes long, 0x05 are 5 bytes long. Type 0x03 choices have the choice id at byte 0, the CMOS offset at byte 2, the mask at byte 3 and the value at byte 4. Type 0x04 choices have the choice id at byte 0, the NVRAM offset at bytes 1 and 2, the mask at byte 3 and the value at byte 4. Type 0x05 choices have the choice id at byte 0, the CMOS offset at byte 1, the mask at byte 2 and the value at byte 3. There are optional flags following each choice - if the final byte of the choice isn't 0, there then follow 6 bytes of flags. The first four bytes provide the name of a configuration option (little-endian, again), the fifth byte refers to a choice on that option and the sixth byte indicates what kind of flag. 0x1 appears to indicate that the option is mutually exclusive with that other option. Examples include the embedded serial port and virtual serial port options, where it's impossible to map both to the same address at once. A choice can have multiple flags.

After a configuration option, there'll be a set of option string records. In turn, these correspond to the previous choices - if a configuration option had three choices, there'll be three option string records. I haven't figured out whether there's a stronger way of binding the option string records to the configuration choice yet. The other thing I haven't entirely figured out are the details of configuring the platform to perform a cold reboot on the next power cycle, but the io traces for that don't look too bad.

The aim is to provide a kernel driver that exposes all these configuration options via sysfs, including indicating the current value, the available values and letting the user set a new value. In an ideal world we'd have a wonderfully generic interface to this kind of functionality, but I'm (sadly) not sure that that's possible.

And that's how I spent the past two days.

[1] (Thell)
[2] Or, indeed, kernels
[3] And, hence, read your keystrokes
[4] Which has exactly the same problems as the PCI access, in that if you happen to ask the clock what the time is while hprcu is accessing CMOS, at least one of you is going to get very confused
[5] Obviously, I would never write kernel code while drunk. You're certainly not running any of it in your enterprise kernel.
Mir is Canonical's equivalent to Wayland - a display server, responsible for getting application pixmaps onto a screen. It's intended to scale from mobile devices to the desktop, and as such is expected to turn up in Ubuntu Phone before too long[1]. There's already plenty of discussion about whether the technical differences between Wayland and Mir are sufficient to justify Canonical going their own way, so I'm not planning on talking about that.

Like many Canonical-led projects, Mir is under GPLv3 - a strong copyleft license. There's a couple of aspects of GPLv3 that are intended to protect users from being unable to make use of the rights that the license grants them. The first is that if GPLv3 code is shipped as part of a user product, it must be possible for the user to replace that GPLv3 code. That's a problem if your device is intended to be locked down enough that it can only run vendor code. The second is that it grants an explicit patent license to downstream recipients, permitting them to make use of those patents in derivative works.

One of the consequences of these obligations is that companies whose business models depend on either selling locked-down devices or licensing patents tend to be fairly reluctant to ship GPLv3 software. In effect, this is GPLv3 acting entirely as intended - unless you're willing to guarantee that a user can exercise the freedoms defined by the free software definition, you don't get to ship GPLv3 material. Some companies have decided that shipping GPLv3 code would be more expensive than either improving existing code under a more liberal license or writing new code from scratch. Android's a pretty great example of this - it contains no GPLv3 code, and even GPLv2 code (outside the kernel) is kept to a minimum.

Which, given Canonical's focus on pushing Ubuntu into GPLv3-hostile markets, makes the choice of GPLv3 an odd one. This isn't a problem as long as they're the sole copyright holder, because the copyright holder is obviously free to ship their code under as many licenses as they want. But Canonical still aim to foster community involvement, and ideally that includes accepting external contributions to their code. If Canonical simply accepted those contributions under GPLv3 then they'd no longer have the right to relicense the entire codebase, so any contributions are only accepted if the contributor has signed a Contributor License Agreement.

Canonical's CLA is pretty simple. In essence, it grants Canonical the right to use, modify and distribute your code, and it grants Canonical a patent license under any patents you own that may cover the code in question. But, most importantly, it grants Canonical the right to relicense your contribution under their choice of license. This means that, despite not being the sole copyright holder, Canonical are free to relicense your code under a proprietary license.

Given Canonical's market goals, this makes sense. They can relicense Mir (and any other GPLv3 projects they own) under licenses that keep their hardware partners happy, and they can ship in the phone market. Everyone's a winner.

Except, if Canonical want to ship proprietary versions, why not just license Mir under a license that permits that in the first place? This is where the asymmetry comes in. The Android userland is released under a permissive license that allows anyone to take Google's code, modify it as they wish and ship it on whatever hardware they want. I could legally start a company that provided customised versions of Android to phone vendors without them having any GPLv3 concerns. I won't be able to do that with Ubuntu Phone.

I'm a fan of GPLv3. I think the provisions it contains to support user freedom are important. I hate the growing trend of using free software to build devices that are, effectively, impossible for the end user to modify. If Canonical were releasing software under GPLv3 because of a commitment to free software then that would be an amazing thing. But it's pretty much impossible to square the CLA's requirement that contributors grant Canonical the right to ship under a proprietary license with a commitment to free software. Instead you end up with a situation that looks awfully like Canonical wanting to squash competition by making it impossible for anyone else to sell modified versions of Canonical's software in the same market.

Canonical aren't doing anything illegal or immoral here. They're free to run their projects in any way they choose. But retaining the right to produce proprietary versions of external contributions without granting equivalent reciprocal rights isn't consistent with caring about free software or contributing to the wider Linux community, especially if it means you get to exclude those external contributors from the market you're selling their code into.

(Edit to add: a friend in the contracting industry points out that it also prevents vendors who won't ship GPLv3 from using external contractors to work on Mir - they have to go to Canonical, because only Canonical can relicense contributions under a proprietary license.)

[1] Right now Ubuntu Phone is using Surfaceflinger, the Android display server, but that's apparently just an interim solution.
Since I wrote this, we've made some worthwhile progress on avoiding damaging Samsung hardware. The first is that the samsung-laptop driver appeared to be causing the firmware to attempt to write to an area of memory that was marked in the chipset, triggering a Machine Check Exception. That was what generated the pstore output that caused the problem originally. The driver now refuses to load if EFI is enabled, which avoids the problem. It's not ideal, since it's currently the only mechanism we have for certain functionality on Samsung laptops, but there you go.

The second problem was that avoiding crashing on boot didn't actually fix the problem in any fundamental way. Even with pstore disabled, it was possible for userspace to fill the nvram and trigger the same problem. Our first approach to this was to prevent any writes to nvram if the UEFI QueryVariableInfo() call reported that more than 50% of the nvram storage space would be used. That was safe, but led to another issue. The nvram storage area is typically implemented as part of the same flash chip as the firmware. Flash isn't arbitrarily accessible - changing the contents of a block typically involves rewriting the entire block. It's impractical to rewrite the entire nvram area on every write, so what actually happens is that deleting variables just results in them being marked as inactive but doesn't actually free up the space. The firmware can later perform some sort of garbage collection to free it up.

This caused us problems, since inactive space that hasn't been garbage collected yet isn't actually available, and as a result firmware implementations tend to count it as used. Say you had 64KB of nvram and wrote 32KB of variables. We'd then refuse to write any more because you'd drop below 50%. So you delete 16KB of the variables you've created and try again. Unfortunately, the firmware still thinks that there's 32KB in use and Linux would still refuse.

If you were lucky, rebooting would trigger a garbage collection run. If you weren't, it wouldn't. Problematic. Our next approach was to try to account for the space actually actively used by the variables, rather than relying on what the firmware told us via QueryVariableInfo(). This seems simple enough - just add up the size of all the variables and subtract that from the overall size to determine how much of the "used" space is actually just old inactive variables that can be ignored. However, there's still some problems there. The first is that each variable has some additional overhead associated with it, and the size of that overhead varies depending on the system vendor. We had to make a conservative guess, which could cause problems if systems had large numbers of small variables. The second is that the only variables the kernel can see are those that are flagged as runtime-visible. There may also be a significant quantity of nvram used to store variables that are only visible in boot services code. We could work around this by adding up sizes while we're still in boot services code, but on some systems calling QueryVariableInfo() before ExitBootServices() results in later calls to GetNextVariableName() jumping to invalid addresses and crashing the kernel. Not a great approach.

Meanwhile, Samsung got back to us and let us know that their systems didn't require more than 5KB of nvram space to be available, which meant we could get rid of the 50% value and replace it with 5KB. The hope was that any system that booted with only 5KB of space available in nvram would trigger a garbage collection run. Unfortunately, it turned out that that wasn't true - some systems will only trigger garbage collection if the OS actually makes an attempt to write a variable that won't otherwise fit.

Hence this patch. The new approach is to ask the firmware how much space is available. If the size of the new variable would reduce this to less than 5K, we attempt to create a variable bigger than the remaining space. This should cause the firmware to realise that it's out of room and either (depending on implementation) perform a garbage collection run at runtime or set a flag that will cause the system to perform garbage collection on the next reboot. We then call QueryVariableInfo() again to see whether a garbage collection run actually happened, and if so check whether we now have enough space. If so, we go ahead and write the variable. If not, we tell userspace that there's not enough space.
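The logic is easier to follow as code than as prose, so here's a toy Python model of it. This is not the kernel patch: FakeFirmware is an invented stand-in whose remaining() plays the role of the value QueryVariableInfo() reports, and it makes the simplifying assumption that a write which doesn't fit always fails, but may reclaim inactive space as a side effect.

```python
RESERVED = 5 * 1024  # bytes that must stay free, per the figure Samsung gave


class FakeFirmware:
    """Toy variable store: deleted variables count as used until collected."""

    def __init__(self, total, active, inactive, gc_at_runtime=True):
        self.total = total
        self.active = active      # space held by live variables
        self.inactive = inactive  # space held by deleted, uncollected variables
        self.gc_at_runtime = gc_at_runtime

    def remaining(self):
        # Stand-in for the remaining space reported by QueryVariableInfo().
        return self.total - self.active - self.inactive

    def set_variable(self, size):
        if size > self.remaining():
            if self.gc_at_runtime:
                self.inactive = 0  # garbage collection reclaims dead space
            return False           # the oversized write itself still fails
        self.active += size
        return True


def try_write(fw, size):
    """Write a variable only if at least RESERVED bytes would remain free."""
    if fw.remaining() - size < RESERVED:
        before = fw.remaining()
        # Deliberately oversized write: expected to fail, hoped to provoke GC.
        fw.set_variable(before + 1)
        if fw.remaining() <= before:
            return False  # no garbage collection happened
        if fw.remaining() - size < RESERVED:
            return False  # it happened, but still not enough room
    return fw.set_variable(size)
```

In the model, a store with 4KB free and 44KB of inactive variables accepts a 1KB write after the provoked collection, while the same store on firmware that only collects at reboot refuses it.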

This seems to work in all the situations I've tested, and it should avoid a situation where a Samsung can end up bricked. However, it's firmware, so who knows whether it's going to break things for someone else.
There's now no shortage of Linux distributions that support Secure Boot out of the box, so that's a mostly solved problem. But even if your distribution supports it entirely you still need to boot your install media in the first place.

Hardware initialisation is a slightly odd thing. There's no specification that describes the state ancillary hardware has to be in after firmware→OS handover, so the OS effectively has to reinitialise it again. This means that certain bits of hardware end up being initialised twice, and that's slow in some cases. The most obvious is probably USB, which has various timeouts as you wait for hardware to settle. Full USB support in the firmware probably adds a couple of seconds to boot time, and it's arguably wasted because the OS then has to do the same thing (but, thankfully, can at least do other things at the same time). So, looking for USB boot media takes time, and since the overwhelmingly common case is that users don't want to boot off USB, it's time that's almost always wasted.

One of the requirements for Windows 8 certified hardware is that it must complete firmware initialisation within a specific amount of time, something that Microsoft refer to as "Fast Boot". Meeting these requirements effectively makes it impossible to initialise USB, and it's likely that certain other things will also be skipped. If you've got a USB keyboard then this obviously means that your keyboard won't work until the OS starts, but even i8042 setup takes time and so some laptops with traditional PS/2-style keyboards may not set it up. That means the system will ignore the keyboard no matter how much you hammer it at boot, and the firmware will boot whichever OS it finds.

For a newly purchased device, that's going to be Windows 8. It's not too much of a problem with a fully installed Windows 8, since you can hold down shift while clicking the reboot icon and get a menu that lets you reboot into the firmware menu. Windows sets a flag in a UEFI variable and reboots the system, the firmware sees that flag and does full hardware initialisation and then drops you into the setup environment. It takes slightly longer to get into the firmware, but that's countered by the time you save every time you don't want to get into the firmware on boot.

So what's the problem? Well, the Windows 8 setup environment doesn't offer that reboot icon. Turn on a brand new Windows 8 system and you have two choices - agree to the Windows 8 license, or power the machine off. The only way to get into the firmware menu is to either agree to the Windows 8 license or to disassemble the machine enough that you can unplug the hard drive[1] and force the system to fall back to offering the boot menu.

I understand the commercial considerations that result in it ranging from being difficult to impossible to buy new hardware without Windows pre-installed, but up until now it was still straightforward to install an alternative OS without agreeing to the Windows license. Now, installing alternative operating systems on many new systems will require you to give up certain rights even if you want nothing other than to reach the system firmware menu.

I'm firmly of the opinion that there are benefits to Secure Boot. I'm also in favour of setups like Fast Boot. But I don't believe that anyone should be forced to agree to a EULA purely in order to be able to boot their own choice of OS on a system that they've already purchased.

[1] Which is a significant and probably warranty-voiding exercise on many systems, and that's assuming that it's not an SSD soldered to the motherboard…
I've been working on TPMs lately. It turns out that they're moderately awful, but what's significantly more awful is basically all the existing documentation. So here's some of what I've learned, presented in the hope that it saves someone else some amount of misery.

What is a TPM?

TPMs are devices that adhere to the Trusted Computing Group's Trusted Platform Module specification. They're typically microcontrollers[1] with a small amount of flash, and attached via either i2c (on embedded devices) or LPC[2] (on PCs). While designed for performing cryptographic tasks, TPMs are not cryptographic accelerators - in almost all situations, carrying out any TPM operations on the CPU instead would be massively faster[3]. So why use a TPM at all?

Keeping secrets with a TPM

TPMs can encrypt and decrypt things. They're not terribly fast at doing so, but they have one significant benefit over doing it on the CPU - they can do it with keys that are tied to the TPM. All TPMs have something called a Storage Root Key (or SRK) that's generated when the TPM is initially configured. You can ask the TPM to generate a new keypair, and it'll do so, encrypt it with the SRK (or another key descended from the SRK) and hand it back to you. Other than the SRK (and another key called the Endorsement Key, which we'll get back to later), these keys aren't actually kept on the TPM - the running OS stores them on disk. If the OS wants to encrypt or decrypt something, it loads the key into the TPM and asks it to perform the desired operation. The TPM decrypts the key and then goes to work on the data. For small quantities of data, the secret can even be stored in the TPM's nvram rather than on disk.

All of this means that the keys are tied to a system, which is great for security. An attacker can't obtain the decrypted keys, even if they have a keylogger and full access to your filesystem. If I encrypt my laptop's drive and then encrypt the decryption key with the TPM, stealing my drive won't help even if you have my passphrase - any other TPM simply doesn't have the keys necessary to give you access.

That's fine for keys which are system specific, but what about keys that I might want to use on multiple systems, or keys that I want to carry on using when I need to replace my hardware? Keys can optionally be flagged as migratable, which makes it possible to export them from the TPM and import them to another TPM. This seems like it defeats most of the benefits, but there's a couple of features that improve security here. The first is that you need the TPM ownership password, which is something that's set during initial TPM setup and then not usually used afterwards. An attacker would need to obtain this somehow. The other is that you can set limits on migration when you initially import the key. In this scenario the TPM will only be willing to export the key by encrypting it with a pre-configured public key. If the private half is kept offline, an attacker is still unable to obtain a decrypted copy of the key.

So I just replace the OS with one that steals the secret, right?

Say my root filesystem is encrypted with a secret that's stored on the TPM. An attacker can replace my kernel with one that grabs that secret once the TPM's released it. How can I avoid that?

TPMs have a series of Platform Configuration Registers (PCRs) that are used to record system state. These all start off programmed to zero, but applications can extend them at runtime by writing a sha1 hash into them. The new hash is concatenated to the existing PCR value and another sha1 calculated, and then this value is stored in the PCR. The firmware hashes itself and various option ROMs and adds those values to some PCRs, and then grabs the bootloader and hashes that. The bootloader then hashes its configuration and the files it reads before executing them.
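
The extend operation is easier to see in code than in prose. This is a minimal sketch in Python rather than anything that talks to a real TPM: the function names are mine, and it folds every measurement into a single PCR where real firmware spreads them across several.

```python
import hashlib

PCR_SIZE = 20  # a SHA-1 digest is 20 bytes


def measure(data: bytes) -> bytes:
    """Hash a component (firmware, bootloader, kernel...) before it runs."""
    return hashlib.sha1(data).digest()


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """New PCR value = SHA-1(old PCR value || measurement)."""
    if len(pcr) != PCR_SIZE or len(measurement) != PCR_SIZE:
        raise ValueError("PCR values and measurements must be 20 bytes")
    return hashlib.sha1(pcr + measurement).digest()


def boot_chain(components) -> bytes:
    """Start from an all-zero PCR and extend it with each component in turn."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, measure(component))
    return pcr


good = boot_chain([b"firmware", b"bootloader", b"kernel"])
evil = boot_chain([b"firmware", b"modified bootloader", b"kernel"])
print(good.hex())
print(evil.hex())
```

The two printed values differ, and because each new value depends on the previous one, nothing measured after the modified bootloader can steer the PCR back to the good value.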

This chain of trust means that you can verify that no prior system component has been modified. If an attacker modifies the bootloader then the firmware will calculate a different hash value, and there's no way for the attacker to force that back to the original value. Changing the kernel or the initrd will result in the same problem. Other than replacing the very low level firmware code that controls the root of trust, there's no way an attacker can replace any fundamental system components without changing the hash values.

TPMs support using these hash values to decide whether or not to perform a decryption operation. If an attacker replaces the initrd, the PCRs won't match and the TPM will simply refuse to hand over the secret. You can actually see this in use on Windows devices using BitLocker - if you do anything that would change the PCR state (like booting into recovery mode), the TPM won't hand over the key and BitLocker has to prompt for a recovery key. Choosing which PCRs to care about is something of a balancing act. Firmware configuration is typically hashed into PCR 1, so changing any firmware configuration options will change it. If PCR 1 is listed as one of the values that must match in order to release the secret, changing any firmware options will prevent the secret from being released. That's probably overkill. On the other hand, PCR 0 will normally contain the firmware hash itself. Including this means that the user will need to recover after updating their firmware, but failing to include it means that an attacker can subvert the system by replacing the firmware.
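
To make the PCR 0 versus PCR 1 tradeoff concrete, here's a toy model in Python. ToyTPM, seal, unseal and boot are all names I've made up for illustration and none of this is a real TPM interface. In particular, the toy keeps the sealed blob in the clear and doesn't model the SRK at all; it only shows how the choice of PCRs decides when the secret is released.

```python
import hashlib


class ToyTPM:
    """A toy model of PCR-gated secrets. Not a real TPM interface."""

    def __init__(self):
        self.pcrs = {}  # PCR index -> current 20-byte value

    def read(self, index):
        # PCRs start off as all zeroes.
        return self.pcrs.get(index, bytes(20))

    def extend(self, index, data):
        measurement = hashlib.sha1(data).digest()
        self.pcrs[index] = hashlib.sha1(self.read(index) + measurement).digest()

    def seal(self, secret, indices):
        # Record the current values of the chosen PCRs alongside the secret.
        # A real TPM would also encrypt this blob; the toy does not.
        return {"secret": secret, "policy": {i: self.read(i) for i in indices}}

    def unseal(self, blob):
        for index, expected in blob["policy"].items():
            if self.read(index) != expected:
                raise PermissionError(f"PCR {index} does not match")
        return blob["secret"]


def boot(firmware, config):
    """Model a fresh boot: PCR 0 gets the firmware, PCR 1 its configuration."""
    tpm = ToyTPM()
    tpm.extend(0, firmware)
    tpm.extend(1, config)
    return tpm


tpm = boot(b"firmware v1", b"boot order: disk")
loose = tpm.seal(b"disk key", [0])      # only cares about the firmware itself
strict = tpm.seal(b"disk key", [0, 1])  # also cares about firmware settings

after_config_change = boot(b"firmware v1", b"boot order: usb")
print(after_config_change.unseal(loose))
try:
    after_config_change.unseal(strict)
except PermissionError as err:
    print("refused:", err)
```

Changing a firmware setting leaves the secret sealed against PCR 0 alone available, while the one that also lists PCR 1 is refused. Swap the firmware string instead and both are refused.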

What about using TPMs for DRM?

In theory you could populate TPMs with DRM keys for media playback, and seal them such that the hardware wouldn't hand them over. In practice this is probably too easily subverted or too user-hostile - changing default boot order in your firmware would result in validation failing, and permitting that would allow fairly straightforward subverted boot processes. You really need a finer grained policy management approach, and that's something that the TPM itself can't support.

This is where Remote Attestation comes in. Rather than keep any secrets on the local TPM, the TPM can assert to a remote site that the system is in a specific state. The remote site can then make a policy determination based on multiple factors and decide whether or not to hand over session decryption keys. The idea here is fairly straightforward. The remote site sends a nonce and a list of PCRs. The TPM generates a blob with the requested PCR values, sticks the nonce on, encrypts it and sends it back to the remote site. The remote site verifies that the reply was encrypted with an actual TPM key, makes sure that the nonce matches and then makes a policy determination based on the PCR state.
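
The nonce exchange looks roughly like this. It's a sketch under a big assumption: Python's standard library has no public key signatures, so an HMAC over a shared key stands in for the TPM's cryptographic protection of the reply. A real TPM uses an asymmetric key, so the remote site never holds anything that would let it forge a reply. All of the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def quote_body(nonce, pcrs):
    """Serialise the nonce and the selected PCR values in index order."""
    return nonce + b"".join(bytes([i]) + pcrs[i] for i in sorted(pcrs))


def tpm_quote(key, nonce, pcrs):
    """What the simulated TPM sends back: the body plus an authentication tag."""
    body = quote_body(nonce, pcrs)
    return body, hmac.new(key, body, hashlib.sha256).digest()


def site_accepts(key, nonce, known_good, body, tag):
    """Remote-site checks: genuine key, matching nonce, acceptable PCR state."""
    expected_tag = hmac.new(key, body, hashlib.sha256).digest()
    genuine = hmac.compare_digest(expected_tag, tag)
    return genuine and body == quote_body(nonce, known_good)


key = secrets.token_bytes(32)  # stand-in for the TPM's key (assumption)
known_good = {0: hashlib.sha1(b"firmware").digest()}

nonce = secrets.token_bytes(20)  # the site picks a fresh nonce per request
body, tag = tpm_quote(key, nonce, known_good)
print(site_accepts(key, nonce, known_good, body, tag))

# Replaying the old reply against a new nonce is rejected.
print(site_accepts(key, secrets.token_bytes(20), known_good, body, tag))
```

The nonce is what stops an attacker from recording a reply from a known good boot and replaying it later from a compromised one.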

But hold on. How does the remote site know that the reply was encrypted with an actual TPM? When TPMs are built, they have something called an Endorsement Key (EK) flashed into them. The idea is that the only way to have a valid EK is to have a TPM, and that the TPM will never release this key to anything else. There's a couple of problems here. The first is that proving you have a valid EK to a remote site involves having a chain of trust between the EK and some globally trusted third party. Most TPMs don't have this - the only ones I know of that do are recent Infineon and STMicro parts. The second is that TPMs only have a single EK, and so any site performing remote attestation can cross-correlate you with any other site. That's a pretty significant privacy concern.

There's a theoretical solution to the privacy issue. TPMs never actually sign PCR quotes with the EK. Instead, TPMs can generate something called an Attestation Identity Key (AIK) and sign it with the EK. The OS can then provide this to a site called a PrivacyCA, which verifies that the AIK is signed by a real EK (and hence a real TPM). When a third party site requests remote attestation, the TPM signs the PCRs with the AIK and the third party site asks the PrivacyCA whether the AIK is real. You can have as many AIKs as you want, so you can provide each service with a different AIK.

As long as the PrivacyCA only keeps track of whether an AIK is valid and not which EK it was signed with, this avoids the privacy concerns - nobody would be able to tell that multiple AIKs came from the same TPM. On the other hand, it makes any PrivacyCA a pretty attractive target. Compromising one would not only allow you to fake up any remote attestation requests, it would let you violate user privacy expectations by seeing that (say) the TPM being used to attest to HolyScriptureVideos.com was also being used to attest to DegradingPornographyInvolvingAnimals.com.

Perhaps unsurprisingly (given the associated liability concerns), there are no public and trusted PrivacyCAs yet, and even if there were, (a) many computers are still being sold without TPMs and (b) even those with TPMs often don't have the EK certificate that would be required to make remote attestation possible. So while remote attestation could theoretically be used to impose DRM in a way that would require you to be running a specific OS, practical concerns make it pretty difficult for anyone to deploy that at any point in the near future.

Is this just limited to early OS components?

Nope. The Linux kernel has support for measuring each binary run or each module loaded and extending PCRs accordingly. This makes it possible to ensure that the running binaries haven't been modified on disk. There's not a lot of distribution infrastructure for setting this up, but in theory a distribution could deploy an entirely signed userspace and allow the user to opt into only executing correctly signed binaries. Things get more interesting when you add interpreted scripts to the mix, so there's still plenty of work to do there.

So what can I actually use a TPM for?

Drive encryption is probably the best example (BitLocker does it on Windows, and there's a LUKS-based implementation for Linux here) - while in theory you could do things like use your TPM as a factor in two-factor authentication or tie your GPG key to it, there's not a lot of existing infrastructure for handling all of that. For the majority of people, the most useful feature of the TPM is probably the random number generator. rngd has support for pulling numbers out of it and stashing them in /dev/random, and it's probably worth doing that unless you have an Ivy Bridge or other CPU with an RNG.

Things get more interesting in more niche cases. Corporations can bind VPN keys to corporate machines, making it possible to impose varying security policies. Intel use the TPM as part of their anti-theft technology on education-oriented devices like the Classmate. And in the cloud, projects like Trusted Computing Pools use remote attestation to verify that compute nodes are in a known good state before scheduling jobs on them.

Is there a threat to freedom?

At the moment, probably not. The lack of any workable general purpose remote attestation makes it difficult for anyone to impose TPM-based restrictions on users, and any local code is obviously under the user's control - got a program that wants to read the PCR state before letting you do something? LD_PRELOAD something that gives it the desired response, or hack it so it ignores failure. It's just far too easy to circumvent.

Summary?

TPMs are useful for some very domain-specific applications, plus drive encryption and random number generation. The current state of technology doesn't make them useful for practical limitations of end-user freedom.

[1] Ranging from 8-bit things that are better suited to driving washing machines, up to full ARM cores
[2] "Low Pin Count", basically ISA without the slots.
[3] Loading a key and decrypting a 5 byte payload takes 1.5 seconds on my laptop's TPM.
According to the update here, the signing keys are supposed to be replaced by the hardware vendor. If vendors do that, this ends up being uninteresting from a security perspective - you could generate a signed image, but nothing would trust it. It should be easy enough to verify, though. Just download a firmware image from someone using AMI firmware, pull apart the capsule file, decompress everything and check whether the leaked public key is present in the binaries.

The real risk here is that even if most vendors have replaced that key, some may not have done. There's certainly an argument that shipping test keys at all increases the probability that a vendor will accidentally end up using those rather than generating their own, and it's difficult to rule out the possibility that that's happened.
(See here for an update to this)

A hardware vendor apparently had a copy of an AMI private key on a public FTP site. This is concerning, but it's not immediately obvious how dangerous this is for a few reasons. The first is that this is apparently the firmware signing key, not any of the Secure Boot keys. That means it can't be used to sign a UEFI executable or bootloader, so can't be used to sidestep Secure Boot directly. The second is that it's AMI's key, not a board vendor's - we don't (yet) know if this key is used to sign any actual shipping firmware images, or whether it's effectively a reference key. And, thirdly, the code apparently dates from early 2012 - even if it was an actual signing key, it may have been replaced before any firmware based on this code shipped.

But there's still the worst case scenario that this key is used to sign most (or all) AMI-based vendor firmware. Can this be used to subvert Secure Boot? Plausibly. The attack would involve producing a new, signed firmware image with Secure Boot either disabled or with an additional key installed, and then reflashing that firmware. Firmware images are very board-specific, so unless you're engaging in a very targeted attack you either need a large repository of firmware for every board you want to attack, or you need to perform in-place modification.

Taking a look at the firmware update tool used for AMI systems, the latter might be possible. It seems that the AMI firmware driver allows you to dump the existing ROM to a file. It'd then be a matter of pulling apart the firmware image, modifying the key database, putting it back together, signing it and flashing it. It looks like doing this does require that the user enter the firmware password if one's set, so the simplest mitigation strategy would be to do that.

So. If this key is used by most vendors shipping AMI-based firmware, and if it's a current (rather than test) key, then it may well be possible for it to be deployed in an automated malware attack that subverts the Secure Boot trust model on systems running AMI-based firmware. The obvious lesson here is that handing out your private keys to third parties that you don't trust is a pretty bad idea, as is including them in source repositories.

(Wow, was this really as long ago as 2004? How little things change)
I gave a presentation at Libreplanet this weekend on the topic of Secure Boot and Restricted Boot. There's a copy of the video here - it should be up on the conference site at some point. It turned out to be excellent timing, in that a group in Spain filed a complaint with the European Commission this morning arguing that Microsoft's imposition of Secure Boot on the x86 client PC market is anticompetitive. I suspect that this is unlikely to succeed (the Commission has already stated that the current implementation appears to conform to EU law), and I fear that it's going to make it harder to fight the real battle we face.

Secure Boot means different things to different people. I think the FSF's definition is a useful one - Secure Boot is any boot validation scheme in which ultimate control is in the hands of the owner of the device, while Restricted Boot is any boot validation scheme in which ultimate control is in the hands of a third party. What Microsoft require for x86 Windows 8 devices falls into the category of Secure Boot - assuming that OEMs conform to Microsoft's requirements, the user must be able to both disable Secure Boot entirely and also leave Secure Boot enabled, but with their own choice of trusted keys and binaries. If the FSF set up a signing service to sign operating systems that met all of their criteria for freeness, Microsoft's requirements would permit an end user to configure their system such that it refused to run non-free software. My system is configured to trust things shipped by Fedora or built locally by me, a decision that I can make because Microsoft require that OEMs support it. Any system that meets Microsoft's requirements is a system that respects the freedom of the computer owner to choose how restrictive their computer's boot policy is.

This isn't to say that it's ideal. The lack of any common UI or key format between hardware vendors makes it difficult for OS vendors to document the steps users must take to assert this freedom. The presence of Microsoft as the only widely trusted key authority leaves people justifiably concerned as to whether Microsoft will be equally aggressive in blacklisting its own products as it will be in blacklisting third party ones. Implementation flaws in a (very) small number of systems have resulted in correctly signed operating systems failing to boot, requiring users to update their firmware before being able to install anything but Windows.

But concentrating on these problems misses the wider point. The x86 market remains one where users are able to run whatever they want, but the x86 market is shrinking. Users are purchasing tablets and other ARM-based ultraportables. Some users are using phones as their primary computing device. In contrast to the x86 market, Microsoft's policies for the ARM market restrict user freedom. Windows Phone and Windows RT devices are required to boot only signed binaries, with no option for the end user to disable the signature validation or install their own keys. While the underlying technology is identical, this differing set of default policies means that Microsoft's ARM implementation is better described as Restricted Boot. The hardware vendors and Microsoft define which software will run on these systems. The owner gets no say.

And, unfortunately, Microsoft aren't alone. Apple, the single biggest vendor in this market, implement effectively identical restrictions. Some Android vendors provide unlockable bootloaders, but others (either through personal preference or at the behest of phone carriers) lock down their platforms. A naive user is likely to end up purchasing a device that will, in the absence of exploited security flaws, refuse to run if any system components are modified. Even in cases where the underlying components are built using free software, there's no guarantee that the user will have the ability to assert any of those freedoms.

Why does this matter? Some of these platforms (notably Windows RT and iOS, but also some Android-based devices) will even refuse to run unsigned applications. Users are unable to write their own software and distribute it to others without agreeing to often onerous restrictions. Users with the misfortune of living in the wrong country may be forbidden from even that opportunity. The vendor may choose to block applications that compete with their own, reducing innovation. The ability to explore and tinker with the components of the system is restricted, making it harder for users to learn how modern operating systems work. If I own a perfectly functional phone that no longer receives vendor updates, I don't even have the option of paying a third party to ensure that I can't be compromised by a malicious website and risk the loss of passwords or financial details. The user is directly harmed by these restrictions.

I won't argue that there are no benefits to curated software ecosystems. I won't even argue against devices shipping with a locked down policy by default. I will strongly argue that the owner of a device should not only have the freedom to choose whether they wish to remain within those locked-down boundaries, but should also have the freedom to impose their own boundaries. There should be no forced choice between freedom and security.

Those who argue against Secure Boot risk depriving us of the freedom to make a personal decision as to who we trust. Those who argue against Secure Boot while ignoring Restricted Boot risk depriving us of even more. The traditional PC market is decreasing in importance. Unless we do something about it, free software will be limited to a niche group of enthusiasts who've carefully chosen from a small set of devices that respect user freedom. We should have been campaigning against Restricted Boot 10 years ago. Don't delay it even further by fighting against implementations that already respect user freedom.
The problem with Samsung laptops bricking themselves turned out to be down to the UEFI variable store becoming more than 50% full and Samsung's firmware being dreadful, but the trigger was us writing a crash dump to the nvram. I ended up using this feature to help someone get a backtrace from a kernel oops during suspend today, and realised that it's not been terribly well publicised, so.

First, make sure pstore is mounted. If you're on 3.9 then do:

mount -t pstore /sys/fs/pstore /sys/fs/pstore

For earlier kernels you'll need to find somewhere else to stick it. If there's anything in there, delete it - we want to make sure there's enough space to save future dumps. Now reboot twice[1]. Next time you get a system crash that doesn't make it to system logs, mount pstore again and (with luck) there'll be a bunch of files there. For tedious reasons these need to be assembled in reverse order (part 12 comes before part 11, and so on) but you should have a crash log. Report that, delete the files again and marvel at the benefits that technology has brought to your life.
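
The reverse-order reassembly can be scripted. This is only a sketch: it assumes you've already read each file and worked out which part number it holds (how that's encoded in the file names under /sys/fs/pstore isn't covered here), and assemble_crash_log is a name I've made up.

```python
def assemble_crash_log(parts):
    """Join crash dump parts in reverse order: the highest part number first.

    `parts` maps a part number to the text of that part.
    """
    return "".join(parts[number] for number in sorted(parts, reverse=True))


# Three fake parts: part 3 holds the start of the log and part 1 the end.
example = {
    1: "Call Trace: ...\n",
    2: "IP: some_function+0x10\n",
    3: "BUG: unable to handle kernel paging request\n",
}
print(assemble_crash_log(example), end="")
```

The printed log starts with the contents of part 3 and ends with part 1, matching the order described above.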

[1] UEFI implementations generally handle variable deletion by flagging the space as reclaimable rather than immediately making it available again. You need to reboot in order for the firmware to garbage collect it. Some firmware seems to require two reboot cycles to do this properly. Thanks, firmware.
It's fairly straightforward to boot a UEFI Secure Boot system using something like Shim or the Linux Foundation's loader, and for distributions using either the LF loader or the generic version of Shim that's pretty much all you need to care about. The physically-present end user has had to explicitly install new keys or hashes, and that means that you no longer need to care about Microsoft's security policies or (assuming there's no exploitable flaws in the bootloader itself) fear any kind of revocation.

But what about if you're a distribution that cares about booting without the user having to install keys? There's several reasons to want that (convenience for naive users, ability to netboot, that kind of thing), but it has the downside that your system can now be used as an attack vector against other operating systems. Do you care about that? It depends how you weigh the risks. First, someone would have to use your system to attack another. Second, Microsoft would have to care enough to revoke your signature. The first hasn't happened yet, so we have no real idea how likely the second is. However, it doesn't seem awfully unlikely that Microsoft would be willing to revoke a distribution signature if that distribution were being used to attack Windows.

How do you avoid that scenario? There's various bits of security work you need to do, but one of them is to require that all your kernel modules be signed. That's easy for the modules in the distribution, since you just sign them all before shipping them. But how about third party modules? There's three main options here:

  1. Don't support third party modules on Secure Boot systems
  2. Have the distribution sign the modules
  3. Have the vendor sign the modules

The first option is easy, but not likely to please users. Or hardware vendors. Not ideal.

The second option is irritating for a bunch of reasons, and a pretty significant one is license-related. If you sign a module, does that mean you're endorsing it in some way? Does signing the nvidia driver mean that you think there's no license concerns? Even ignoring that, how do you decide whose drivers to sign? We can probably assume that companies like AMD and nvidia are fairly reputable, but how about Honest John's Driver Emporium? Verifying someone's identity is astonishingly expensive to do a good job of yourself, and not hugely cheaper if you farm it out to a third party. It's also irritating for the driver vendor, who needs a separate signature for every distribution they support. So, while possible, this isn't an attractive solution.

The third option pushes the responsibility out to other people, and it's always nice to get other people to do work instead of you. The problem then is deciding whose keys you trust. You can push that off to the user, but it's not the friendliest solution. The alternative is to trust any keys that are signed with a trusted key. But what is a trusted key? Having the distribution sign keys just pushes us back to option (2) - you need to verify everyone's identity, and they need a separate signing key for every distribution they support. In an ideal world, there'd be a key that we already trust and which is owned by someone willing to sign things with it.

The good news is that such a key exists. The bad news is that it's owned by Microsoft.

The recent discussion on LKML was about a patchset that allowed the kernel to install new keys if they were inside a PE/COFF binary signed by a trusted key. It's worth emphasising that this patchset doesn't change the set of keys that the kernel trusts - the kernel trusts keys that are installed in your system firmware, so if your system firmware trusts the Microsoft key then your kernel already trusts the Microsoft key. The reasoning here is pretty straightforward. If your firmware trusts things signed by Microsoft, and if a bad person can get things signed by Microsoft, the bad person can already give you a package containing a backdoored bootloader. Letting them sign kernel modules doesn't alter the power they already have over your system. Microsoft will sign PE/COFF binaries, so a vendor would just have to sign up with Microsoft, pay $99 to Symantec to get their ID verified, wrap their key in a PE/COFF binary and then get it signed by Microsoft. The kernel would see that this object was signed by a trusted key and extract and install the key.

Linus is, to put it mildly, unenthusiastic about this idea. It adds some extra complexity to the kernel in the form of a binary parser that would only be used to load keys from userspace, and the kernel already has an interface for that in the form of X509 certificates. The problem we have is that Microsoft won't sign X509 certificates, and there's no way to turn a PE/COFF signature into an X509 signature. Someone would have to re-sign the keys, which starts getting us back to option (2). One way around this would be to have an automated service that accepts PE/COFF objects, verifies that they're signed by Microsoft, extracts the key, re-signs it with a new private key and spits out an X509 certificate. That avoids having to add any new code to the kernel, but it means that there would have to be someone to run that service and it means that their public key would have to be trusted by the kernel by default.

Who would that third party be? The logical choice might be the Linux Foundation, but since we have members of the Linux Foundation Technical Advisory Board saying that they think module signing is unnecessary and that there's no real risk of revocation, it doesn't seem likely that they'll be enthusiastic. A distribution could do it, but there'd be arguments about putting one distribution in a more privileged position than others. So far, nobody's stood up to do this.

A possible outcome is that the distributions who care about signed modules will all just carry this patchset anyway, and the ones who don't won't. That's probably going to be interpreted by many as giving too much responsibility to Microsoft, but it's worth emphasising that these patches change nothing in that respect - if your firmware trusts Microsoft, you already trust Microsoft. If your firmware doesn't trust Microsoft, these patches will not cause your kernel to trust Microsoft. If you've set up your own chain of trust instead, anything signed by Microsoft will be rejected.

What's next? It wouldn't surprise me too much if nothing happens until someone demonstrates how to use a signed Linux system to attack Windows. Microsoft's response to that will probably determine whether anyone ends up caring.
The UEFI bootloader that the Linux Foundation have been working on has just been released. That means we now have two signed bootloaders available - this one and shim.

Does this mean Linux distributions can now support Secure Boot?

They've actually been able to for a while. Ubuntu shipped with Secure Boot support last October, and Fedora shipped with Secure Boot support in January. Both used Shim rather than the Linux Foundation loader, and Shim's also being used by a variety of smaller distributions. The LF loader is a different solution to the same problem.

Is the Linux Foundation loader the preferred loader for distributions?

Probably not in most cases. One of the primary functional differences between Shim and the LF loader is that the LF loader is based around cryptographic hashes rather than signing keys. This means that the user has to explicitly add a hash to the list of permitted binaries whenever a distribution updates their bootloader or kernel. Doing that involves being physically present at the machine, so it's kind of a pain.
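
Here's a rough Python sketch of why a hash-based approach needs re-enrolment on every update. HashAllowlist and its methods are invented for illustration and have nothing to do with the LF loader's actual code, and the choice of SHA-256 is an assumption on my part.

```python
import hashlib


class HashAllowlist:
    """Toy model of a loader that only runs binaries whose hash was enrolled."""

    def __init__(self):
        self._hashes = set()

    def enrol(self, binary: bytes) -> None:
        # In the real world this is the step that needs a physically
        # present user.
        self._hashes.add(hashlib.sha256(binary).digest())

    def permits(self, binary: bytes) -> bool:
        return hashlib.sha256(binary).digest() in self._hashes


allowed = HashAllowlist()
allowed.enrol(b"kernel, original build")

print(allowed.permits(b"kernel, original build"))
print(allowed.permits(b"kernel, security update"))

allowed.enrol(b"kernel, security update")
print(allowed.permits(b"kernel, security update"))
```

Any change to the binary changes its hash, so the updated kernel is refused until its hash has been enrolled as well. With signing keys the user enrols one key, and every later binary signed with it is accepted without another trip to the machine.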

Why use it at all, then?

Being hash based means that you don't need to maintain any signing infrastructure. This means that distributions can support Secure Boot without having to change their build process at all. Shim already supports this use case (and some distributions are using it), but the LF loader has nicer UI for managing it.

Any other reasons?

Actually, yes. Shim implements Secure Boot loading in a less than entirely ideal way - it duplicates the firmware's entire binary loading, validation, relocation and execution code. This is necessary because the UEFI specification doesn't provide any mechanism for adding additional authentication mechanisms. The main downside of this is that the standard UEFI LoadImage() and StartImage() calls don't work under Shim. The LF loader hooks into the low-level security architecture and installs its own handlers, which means the standard UEFI interfaces work. The upshot is that you can use bootloaders like Gummiboot or efilinux without having to modify them to call out to Shim.

Why doesn't Shim do the same?

The UEFI architecture is slightly complicated. The UEFI specification itself defines the upper layers of the firmware, basically covering everything that UEFI applications and operating systems need. It doesn't define the lower layers of a UEFI implementation. Those are contained in the UEFI Platform Initialization spec, and that's what defines the security architecture interfaces that the LF loader hooks into. The problem is that it's completely valid to implement the UEFI specification without implementing the Platform Initialization specification, and if anyone does that then the LF loader will fail.

Can't you try both approaches?

Yes, and that's actually pretty much the plan now. I'm working on integrating the LF loader's UI and security code into Shim with the aim of producing one loader that'll satisfy the full set of use cases, and James is happy with this.

Which should I use?

Depends. If you want to support gummiboot (and aren't willing to patch it to call out to Shim), you'll need to use the LF loader. If you want to use key-based signing setups to avoid forcing re-enrolment on updates, you'll need to use Shim. If you're somewhere in the middle, you can probably use either. Once we've got the code merged, you won't have to make a choice.
I bricked a Samsung laptop today. Unlike most of the reported cases of Samsung laptops refusing to boot, I never booted Linux on it - all experimentation was performed under Windows. It seems that the bug we've been seeing is simultaneously simpler in some ways and more complicated in others than we'd previously realised.

So, some background. The original belief was that the samsung-laptop driver was doing something that caused the system to stop working. This driver was coded to a Samsung specification in order to support certain laptop features that weren't accessible via any standardised mechanism. It works by searching a specific area of memory for a Samsung-specific signature. If it finds it, it follows a pointer to a table that contains various magic values that need to be written in order to trigger some system management code that actually performs the requested change. This is unusual in this day and age, but not unique. The problem is that the magic signature is still present on UEFI systems, but attempting to use the data contained in the table causes problems.

We're not quite sure what those problems are yet. Originally we assumed that the magic values we wrote were causing the problem, so the samsung-laptop driver was patched to disable it on UEFI systems. Unfortunately, this doesn't actually fix the problem - it just avoids the easiest way of triggering it. It turns out that it wasn't the writes that caused the problem, it was what happened next. Performing the writes triggered a hardware error of some description. The Linux kernel caught and logged this. In the old days, people would often never see these logs - the system would then be frozen and it would be impossible to access the hard drive, so they never got written to disk. There's code in the kernel to make this easier on UEFI systems. Whenever a severe error is encountered, the kernel copies recent messages to the UEFI variable storage space. They're then available to userspace after a reboot, allowing more accurate diagnostics of what caused the crash.

That crash dump takes about 10K of UEFI storage space. Microsoft require that Windows 8 systems have at least 64K of storage space available. We only keep one crash dump - if the system crashes again it'll simply overwrite the existing one rather than creating another. This is all completely compatible with the UEFI specification, and Apple actually do something very similar on their hardware. Unfortunately, it turns out that some Samsung laptops will fail to boot if too much of the variable storage space is used. We don't know what "too much" is yet, but writing a bunch of variables from Windows is enough to trigger it. I put some sample code here - it writes out 36 variables each containing a kilobyte of random data. I ran this as an administrator under Windows and then rebooted the system. It never came back.

This is pretty obviously a firmware bug. Writing UEFI variables is expressly permitted by the specification, and there should never be a situation in which an OS can fill the variable store in such a way that the firmware refuses to boot the system. We've seen similar bugs in Intel's reference code in the past, but they were all fixed early last year. For now the safest thing to do is not to use UEFI on any Samsung laptops. Unfortunately, if you're using Windows, that'll require you to reinstall it from scratch.