Vendors continue to break things
Mar. 11th, 2015 11:37 pm
Getting on for seven years ago, I wrote an article on why the Linux kernel responds "False" to _OSI("Linux"). This week I discovered that vendors were making use of another behavioural difference between Linux and Windows to change the behaviour of their firmware and breaking things in the process.
The ACPI spec defines the _REV object as evaluating "to the revision of the ACPI Specification that the specified \_OS implements as a DWORD. Larger values are newer revisions of the ACPI specification", i.e. you reference _REV and you get back the version of the spec that the OS implements. Linux returns 5 for this, because Linux (broadly) implements ACPI 5.0, and Windows returns 2 because fuck you that's why[1].
(An aside: To be fair, Windows maybe has kind of an argument here because the spec explicitly says "The revision of the ACPI Specification that the specified \_OS implements" and all modern versions of Windows still claim to be Windows NT in \_OS and eh you can kind of make an argument that NT in the form of 2000 implemented ACPI 2.0 so handwave)
This would all be fine except firmware vendors appear to earnestly believe that they should ensure that their platforms work correctly with RHEL 5 even though there aren't any drivers for anything in their hardware and so are looking for ways to identify that they're on Linux so they can just randomly break various bits of functionality. I've now found two systems (an HP and a Dell) that check the value of _REV. The HP checks whether it's 3 or 5 and, if so, behaves like an old version of Windows and reports fewer backlight values and so on. The Dell checks whether it's 5 and, if so, leaves the sound hardware in a strange partially configured state.
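To make the two checks concrete, here is a rough model of the branching described above. It is only a sketch: real firmware does this in ACPI bytecode rather than Python, the function names are hypothetical, and the backlight level counts and state strings are invented placeholders, not values taken from either machine.

```python
# Illustrative model only; the real logic lives in the firmware's ACPI tables.
# Function names, level counts and state strings are hypothetical placeholders.

LINUX_REV = 5    # what Linux reported at the time, per the post
WINDOWS_REV = 2  # what modern Windows reports, per the post


def hp_backlight_levels(rev: int) -> int:
    """HP-style check: a _REV of 3 or 5 selects the 'old Windows' path."""
    if rev in (3, 5):
        return 11  # placeholder: reduced set of brightness steps
    return 21      # placeholder: full set of brightness steps


def dell_audio_state(rev: int) -> str:
    """Dell-style check: a _REV of 5 leaves the audio partially configured."""
    if rev == 5:
        return "partially-configured"
    return "configured"
```

Feeding `LINUX_REV` into either function takes the degraded path, while `WINDOWS_REV` does not, which is the behavioural difference the vendors are keying on.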
And so, as a result, I've posted this patch which sets _REV to 2 on X86 systems because every single more subtle alternative leaves things in a state where vendors can just find another way to break things.
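The effect of that change can be sketched in a few lines. This is a model, not the patch itself: the `acpi_rev` function and the architecture strings are hypothetical, and the assumption that non-x86 systems keep reporting 5 is inferred from the description above rather than confirmed.

```python
def acpi_rev(arch: str) -> int:
    """Model of the value the OS hands back when firmware evaluates _REV."""
    # Assumption: "x86" stands in for whatever build-time check the real patch uses.
    if arch == "x86":
        return 2  # match what Windows reports, so firmware cannot tell us apart
    return 5      # assumed: other architectures keep the previous value
```

With this in place, a firmware check for a _REV of 3 or 5 no longer fires on x86, because the value is indistinguishable from the one Windows returns.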
[1] Verified by hacking qemu's DSDT to make _REV calls at various points and dump the output to the debug console - I haven't found a single scenario where modern Windows returns something other than "2"
Grrrr
Date: 2015-03-12 10:47 am (UTC)
Makes me angry just reading it, I can just imagine how it would make you feel.
no subject
Date: 2015-03-12 01:10 pm (UTC)
no subject
Date: 2015-03-12 04:27 pm (UTC)
The firmware should NEVER try to work around OS bugs! If there's a bug in the OS, it will be fixed, ESPECIALLY if the OS is Linux! If the firmware works around the bug, then the OS bugfix turns that firmware workaround into a firmware bug.
That's not how it works in the real world
Date: 2015-03-12 09:37 pm (UTC)
And even if you say "but Linux is open-source!" and submit a patch (been there, done that) it takes quite a while from the moment you submit the patch to the moment the servers are upgraded with a newer kernel. A workaround fixes things *NOW* and makes that angry customer go away.
Re: That's not how it works in the real world
Date: 2015-03-13 01:58 am (UTC)
Also, go read the two observed instances Matthew pointed out; in both cases, the firmware randomly breaks things rather than fixing them.
It'd be lovely to see the changelogs where someone added those particular bits of insanity and what they thought they were doing.
Re: That's not how it works in the real world
Date: 2015-03-15 06:16 pm (UTC)
Windows 10
Date: 2015-03-12 02:51 pm (UTC)
Re: Windows 10
Date: 2015-03-13 03:37 am (UTC)
Re: Windows 10
Date: 2015-03-16 07:50 am (UTC)
no subject
Date: 2015-03-12 04:49 pm (UTC)
-John
On working with Microsoft and Linux systems at the same time
Date: 2015-03-12 07:54 pm (UTC)
no subject
Date: 2015-03-12 09:31 pm (UTC)
Take a step back and look from their perspective. Modern laptops will need to support Windows 8.1, Windows 7 and, if the vendor supports it, Linux. Some vendors like HP and Dell seem to be trying very hard to find an equilibrium to support all these OSes from one piece of firmware.
I2C & SMBus touchpads can't work in Windows 7. Windows 7 can't support Connected Standby. Windows 7 doesn't work with modern audio solutions. Windows 8.1 supports all of this. The only way a firmware designer can support Windows 7 and Windows 8 on the same box is with a way to differentiate OSes via _OSI. That's their first priority.
You throw Linux into the mix and what happens when the audio vendor is only willing to support their audio solution in a Windows 7 type mode? Or what if you offer connected standby (thus not offering S3) in Windows 8.1? Is it actually appropriate to claim to support everything Windows 8.1 supports when you have a realistic scenario like that? The answer isn't that every component vendor needs to support every piece of hardware on Linux in the mode the latest version of Windows uses. Sure, that's a fine sounding idea in theory. Component vendors don't work that way though.
By the time someone gets something resembling Connected Standby working in Linux there will probably be something to replace it and the laptops that support connected standby when booted in Windows 8.1 will be due for replacement too, exacerbating this problem.
It may be counterintuitive, but requiring the ideal steady state scenario you dream of is likely going to cause fewer laptops to fully function under Linux.
no subject
Date: 2015-03-16 07:57 am (UTC)
a) Claim compatibility with the latest version of Windows and risk things being broken because we don't implement every feature that Windows implements, or
b) Advertise that we're Linux and risk things being broken because we don't define what "Linux" is for the reasons discussed in the post I referenced
(a) means trying to be exactly compatible with Windows, which means doing things like ensuring that _REV returns the same value. (b) is unworkable. The third option of figuring out what we need to implement before advertising something requires that system vendors be as willing to work in the open as we are.
(c) happy medium
Date: 2015-03-16 03:21 pm (UTC)
We work with vendors like Canonical and Red Hat for our platform enablement and certification purposes. Canonical has been working on a firmware test suite for a while that we actively use for finding and fixing firmware issues that affect Linux. What if you supported DMI patches submitted by them specifically, after they have validated the _OSI of Linux code path on the platforms where it matters? I don't think every platform would need this.
You're CC'ed on a thread on LKML about this from this morning, but for the XPS 13 in particular this could have been very useful. The touchpad runs way better in I2C mode but I2S audio isn't yet mature. A fully supported _OSI of Windows 2013 would mean that it's forced to I2C mode touchpad and I2S mode audio. A fully supported _OSI of Windows 2009 would mean PS2 touchpad and HDA audio. At least until the I2S audio is mature it would be a better experience for users to have I2C touchpad and HDA audio. During platform development we could validate that particular code path for Linux, and after the platform launches Canonical could submit a DMI matching patch indicating they've validated it with this codepath and we should support _OSI of Linux (or whatever pre-agreed value we pick).
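The three configurations described here can be laid out as a small lookup table. This is only an illustration of the proposal: the `Mode` type, the `select_mode` helper, and the order in which the strings are tried are all assumptions, and the "Linux" entry is the suggested behaviour, not something the firmware is known to implement.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    touchpad: str
    audio: str


# Modes as described in the comment above. The "Linux" row is the proposed
# combination, not existing firmware behaviour.
XPS13_MODES = {
    "Windows 2013": Mode(touchpad="I2C", audio="I2S"),
    "Windows 2009": Mode(touchpad="PS2", audio="HDA"),
    "Linux": Mode(touchpad="I2C", audio="HDA"),
}


def select_mode(accepted_osi_strings) -> Mode:
    """Pick the first mode whose _OSI string the OS accepts.

    The priority order here is an assumption made for the sketch.
    """
    for name in ("Linux", "Windows 2013", "Windows 2009"):
        if name in accepted_osi_strings:
            return XPS13_MODES[name]
    return XPS13_MODES["Windows 2009"]  # assumed fallback: most conservative mode
```

An OS that accepts only the Windows strings ends up with I2S audio, while one that also accepts the "Linux" string gets the I2C touchpad with HDA audio.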
Have you reached out to Microsoft to see if they'll be willing to share major differences and subsystems that have been implemented between OS versions? This sort of thing is NDA backed, so unfortunately it can't come from system vendors like Dell. Given how open source friendly Microsoft has been lately, you might have some more luck these days.
As the person above indicated though, you should look into getting connected standby support in the kernel. This doesn't affect the XPS 13, but there are platforms that will be needing to support connected standby when Windows 2013 _OSI is detected that will be on their way.
Re: (c) happy medium
Date: 2015-03-16 04:41 pm (UTC)
Well, that's basically the problem. _OSI("Linux") doesn't mean anything. Right now you'd like to interpret it as "Doesn't support I2S", but what do you want to happen once Linux implements I2S properly? Someone to remember to remove that DMI check? In this specific case that might end up happening, but in general we'd end up with an unscalable set of quirks and a bunch of hardware not running at its full potential. It's not an option.
You've pretty clearly explained the issues that you face with the current state of affairs. What stopped you from being able to do so earlier? What's the earliest you would have been able to explain them?
Re: (c) happy medium
Date: 2015-03-16 06:42 pm (UTC)
To me there are always going to be quirks. Even if we had talked about all this stuff sooner, we'd have a quirk in the kernel. It's not a trivial amount of effort to add support for some new technologies, especially when it's the responsibility of our IHVs with other priorities.
Let's say, hypothetically, that we had something like _OSI of Linux to get out the door in the modes we wanted. Canonical submits a patch to allow _OSI of Linux and in the patch documents exactly why _OSI of Linux needs to be enabled for this HW. When the things that they documented change and someone notices, that patch gets dropped. If no one notices, the hardware keeps working. At least it's a better result than us needing to issue a BIOS update to drop the firmware check for _REV once things are stable in the kernel. Furthermore, it ties the change to the particular kernel version in which the subsystem is actually supported. For the XPS 13, I2S audio is sorta there for 4.0, so it probably would have made sense to drop the quirk in 4.1 if the rest of it landed.
What stops us from doing this earlier? We don't tell people about our hardware until we're ready to sell it. It wasn't public knowledge that the new XPS 13 was coming after CES. Even if we did mention new HW was coming as a teaser, it wasn't public knowledge that it would have a Microsoft Precision touchpad or take advantage of a codec that could use multiple audio modes. Mentioning any of this (especially with a DMI information) could have tipped off the impending hardware.
We're fine being as open as possible after the launch. That's why I believe if you had a trusted party like RH or Canonical vetting these things that would support a separate _OSI during development that they could make sure it makes sense at the time and add the DMI patch at launch.
Re: (c) happy medium
Date: 2015-03-16 06:57 pm (UTC)
We can't expect users to perform firmware updates just to get things working, and in most cases we can't expect system vendors to do the firmware updates in the first place - imagine this code being cut and paste into a low-end system with a 6-month support cycle, and then figure out the probability that anybody's ever going to fix it once Linux works properly.
Why? Windows makes very little use of them. Worse, they tend to end up breaking in surprising ways when the kernel changes behaviour. They're a huge maintenance overhead, and reducing the number present is a huge win for everybody.
And it gets argued about for 3 months because Canonical have historically been dreadful at actually explaining this kind of thing.
We don't need to know about specific hardware. Saying something like "Our expectations for operating systems that report Windows 2013 support include Microsoft Precision touchpad support, working I2S audio for existing codecs and connected standby" at some point last year would have told us nothing other than that Dell were actually paying attention to what would be involved in integrating new hardware features, which is hardly proprietary information. Knowing which features are likely to be required by real hardware vendors helps developers prioritise appropriately.
In this specific case, the audio issue is down to driver support rather than anything to do with our claimed operating system. Using _OSI("Linux") to indicate that a driver (rather than the core OS) is missing functionality is a pretty awful thing to do. It would be more meaningful to provide a mechanism for switching at runtime (a defined ACPI method that changes hardware configuration and triggers a PCI hotplug event, for example) and then have the Realtek driver call that when it detects that it's unable to drive the hardware in question.
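A rough model of that runtime-switch idea might look like the following. Everything in it is hypothetical: no such ACPI method is defined in the discussion, and the class names, method name, and event string are placeholders chosen for the sketch.

```python
class Platform:
    """Stand-in for firmware exposing a hypothetical mode-switch method."""

    def __init__(self, audio_mode: str = "I2S"):
        self.audio_mode = audio_mode
        self.events = []

    def switch_audio_mode(self, mode: str) -> None:
        # Hypothetical ACPI method: reconfigure the hardware, then signal
        # a hotplug event so the OS re-enumerates the device.
        self.audio_mode = mode
        self.events.append("pci-hotplug")


class AudioDriver:
    # Assumption for the sketch: this driver can only drive HDA.
    SUPPORTED_MODES = {"HDA"}

    def probe(self, platform: Platform) -> str:
        """Ask the platform to switch modes if the current one is unsupported."""
        if platform.audio_mode not in self.SUPPORTED_MODES:
            platform.switch_audio_mode("HDA")
        return platform.audio_mode
```

The point of the design is that the decision sits with the driver that knows what it can handle, rather than being inferred by firmware from the operating system's claimed identity.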
Re: (c) happy medium
Date: 2015-03-17 12:28 am (UTC)
That's a really unfair double standard. If there's a BIOS issue and we can fix it in firmware and actively do fix it why can't we tell people to go and use it? That can keep quirks out of the kernel! There was a problem with a bunch of the recent E series machines that we issued a BIOS fix for related to keyboard repeating specific to Linux. People had no problem applying that update.
There are plenty of quirks in Windows drivers, they're obfuscated though and not as obvious since we don't see the source.
This is why I think we need to come up with a good process for doing this. If we have a template that can be copied and pasted and filled out to be included in the git commit for example I think it would go a long way. All the major questions about it that normally come up can be put in the commit itself.
OK. When possible I'll try to notify you of the things I know about. Right now - the big ones are:
* Intel audio will support something different for Skylake than we have for Broadwell (I2S). Intel will need to comment more on this though as the information I have is under NDA.
* Windows 10 platforms will introduce Modern Standby.
* PCIe SSD's will become very important.
You're absolutely right that this is something that for Linux the core OS doesn't really indicate the functionality, it's more of a driver type thing.
The problem is that the EC needs to set the mode when the HW is turned on, not at runtime. The values are cached from a previous cold boot, and those are what's used. We're discussing better ways to do this for upcoming platforms. I do like the idea of a driver being able to request switching the mode at least for the next boot. I'll raise it with the team.
Re: (c) happy medium
Date: 2015-03-17 12:40 am (UTC)
Some people. Others probably just assumed Linux was broken and went back to Windows, or just put up with it forever. Firmware updates are pretty much the preserve of technical users, even more so than Linux.
I've spent a bunch of time looking, and I really haven't found much evidence that this is true. A lot of hardware works with shrinkwrapped Windows media, even if it was shipped later.
Cool. I'll see what I can find out about that.
Is this more than what's in ACPI 6.0?
Are we talking PCIe SSDs in the "Present as an AHCI controller" sense, or NVMe, or something more exciting? I think we're fairly on top of this one.
Re: (c) happy medium
Date: 2015-03-19 06:39 pm (UTC)
NVMe. I believe there are some features that weren't supported on this, but I don't know the specifics.
I'm not sure. This is something coming from Microsoft, and I'm not privy to the details of it. I just know it's coming.
Re: (c) happy medium
Date: 2015-03-19 06:48 pm (UTC)
Some of the NVMe stuff has been blocked on the spec being published. The patches should hit LKML once the next UEFI version is released.
Re: (c) happy medium
Date: 2015-03-19 07:16 pm (UTC)
Re: (c) happy medium
Date: 2015-03-16 04:47 pm (UTC)
Re: (c) happy medium
Date: 2015-03-16 06:43 pm (UTC)
xps 13 2015?
Date: 2015-03-12 10:16 pm (UTC)
if you are writing about the 2015 edition of the xps 13, are you aware of these bug-reports?
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413446
https://bugzilla.redhat.com/show_bug.cgi?id=1188741
https://bugzilla.kernel.org/show_bug.cgi?id=93361
if you haven't seen it, there's a nice write-up about what's wrong with the device: https://major.io/2015/02/03/linux-support-dell-xps-13-9343-2015-model/
Is there a boot parameter to tweak the value?
Date: 2015-03-13 08:10 am (UTC)
Then -- I think the only way to go about this is having boot parameters. Choose sane defaults (and in this case, version 2 most probably is), but giving users easy ways to experiment and change things is (I think) always the best choice.
Of course, it'll always be possible to recompile the kernel, but why hang this hoop so high?
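As a sketch of what such a knob could look like, the snippet below parses a command line for an override and falls back to the default of 2. The `acpi_rev=` parameter name and the `rev_from_cmdline` helper are invented for illustration; nothing in the post says the patch provides such an option.

```python
DEFAULT_REV = 2  # the default proposed in the post for x86


def rev_from_cmdline(cmdline: str, default: int = DEFAULT_REV) -> int:
    """Return the _REV value requested by a hypothetical acpi_rev=N parameter."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Only accept a plain non-negative integer; ignore anything malformed.
        if key == "acpi_rev" and sep and value.isdigit():
            return int(value)
    return default
```

A user who suspects their firmware behaves better with the old value could then boot with `acpi_rev=5` to test it without rebuilding anything.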
How do less technical users repeat your exploration
Date: 2015-03-14 06:58 am (UTC)
Re: How do less technical users repeat your exploration
Date: 2015-03-16 08:00 am (UTC)
What is Drawing expecting?
Date: 2015-03-15 02:29 am (UTC)
Re: What is Drawing expecting?
Date: 2015-03-16 08:00 am (UTC)
Wrong tree
Date: 2015-03-15 04:06 am (UTC)
Re: Wrong tree
Date: 2015-03-16 08:01 am (UTC)
Might need to add ThinkPad t540p to the list
Date: 2015-11-05 09:46 pm (UTC)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
Subsystem: Lenovo Device 2210
Flags: bus master, fast devsel, latency 0, IRQ 33
Memory at e1630000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Kernel driver in use: snd_hda_intel
The headphones will sometimes go into mode, and the only way to fix it is to drop the system and reboot twice; it then works again until the next reboot, when it goes into mode again. Of course it could just be bad hardware :) [though going to a 3.10 level kernel didn't seem to cause it.]