[personal profile] mjg59
Getting on for seven years ago, I wrote an article on why the Linux kernel responds "False" to _OSI("Linux"). This week I discovered that vendors were making use of another behavioural difference between Linux and Windows to change the behaviour of their firmware and breaking things in the process.

The ACPI spec defines the _REV object as evaluating "to the revision of the ACPI Specification that the specified \_OS implements as a DWORD. Larger values are newer revisions of the ACPI specification", i.e. you reference _REV and you get back the version of the spec that the OS implements. Linux returns 5 for this, because Linux (broadly) implements ACPI 5.0, and Windows returns 2 because fuck you that's why[1].

(An aside: To be fair, Windows maybe has kind of an argument here because the spec explicitly says "The revision of the ACPI Specification that the specified \_OS implements" and all modern versions of Windows still claim to be Windows NT in \_OS and eh you can kind of make an argument that NT in the form of 2000 implemented ACPI 2.0 so handwave)

This would all be fine except firmware vendors appear to earnestly believe that they should ensure that their platforms work correctly with RHEL 5 even though there aren't any drivers for anything in their hardware and so are looking for ways to identify that they're on Linux so they can just randomly break various bits of functionality. I've now found two systems (an HP and a Dell) that check the value of _REV. The HP checks whether it's 3 or 5 and, if so, behaves like an old version of Windows and reports fewer backlight values and so on. The Dell checks whether it's 5 and, if so, leaves the sound hardware in a strange partially configured state.
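The two checks described above can be modelled roughly as follows. This is only a sketch in Python of the decision logic as described in the text, not the vendors' actual ASL; the function names and the returned strings are hypothetical.

```python
def hp_backlight_behaviour(rev: int) -> str:
    """Hypothetical model of the HP check: a _REV of 3 or 5 selects
    the 'old version of Windows' path with fewer backlight values."""
    if rev in (3, 5):
        return "legacy: fewer backlight values"
    return "normal"


def dell_audio_state(rev: int) -> str:
    """Hypothetical model of the Dell check: a _REV of 5 leaves the
    sound hardware partially configured."""
    if rev == 5:
        return "partially configured"
    return "normal"


# Values from the article: Windows reports 2, Linux reported 5.
for os_name, rev in (("Windows", 2), ("Linux", 5)):
    print(os_name, "->", hp_backlight_behaviour(rev), "/", dell_audio_state(rev))
```

With a _REV of 2, neither branch is taken, which is why reporting the same value as Windows sidesteps both checks.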

And so, as a result, I've posted this patch which sets _REV to 2 on X86 systems because every single more subtle alternative leaves things in a state where vendors can just find another way to break things.
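In outline, the behaviour change amounts to something like the sketch below. It is written in Python purely for illustration and is not the kernel patch itself; the constant names and the architecture strings are assumptions.

```python
ACPI_REV_IMPLEMENTED = 5   # what Linux (broadly) implements, per the article
ACPI_REV_LIKE_WINDOWS = 2  # what modern Windows was observed to return


def reported_rev(arch: str) -> int:
    """Return the value _REV evaluates to for a given architecture.

    Hypothetical helper: on x86 the value is pinned to 2 so that firmware
    cannot use _REV to tell Linux and Windows apart.
    """
    if arch in ("x86", "x86_64"):
        return ACPI_REV_LIKE_WINDOWS
    return ACPI_REV_IMPLEMENTED


print(reported_rev("x86_64"))
print(reported_rev("arm64"))
```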

[1] Verified by hacking qemu's DSDT to make _REV calls at various points and dump the output to the debug console - I haven't found a single scenario where modern Windows returns something other than "2"

Grrrr

Date: 2015-03-12 10:47 am (UTC)
From: (Anonymous)
Thanks for your diagnosis (and your patch).
Makes me angry just reading it, I can just imagine how it would make you feel.

Date: 2015-03-12 01:10 pm (UTC)
From: [identity profile] m50d.wordpress.com
Are there any good use cases for this kind of thing? Hardware that works around known OS bugs, say? Were we all just much more naive when the spec was written?

Date: 2015-03-12 04:27 pm (UTC)
From: (Anonymous)
NO NO NO NO NO NO NO NO NO

The firmware should NEVER try to work around OS bugs! If there's a bug in the OS, it will be fixed, ESPECIALLY if the OS is Linux! If the firmware works around the bug, then the OS bugfix turns that firmware workaround into a firmware bug.

That's not how it works in the real world

Date: 2015-03-12 09:37 pm (UTC)
From: (Anonymous)
FW vendors have to deal with OS bugs all the time. Let's say I'm an FW vendor and I have this super nice network card that doesn't work with RHEL or Windows or whatever, just because there's a bug in the OS. Well, Red Hat or Microsoft couldn't care less about it, it's MY problem and it's MY loss that I can't sell my shiny peripheral because the OS is broken, my company's going bankrupt and I'll lose my job. So in a world where FW vendors need to make things work, yeah, they pretty much need to implement workarounds over broken OS or broken HW (been there, done that).
And even if you say "but Linux is open-source!" and submit a patch (been there, done that) it takes quite a while from the moment you submit the patch to the moment the servers are upgraded with a newer kernel. A workaround fixes things *NOW* and makes that angry customer go away.

Re: That's not how it works in the real world

Date: 2015-03-13 01:58 am (UTC)
From: (Anonymous)
In what world do you control the BIOS firmware but have no ability to fix the OS? Good hardware vendors have Red Hat and SuSE on the phone and can tell them that there's a bug that needs fixing.

Also, go read the two observed instances Matthew pointed out; in both cases, the firmware randomly breaks things rather than fixing them.

It'd be lovely to see the changelogs where someone added those particular bits of insanity and what they thought they were doing.

Re: That's not how it works in the real world

Date: 2015-03-15 06:16 pm (UTC)
From: (Anonymous)
So submit a patch to mainline and release a manual driver for people to use until it gets merged.

Windows 10

Date: 2015-03-12 02:51 pm (UTC)
From: (Anonymous)
Have you checked that Windows 10 still returns _REV of 2? Making a change like this, you should definitely future proof yourself in some sense.

Re: Windows 10

Date: 2015-03-13 03:37 am (UTC)
From: (Anonymous)
I suppose it would be ironic if Microsoft did change the _REV to 5 and anybody upgrading to that on old hardware was bitten by the "assume it's Linux" bugs.

Date: 2015-03-12 04:49 pm (UTC)
From: (Anonymous)
If it's any consolation, Microsoft had all the same kind of problems trying to get Win98-era firmware to work with Windows 2000 and XP.
-John
From: (Anonymous)
fart fart fart fart fart fart fart

Date: 2015-03-12 09:31 pm (UTC)
From: (Anonymous)
Linux really needs a way to get along with firmware designers. This hard stance just makes all sides hate each other.

Take a step back and look from their perspective. Modern laptops need to support Windows 8.1, Windows 7 and, if the vendor supports it, Linux. Some vendors like HP and Dell seem to be trying very hard to find an equilibrium that supports all these OSes from one piece of firmware.

I2C & SMBus touchpads can't work in Windows 7. Windows 7 can't support Connected Standby. Windows 7 doesn't work with modern audio solutions. Windows 8.1 supports all of this. The only way a firmware designer can support Windows 7 and Windows 8 on the same box is with a way to differentiate OS's via _OSI. That's their first priority.

You throw Linux into the mix and what happens when the audio vendor is only willing to support their audio solution in a Windows 7 type mode? Or what if you offer connected standby (thus not offering S3) in Windows 8.1? Is it actually appropriate to claim to support everything Windows 8.1 supports when you have a realistic scenario like that? The answer isn't that every component vendor needs to support, in Linux, every piece of hardware in the mode the latest version of Windows uses. Sure, that's a fine sounding idea in theory. Component vendors don't work that way though.

By the time someone gets something resembling Connected Standby working in Linux there will probably be something to replace it and the laptops that support connected standby when booted in Windows 8.1 will be due for replacement too, exacerbating this problem.

It may be counterintuitive, but requiring the ideal steady state scenario you dream of is likely going to cause fewer laptops to fully function under Linux.

(c) happy medium

Date: 2015-03-16 03:21 pm (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawnToeawzBfrU1YGJvj7ln2UGGpebU1xAkA
Working fully in the open during development isn't currently an option for system vendors. What we currently attempt to do is ask our IHV's to submit patches early when possible and after systems launch if they need to match on DMI information or particulars about the system that were previously private.

We work with vendors like Canonical and Red Hat for our platform enablement and certification purposes. Canonical has been working on a firmware test suite for a while that we actively use for finding and fixing issues with the firmware with relation to Linux. What if you supported DMI patches submitted by them specifically, after they have validated the code path from _OSI of Linux on the platforms where it matters? I don't think every platform would need this.

You're CC'ed on a thread on LKML about this from this morning, but for the XPS 13 in particular this could have been very useful. The touchpad runs way better in I2C mode but I2S audio isn't yet mature. A fully supported _OSI of Windows 2013 would mean that it's forced to I2C mode touchpad and I2S mode audio. A fully supported _OSI of Windows 2009 would mean PS2 touchpad and HDA audio. At least until the I2S audio is mature it would be a better experience for users to have I2C touchpad and HDA audio. During platform development we could validate that particular code path for Linux and after the platform launches Canonical could submit a DMI matching patch indicating they've validated it with this codepath and we should support _OSI of Linux (or whatever pre-agreed value we pick).
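The mode combinations described above can be laid out as a small lookup. This is a sketch in Python of the selection logic only, not actual firmware code; the priority order, the fallback, and all names are assumptions, and the "Linux" row is the proposed per-platform path rather than anything that exists today.

```python
from typing import Callable, NamedTuple


class Modes(NamedTuple):
    touchpad: str
    audio: str


# Checked in order; the first _OSI string the OS accepts wins (assumed priority).
MODES_BY_OSI = [
    ("Linux", Modes(touchpad="I2C", audio="HDA")),         # proposed, validated path
    ("Windows 2013", Modes(touchpad="I2C", audio="I2S")),
    ("Windows 2009", Modes(touchpad="PS2", audio="HDA")),
]


def select_modes(osi: Callable[[str], bool]) -> Modes:
    """Pick hardware modes from the OS's answers to _OSI queries."""
    for name, modes in MODES_BY_OSI:
        if osi(name):
            return modes
    # Assumed fallback when no string is accepted.
    return Modes(touchpad="PS2", audio="HDA")


# An OS that answers false to "Linux" but true to the Windows strings:
print(select_modes(lambda s: s.startswith("Windows")))
# The same OS with a DMI quirk that also answers true to "Linux":
print(select_modes(lambda s: True))
```

The first call ends up with I2S audio, the second with the I2C touchpad and HDA audio combination described as the better interim experience.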


Have you reached out to Microsoft to see if they'll be willing to share major differences and subsystems that have been implemented between OS versions? This sort of thing is NDA backed, so unfortunately it can't come from system vendors like Dell. Given how open source friendly Microsoft has been lately, you might have some more luck these days.

As the person above indicated though, you should look into getting connected standby support in the kernel. This doesn't affect the XPS 13, but there are platforms that will be needing to support connected standby when Windows 2013 _OSI is detected that will be on their way.

Re: (c) happy medium

Date: 2015-03-16 06:42 pm (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawnToeawzBfrU1YGJvj7ln2UGGpebU1xAkA
With this specific case we were intending to issue another BIOS update later to remove the _REV check after I2S audio was mature in Linux. To us this means the patches to the kernel and userspace are upstream and included in the current stable release of all the major distros.

To me there are always going to be quirks. Even if we had talked about all this stuff sooner, we'd have a quirk in the kernel. It's not a trivial amount of effort to add support for some new technologies, especially when it's the responsibilities of our IHV's with other priorities.

Let's say there was a hypothetical scenario where we had something like _OSI of Linux to get out the door in the modes we wanted. Canonical submits a patch to allow _OSI of Linux and in the patch documents exactly why _OSI of Linux needs to be enabled for this HW. When the things that they documented change and someone notices, that patch gets dropped. If no one notices, the hardware keeps working. At least it's a better result than us needing to issue a BIOS update to drop the firmware change for checking for _REV when things are stable in the kernel. Furthermore it matches the inflection of a particular kernel version that the software is actually supporting of the subsystem. For the XPS 13, I2S audio is sorta there for 4.0, so probably 4.1 it would have made sense to drop the quirk if the rest of it landed.

What stops us from doing this earlier? We don't tell people about our hardware until we're ready to sell it. It wasn't public knowledge that the new XPS 13 was coming after CES. Even if we did mention new HW was coming as a teaser, it wasn't public knowledge that it would have a Microsoft Precision touchpad or take advantage of a codec that could use multiple audio modes. Mentioning any of this (especially with DMI information) could have tipped off the impending hardware.

We're fine being as open as possible after the launch. That's why I believe if you had a trusted party like RH or Canonical vetting these things that would support a separate _OSI during development that they could make sure it makes sense at the time and add the DMI patch at launch.

Re: (c) happy medium

Date: 2015-03-17 12:28 am (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawnToeawzBfrU1YGJvj7ln2UGGpebU1xAkA
We can't expect users to perform firmware updates just to get things working, and in most cases we can't expect system vendors to do the firmware updates in the first place - imagine this code being cut and paste into a low-end system with a 6-month support cycle, and then figure out the probability that anybody's ever going to fix it once Linux works properly.
That's a really unfair double standard. If there's a BIOS issue and we can fix it in firmware and actively do fix it why can't we tell people to go and use it? That can keep quirks out of the kernel! There was a problem with a bunch of the recent E series machines that we issued a BIOS fix for related to keyboard repeating specific to Linux. People had no problem applying that update.

Why? Windows makes very little use of them. Worse, they tend to end up breaking in surprising ways when the kernel changes behaviour. They're a huge maintenance overhead, and reducing the number present is a huge win for everybody.
There are plenty of quirks in Windows drivers, they're obfuscated though and not as obvious since we don't see the source.

And it gets argued about for 3 months because Canonical have historically been dreadful at actually explaining this kind of thing. This is why I think we need to come up with a good process for doing this. If we have a template that can be copied and pasted and filled out to be included in the git commit for example I think it would go a long way. All the major questions about it that normally come up can be put in the commit itself.

We don't need to know about specific hardware. Saying something like "Our expectations for operating systems that report Windows 2013 support include Microsoft Precision touchpad support, working I2S audio for existing codecs and connected standby" at some point last year would have told us nothing other than that Dell were actually paying attention to what would be involved in integrating new hardware features, which is hardly proprietary information. Knowing which features are likely to be required by real hardware vendors helps developers prioritize appropriately.
OK. When possible I'll try to notify you of the things I know about. Right now - the big ones are:
* Intel audio will support something different for Skylake than we have for Broadwell (I2S). Intel will need to comment more on this though as the information I have is under NDA.
* Windows 10 platforms will introduce Modern Standby.
* PCIe SSD's will become very important.


In this specific case, the audio issue is down to driver support rather than anything to do with our claimed operating system. Using _OSI("Linux") to indicate that a driver (rather than the core OS) is missing functionality is a pretty awful thing to do. It would be more meaningful to provide a mechanism for switching at runtime (a defined ACPI method that changes hardware configuration and triggers a PCI hotplug event, for example) and then have the Realtek driver call that when it detects that it's unable to drive the hardware in question.
You're absolutely right that this is something that for Linux the core OS doesn't really indicate the functionality, it's more of a driver type thing.
The problem is that the EC needs to set the mode when the HW is turned on, not at runtime. The values are cached from a previous cold boot and those are what's used. We're in discussion of better ways to do this for upcoming platforms. I do like the idea of a driver being able to request switching the mode at least for the next boot. I'll raise it with the team.

Re: (c) happy medium

Date: 2015-03-19 06:39 pm (UTC)
From: [identity profile] https://www.google.com/accounts/o8/id?id=AItOawnol273JiF1qcD5Z0bdwRYNVrlPA6kpF-Y
Are we talking PCIe SSDs in the "Present as an AHCI controller" sense, or NVMe, or something more exciting? I think we're fairly on top of this one.
NVMe. I believe there are some features that weren't supported on this, but I don't know the specifics.

Is this more than what's in ACPI 6.0?
I'm not sure. This is something coming from Microsoft, I'm not privy to the details of it. I just know it's coming.

Re: (c) happy medium

Date: 2015-03-19 07:16 pm (UTC)
From: [identity profile] yuhong.wordpress.com
If it truly has to be set at boot, I believe a BIOS option is the right solution. Windows still supports HDA perfectly fine, right?

xps 13 2015?

Date: 2015-03-12 10:16 pm (UTC)
From: (Anonymous)
hey,
if you are writing about the 2015 edition of the xps 13, are you aware of these bug-reports?

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1413446
https://bugzilla.redhat.com/show_bug.cgi?id=1188741
https://bugzilla.kernel.org/show_bug.cgi?id=93361

if you haven't seen it, there's a nice write-up about what's wrong with the device: https://major.io/2015/02/03/linux-support-dell-xps-13-9343-2015-model/

Is there a boot parameter to tweak the value?

Date: 2015-03-13 08:10 am (UTC)
From: (Anonymous)
First of all: thanks for your work. You're making our lives better.

Then -- I think the only way to go about this is having boot parameters. Choose sane defaults (and in this case, version 2 most probably is), but giving users easy ways to experiment and change things is (I think) always the best choice.

Of course, it'll always be possible to recompile the kernel, but why hang this hoop so high?

From: (Anonymous)
If I want to find out if I have laptops/servers behaving like this what do I need to do? Is there a mechanical mostly naive way that disassembled ACPI can be inspected for things like this?
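One naive, mechanical approach is to dump and disassemble the tables (for example with `acpidump`, `acpixtract` and `iasl -d`) and then search the resulting `.dsl` text for references to _REV. The following Python sketch does only the search step; the function names are hypothetical, and a hit merely shows where _REV is referenced, not whether the firmware does anything harmful with it.

```python
import re

# Match _REV or \_REV as a whole name, not as part of a longer identifier.
REV_PATTERN = re.compile(r"(?<![A-Za-z0-9_])\\?_REV(?![A-Za-z0-9_])")


def find_rev_references(lines):
    """Return (line_number, stripped_text) for each line mentioning _REV."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if REV_PATTERN.search(line)
    ]


def scan_file(path):
    """Scan a disassembled table (e.g. a dsdt.dsl file) for _REV references."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_rev_references(handle)


sample = [
    "If ((_REV == 0x05))",
    "Name (FOO_REV, 0x01)",
    "Store (\\_REV, Local0)",
]
for number, text in find_rev_references(sample):
    print(f"{number}: {text}")
```

Each hit still needs to be read by hand to see which values are compared and what the firmware does in each branch.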

What is Darwin expecting?

Date: 2015-03-15 02:29 am (UTC)
From: (Anonymous)
We should check what Apple hardware and Darwin expect before changing this value or else we will break macbooks and the like.

Wrong tree

Date: 2015-03-15 04:06 am (UTC)
From: (Anonymous)
If the firmware is disabling and breaking things, shouldn’t the appropriate solution be to fix the firmware, instead of implementing yet another opaque workaround?

Might need to add ThinkPad t540p to the list

Date: 2015-11-05 09:46 pm (UTC)
From: (Anonymous)
So I have been fighting a weird sound problem in the Lenovo ThinkPad t540p with newer kernels where the sound is in a weird startup state, which sounded exactly like the problem described here.

00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
Subsystem: Lenovo Device 2210
Flags: bus master, fast devsel, latency 0, IRQ 33
Memory at e1630000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
Kernel driver in use: snd_hda_intel

The head-phones will sometimes go into mode and the only way to fix is drop the system reboot twice and it works again until the next reboot when it goes into mode again. Of course it could just be a bad hardware :) [though going to a 3.10 level kernel didn't seem to cause it.]

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Aurora. Ex-biologist. [personal profile] mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer. Also on Mastodon.
