[personal profile] mjg59
Most x86 devices export various bits of system information via SMBIOS, including the system manufacturer, model and firmware version. This makes it possible for the kernel to alter its behaviour depending on the machine it's running on, usually referred to as "DMI quirking". This is a very attractive approach to handling machine-specific bugs - unfortunately it also means that we often end up working around symptoms with no understanding of the underlying issue.
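
To make the mechanism concrete, here's a minimal sketch of what a DMI quirk table boils down to. It's loosely modelled on the substring matching the kernel does against SMBIOS strings, but it's not the kernel's code: the `DmiQuirk` type, the vendor and product strings and the reboot method names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DmiQuirk:
    """One quirk entry: every listed field must contain the given substring."""
    ident: str
    matches: Dict[str, str]  # SMBIOS field name -> required substring
    action: str              # hypothetical workaround to apply


def quirk_applies(quirk: DmiQuirk, dmi: Dict[str, str]) -> bool:
    # A missing field is treated as an empty string, so it never matches.
    return all(sub in dmi.get(field, "") for field, sub in quirk.matches.items())


def first_matching_quirk(quirks: List[DmiQuirk], dmi: Dict[str, str]) -> Optional[DmiQuirk]:
    # The first matching entry wins, so table order matters.
    for quirk in quirks:
        if quirk_applies(quirk, dmi):
            return quirk
    return None


# Hypothetical table: one machine somebody reported as failing to reboot.
QUIRKS = [
    DmiQuirk(
        ident="Example Corp Laptop 100",
        matches={"sys_vendor": "Example Corp", "product_name": "Laptop 100"},
        action="alternate_reboot",
    ),
]


def select_reboot_method(dmi: Dict[str, str],
                         quirks: List[DmiQuirk] = QUIRKS,
                         default: str = "default_reboot") -> str:
    quirk = first_matching_quirk(quirks, dmi)
    return quirk.action if quirk else default


if __name__ == "__main__":
    listed = {"sys_vendor": "Example Corp", "product_name": "Laptop 100"}
    unlisted = {"sys_vendor": "Example Corp", "product_name": "Laptop 200"}
    print(select_reboot_method(listed))    # the quirk applies
    print(select_reboot_method(unlisted))  # same firmware bug, but nobody reported it
```

The second machine is the problem case: if it shares the first machine's firmware bug, it still gets the default behaviour until somebody adds another entry.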

Almost all x86 hardware is tested with Windows. For the most part vendors don't ship devices that don't work - if you install a stock copy of Windows on a system, you expect it to boot successfully and reboot properly. Basic ACPI functionality should be present and correct, including processor power-saving states. Time should pass at something approximating the real rate. And since stock Windows isn't updated with large numbers of DMI quirk entries, the hardware needs to do this with an operating system that's already shipped.

Which means that if Linux doesn't provide the same level of functionality, we're doing something different to Windows. Sometimes this is because we're doing something fundamentally different with the hardware - the HP NX6125, for example, resets its thermal trip points if the timer is set up in a different way to Windows. In this case we've decided that the additional functionality of doing things the Linux way is worth it, and we'll just blacklist the small set of machines that are broken by it.

But other times there's no need for the difference. For years we were triggering system reboots in a different way to Windows and then adding DMI workarounds for any systems that didn't work. The problem with this approach is that it's basically impossible to guarantee that you've found the full set of broken hardware. It's very easy to add another DMI quirk, but it doesn't solve the problem for anyone who bought a machine, tried Linux and gave up when they found reboot didn't work. More recently we've added a pile of quirks for Dells - turns out that in all cases we're working around a bug in their firmware that hangs if we're using VT-d.

We typically don't merge code that fixes a specific example of a problem without at least considering whether there's a wider class of similar problems that could all be solved at once. We should take the same attitude to DMI quirks. Sometimes they're the least harmful way of handling an issue, but most of the time they're just fixing one person's itch and leaving a larger number of people to continue cursing at Linux. There should at least be a demonstration of an attempt to understand the underlying problem before just adding another quirk, and every patch that permits the removal of some DMI checks should be greeted with great cheer.

Reminds me of piix-smbus

Date: 2012-07-06 03:50 pm (UTC)
From: [identity profile] benanov.livejournal.com
The old issue in the IBM ThinkPads where that module did something that corrupted the firmware of the SMBus, I believe.

The module disables itself if it sees an IBM system. I suppose that's an appropriate DMI quirk. :)

Tests?

Date: 2012-07-06 09:53 pm (UTC)
From: (Anonymous)
Is there a test suite that tests various functionality of the SMBIOS/ACPI/EFI/etc that can catch some of this stuff? Hopefully such a test suite could be updated every time we come across a new bug.

Even if bugs are being worked around by making Linux look like Windows, having such a test suite helps BIOS manufacturers find these bugs, and for bugs such as the timer setup you mention, it would be far better if manufacturers were able to see them and fix them. While I'm aware that most BIOS manufacturers don't really care, and that a test suite would appear to be just more work for them, hopefully at least some will care enough.

However, and IMHO more importantly, it also means end users can run the test suite on various manufacturers' BIOSes and report back the results in a consistent format that makes reporting easier. Users can test a machine before they buy it for bugs that are likely to impact their experience, and therefore put pressure on BIOS manufacturers whose firmware has problems that we do care about. Having automated reporting would allow for tracking of when various BIOS bugs are no longer frequently seen and the workaround can possibly be dropped. Users can compare the relative quality of two different BIOSes (at the moment users have very little visibility into this, except for the quality and quantity of options in the "Setup" screen).

Like the Acid tests did for browsers, a test suite that end users can run, one that tells them what's actually going on under the hood and makes it clear when various things are broken or not performing optimally on a machine, would be quite welcome to end users. The willingness of users to compare "3D marks" and other benchmark results is another example: it shows that users are interested in comparing different manufacturers and discussing their results openly if they can easily produce results for their hardware.

Re: Tests?

Date: 2012-07-08 06:52 am (UTC)
From: [identity profile] yuhongbao.blogspot.com
I am thinking that a single standards group that connects the PC vendors and the OS vendors (including Microsoft) would be a good idea, and could develop the test suites.

Re: Tests?

Date: 2012-07-09 02:03 am (UTC)
From: (Anonymous)
Then he is right. The test suite is Windows.

Re: Tests?

Date: 2012-07-10 06:24 pm (UTC)
From: (Anonymous)
If you're going to posit that the test suite is windows, then linux must restrict itself to doing only things windows does, or else linux is broken.

I don't like that approach, personally, so I don't consider windows a canonical test suite.

Windows Drivers!

Date: 2012-07-07 07:07 am (UTC)
From: (Anonymous)
> And since stock Windows isn't updated with large numbers of DMI quirk entries, the hardware needs to do this with an operating system that's already shipped.

Not sure here. Last time I used Windows the first thing to do after inserting new hardware was inserting a driver CD from the vendor and installing a driver or even downloading the latest driver from their website first.

I think that drivers in Windows contain a lot of device specific quirks. Even if it's a standard device for which generic drivers exist (like USB storage). If you tried to run Windows with only the drivers written by Microsoft, none of your hardware would work properly.

Re: Windows Drivers!

Date: 2012-07-12 11:01 am (UTC)
From: (Anonymous)
Since Vista/7, Microsoft have been pushing a lot harder for hardware which wants to wear shiny little "Works with Windows 7" badges to just use generic class drivers where possible. Microsoft don't much like random proprietary third-party kernel code any more than Linux developers do. (Unfortunately not even they have the muscle to get nVidia and ATi/AMD in line.)

Crucially for this kind of thing, "motherboard drivers" are basically unnecessary to get the machine to behave.

Profile

Matthew Garrett

About Matthew

Power management, mobile and firmware developer on Linux. Security developer at Nebula. Ex-biologist. @mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer.
