Why ACPI?

Oct. 31st, 2023 11:30 pm
[personal profile] mjg59
"Why does ACPI exist" - - the greatest thread in the history of forums, locked by a moderator after 12,239 pages of heated debate, wait no let me start again.

Why does ACPI exist? In the beforetimes power management on x86 was done by jumping to an opaque BIOS entry point and hoping it would do the right thing. It frequently didn't. We called this Advanced Power Management (Advanced because before this, power management involved custom drivers for every machine, and everyone agreed that this was a bad idea), and it involved the firmware having to save and restore the state of every piece of hardware in the system. This meant that assumptions about hardware configuration were baked into the firmware - failed to program your graphics card exactly the way the BIOS expected? Hurrah! It saved and restored only a subset of the state you configured, and now you get potential data corruption. The developers of ACPI made the reasonable decision that, well, since the OS was the one setting the state in the first place, maybe the OS should restore it.

So far so good. But some state is fundamentally device specific, at a level that the OS generally ignores. How should this state be managed? One way to do that would be to have the OS know about the device specific details. Unfortunately that means you can't ship the computer without having OS support for it, which means having OS support for every device (exactly what we'd got away from with APM). This, uh, was not an option the PC industry seriously considered. The alternative is that you ship something that abstracts the details of the specific hardware and makes that abstraction available to the OS. This is what ACPI does, and it's also what things like Device Tree do. Both provide static information about how the platform is configured, which can then be consumed by the OS, avoiding the need for device-specific drivers or configuration to be built in.

The main distinction between Device Tree and ACPI is that Device Tree is purely a description of the hardware that exists, and so still requires the OS to know what's possible - if you add a new type of power controller, for instance, you need to add a driver for that to the OS before you can express that via Device Tree. ACPI decided to include an interpreted language to allow vendors to expose functionality to the OS without the OS needing to know about the underlying hardware. So, for instance, ACPI allows you to associate a device with a function to power down that device. That function may, when executed, trigger a bunch of register accesses to a piece of hardware otherwise not exposed to the OS, and that hardware may then cut the power rail to the device to power it down entirely. And that can be done without the OS having to know anything about the control hardware.
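To make that concrete, here's a hypothetical sketch of what such a method looks like in ASL, the source language that's compiled into ACPI's AML bytecode. The device, its hardware ID, and the register addresses are all invented for illustration:

```asl
// Hypothetical device with an ACPI-managed power rail. The OS evaluates
// _PS3 to power the device down and _PS0 to power it back up, without
// ever knowing what hardware sits behind the PWEN bit.
Device (XDEV)
{
    Name (_HID, "HYP0001")  // invented hardware ID

    // The firmware declares the register block it's going to touch
    OperationRegion (PWRC, SystemMemory, 0xFED40000, 0x04)
    Field (PWRC, DWordAcc, NoLock, Preserve)
    {
        PWEN, 1             // power-enable bit for this device's rail
    }

    Method (_PS3, 0, Serialized)   // enter D3: cut the power rail
    {
        Store (Zero, PWEN)
    }

    Method (_PS0, 0, Serialized)   // return to D0: restore power
    {
        Store (One, PWEN)
    }
}
```

The OS just evaluates `_PS3` when it wants the device off; whether that flips a GPIO, talks to a PMIC, or does nothing at all is the firmware's business.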

How is this better than just calling into the firmware to do it? Because the fact that ACPI declares that it's going to access these registers means the OS can figure out that it shouldn't, since doing so might collide with what the firmware is doing. With APM we had no visibility into that - if the OS tried to touch the hardware at the same time APM did, boom, almost impossible-to-debug failures. (This is why various hardware monitoring drivers refuse to load by default on Linux - the firmware declares that it's going to touch those registers itself, so Linux decides not to in order to avoid race conditions and potential hardware damage. In many cases the firmware offers a collaborative interface for obtaining the same data, and a driver can be written to use that. This bug comment discusses this for a specific board.)
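Concretely, the sort of declaration Linux is reacting to looks roughly like this - a hypothetical ASL fragment, modelled on a Super I/O hardware monitor behind the traditional index/data ports at 0x295/0x296:

```asl
// The firmware declares that it will access the hwmon index/data ports
// itself. Linux's resource conflict checking sees this OperationRegion
// and, by default, refuses to bind a native hwmon driver to the same
// ports (overridable with acpi_enforce_resources=lax, at your own risk).
OperationRegion (HWMN, SystemIO, 0x0295, 0x02)
Field (HWMN, ByteAcc, NoLock, Preserve)
{
    HIDX, 8,    // index port
    HDAT, 8     // data port
}

Method (GTMP, 0, Serialized)   // invented helper: read a temperature register
{
    Store (0x29, HIDX)
    Return (HDAT)
}
```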

Unfortunately ACPI doesn't entirely remove opaque firmware from the equation - ACPI methods can still trigger System Management Mode, which is basically a fancy way to say "Your computer stops running your OS, does something else for a while, and you have no idea what". This has all the same issues that APM did, in that if the hardware isn't in exactly the state the firmware expects, bad things can happen. While historically there were a bunch of ACPI-related issues because the spec didn't define every single possible scenario and also there was no conformance suite (eg, should the interpreter be multi-threaded? Not defined by spec, but influences whether a specific implementation will work or not!), these days overall compatibility is pretty solid and the vast majority of systems work just fine - but we do still have some issues that are largely associated with System Management Mode.

One example is a recent Lenovo one, where the firmware appears to try to poke the NVME drive on resume. There's some indication that this is intended to deal with transparently unlocking self-encrypting drives on resume, but it seems to do so without taking IOMMU configuration into account and so things explode. It's kind of understandable why a vendor would implement something like this, but it's also kind of understandable that doing so without OS cooperation may end badly.

This isn't something that ACPI enabled - in the absence of ACPI firmware vendors would just be doing this unilaterally with even less OS involvement and we'd probably have even more of these issues. Ideally we'd "simply" have hardware that didn't support transitioning back to opaque code, but we don't (ARM has basically the same issue with TrustZone). In the absence of the ideal world, by and large ACPI has been a net improvement in Linux compatibility on x86 systems. It certainly didn't remove the "Everything is Windows" mentality that many vendors have, but it meant we largely only needed to ensure that Linux behaved the same way as Windows in a finite number of ways (ie, the behaviour of the ACPI interpreter) rather than in every single hardware driver, and so the chances that a new machine will work out of the box are much greater than they were in the pre-ACPI period.

There's an alternative universe where we decided to teach the kernel about every piece of hardware it should run on. Fortunately (or, well, unfortunately) we've seen that in the ARM world. Most device-specific code simply never reaches mainline, and most users are stuck running ancient kernels as a result. Imagine every x86 device vendor shipping their own kernel optimised for their hardware, and now imagine how well that works out given the quality of their firmware. Does that really seem better to you?

It's understandable why ACPI has a poor reputation. But it's also hard to figure out what would work better in the real world. We could have built something similar on top of Open Firmware instead but the distinction wouldn't be terribly meaningful - we'd just have Forth instead of the ACPI bytecode language. Longing for a non-ACPI world without presenting something that's better and actually stands a reasonable chance of adoption doesn't make the world a better place.

Date: 2023-11-18 07:24 am (UTC)
From: (Anonymous)
Thank you for providing detailed information about Thermal Management. I downloaded ACPI version 2 spec and latest version 6.5 spec; I didn't read every single word but have made some observations for you.

ACPI Version 6.5 has section: 11.5 Native OS Device Driver Thermal Interfaces
This section didn't appear in version 2.
Maybe the light bulb has finally turned on...... the Operating System should be using a direct hardware interface instead of the dubious, convoluted "ACPI Machine Language".

You appear to claim ACPI is "extensible". However there is simple proof that isn't true. In ACPI version 2 section 12.3 Thermal Objects, it lists 13 different types of objects. ACPI version 6.5 section 11.4 Thermal Objects lists 26 different types of objects. Clearly, ACPI version 2 was NOT "extensible"; they had to make major changes to it in later versions. If ACPI was truly "extensible", an Operating System written in 2002 to version 2 of spec would still have 100% compatibility with ACPI today. That is not the case so clearly the "extensible" claim is false.

Now, for writing a not bloated Thermal Management specification...

In an ideal world, the Operating System has native drivers for all hardware (e.g. a GPU). The OS driver knows how to read the GPU temperature sensor. The OS driver knows the temperature at which the GPU will turn on its fan. The OS driver knows the critical temperature at which the GPU will shut down automatically. It's not the job of a Thermal Management specification to provide this information.

I would expect a GPU operates in lower power mode by default, and only goes into high power mode when Operating System drivers do necessary magic.
This should be the same for ALL devices. They should operate in lower power mode until the Operating System activates higher performance modes.
PCI has a generic power management specification, so I believe even "unknown" PCI devices can be put in low power mode.

For x86-64 CPUs, there should be a standard method for reading the temperature sensor. There should be a standard Model Specific Register to inform the OS of the temperature at which the fan will be required, and the temperature that will force hardware shutdown. There should be standard MSRs describing the power states S0 - S3, and a standard hardware method for putting the CPU into different power states. Let me guess, Intel and AMD do it differently.....

Section 11.1 of ACPI 6.5 spec has nice diagram of a "Thermal Zone", it shows what is required for thermal management.
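For reference, the Thermal Zone that diagram describes is expressed in ASL along these lines (a hypothetical sketch - the zone name, trip points, and device paths are all invented):

```asl
ThermalZone (TZ00)
{
    Method (_TMP, 0, Serialized)   // current temperature, in tenths of a kelvin
    {
        Return (3282)              // 328.2 K, i.e. about 55 degrees C
    }
    Method (_CRT, 0, Serialized)   // critical trip point: OS must shut down
    {
        Return (3732)              // 373.2 K = 100 degrees C
    }
    Method (_PSV, 0, Serialized)   // passive trip point: throttle instead
    {
        Return (3532)
    }
    Name (_AL0, Package () { \_SB.FAN0 })        // active cooling: this fan
    Name (_TZD, Package () { \_SB.PCI0.GFX0 })   // devices in this zone
}
```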

"Thermal Zone-wide active cooling device" would be something like cpu or case fan, connected directly to northbridge chipset. One problem is I don't think there is standard hardware interface for this. Instead of bloated ACPI, a standard hardware interface for "motherboard fans" should be developed. All chipsets should follow this standard.

"Thermal Zone-wide temperature sensor" is similar to above. Instead of bloated ACPI, a standard hardware interface for "motherboard temperature sensors" should be developed.


The final requirement of a "not bloated" Thermal Management specification is to specify how components interact, e.g. where they are located with respect to each other.

So the spec does require a list of Thermal Zones (probably usually 1). Each zone has a list of Devices (I think _TZD in ACPI lingo). For each device in the list, it has x,y,z coordinates to specify its location relative to everything else. As stated above, the Operating System has native driver to understand the device details, or OS has generic (PCI etc) driver to put unknown device into low power mode.

Voila.... thermal management done with less bloat than ACPI. And without using ACPI Machine Language.

Repeating myself.... ACPI is a bloated, highly complicated specification (e.g. AML) that causes problems. The only excuse for ACPI is a lack of hardware standardisation, consequently forcing a very generic, indirect specification.

Any "praise" of ACPI is misguided.

vehement disagreement

Date: 2024-06-14 04:35 am (UTC)
From: (Anonymous)
Not only do I disagree with the sentiment that `any "praise" of ACPI is misguided`, I'm going so far as to actively inculcate its use in RISC-V. Put that in your pipe and smoke it.

Matthew Garrett

Power management, mobile and firmware developer on Linux. Security developer at Aurora. Ex-biologist. [personal profile] mjg59 on Twitter. Content here should not be interpreted as the opinion of my employer. Also on Mastodon.
