Matthew Garrett ([personal profile] mjg59) wrote 2016-04-13 12:46 pm

Skylake's power management under Linux is dreadful and you shouldn't buy one until it's fixed

(Edit to add: this issue is restricted to the mobile SKUs. Desktop parts have very different power management behaviour)

Linux 4.5 seems to have got Intel's Skylake platform (ie, 6th-generation Core CPUs) to the point where graphics work pretty reliably, which is great progress (4.4 tended to lose all my windows every so often, especially over suspend/resume). I'm even running Wayland happily. Unfortunately one of the reasons I have a laptop is that I want to be able to do things like use it on battery, and power consumption's an important part of that. Skylake continues the trend from Haswell of moving to an SoC-type model where clock and power domains are shared between components that were previously entirely independent, and so you can't enter deep power saving states unless multiple components all have the correct power management configuration. On Haswell/Broadwell this manifested in the form of Serial ATA link power management being involved in preventing the package from going into deep power saving states - setting that up correctly resulted in a reduction in full-system power consumption of about 40%[1].
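For readers who want to see which Serial ATA link power management policy their own system is using, here is a minimal Python sketch. It assumes the kernel exposes the policy through the `link_power_management_policy` attribute under `/sys/class/scsi_host/host*/`; the function name and the `sysfs_root` parameter are illustrative, and this is not the unmerged patch set mentioned in the footnote.

```python
from pathlib import Path


def read_sata_lpm_policies(sysfs_root="/sys/class/scsi_host"):
    """Return {host name: policy string} for each host that exposes the attribute.

    Assumption: SATA hosts publish their policy in a file named
    link_power_management_policy; hosts without that file are skipped.
    """
    policies = {}
    pattern = "host*/link_power_management_policy"
    for attr in sorted(Path(sysfs_root).glob(pattern)):
        # The file holds a single word such as max_performance or min_power.
        policies[attr.parent.name] = attr.read_text().strip()
    return policies
```

Calling `read_sata_lpm_policies()` on a machine without a SATA controller (such as the NVMe-only system described next) would be expected to return an empty dictionary.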

I've now got a Skylake platform with a nice shiny NVMe device, so Serial ATA policy isn't relevant (the platform doesn't even expose a SATA controller). The deepest power saving state I can get into is PC3, despite Skylake supporting PC8 - so I'm probably consuming about 40% more power than I should be. And nobody seems to know what needs to be done to fix this. I've found no public documentation on the power management dependencies on Skylake. Turning on everything in Powertop doesn't improve anything. My battery life is pretty poor and the system is pretty warm.

The best thing about this is the following statement from page 64 of the 6th Generation Intel ® Processor Datasheet for U-Platforms:

Caution: Long term reliability cannot be assured unless all the Low-Power Idle States are enabled.

which is pretty concerning. Without support for states deeper than PC3, Linux is running in a configuration that Intel imply may trigger premature failure. That's obviously not good. Until this situation is improved, you probably shouldn't buy any Skylake systems if you're planning on running Linux.

[1] These patches never went upstream. Someone reported that they resulted in their SSD throwing errors and I couldn't find anybody with deeper levels of SATA experience who was interested in working on the problem. Intel's AHCI drivers for Windows do the right thing, but I couldn't find anybody at Intel who could get any information from their Windows driver team.

Re: EFI updates

[personal profile] gourdcaptain 2016-04-14 09:19 am (UTC)(link)
Ah, then that's probably not going to fly with the Lenovo ones. (In my defense for buying it, finding good 11 inch laptops these days is hard: netbooks have mostly died out in favor of tablets and such. Plus, I needed Skylake for hardware HEVC decoding, since nothing in the relatively cheap laptop range is going to have a discrete card that can do that, and 1080p HEVC takes a fair chunk of CPU to decode.)

Re: "Long term reliability"

(Anonymous) 2016-04-14 09:22 am (UTC)(link)
Probably. It seems that Skylake suffers from electromigration (https://en.wikipedia.org/wiki/Electromigration). Basically, it is a process in which metal atoms migrate within the circuit, and it can cause breaks or shorts in the "wires" inside the integrated circuit.

Re: SATA PM Patches

(Anonymous) 2016-04-14 10:23 am (UTC)(link)
I thought that current RST Windows drivers do enable LPM by default. They certainly must have a long black-/whitelist though.
I guess the goal should be to do the same on Linux. There is already ATA_HORKAGE_NOLPM in libata-core.c to mark drives with broken LPM, so if you know such drives that aren't in ata_device_blacklist in libata-core.c yet, please report and/or send a patch!

Am I in trouble?

(Anonymous) 2016-04-14 02:39 pm (UTC)(link)
I have a T460s here (NVMe version). It is running Ubuntu 16.04 (kernel version 4.4.0-18). When running powertop, the Package column shows C2 (pc2) at 98.2%, and all the other values in this column are zero. Does this mean I am in trouble?

Re: 4.6-rc2

[personal profile] edmonds 2016-04-14 03:48 pm (UTC)(link)
Interesting. An earlier commenter running 4.5 posted lspci output (http://pastebin.com/vBu5pBq6) too. Their SSD controller looks like this:

3c:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller (rev 01) (prog-if 02 [NVM Express])
[...]
        DevSta: CorrErr+ UncorrErr- FatalErr+ UnsuppReq+ AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s <4us, L1 <64us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
[...]


While your controller looks like this:

3c:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller (rev 01) (prog-if 02 [NVM Express])
[...]

        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s <4us, L1 <64us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
[...]


Note "ASPM L1 Enabled" vs "ASPM Disabled". There are also some other differences. Is your XPS 13 also a 9350?
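The comparison above can be scripted. Below is a minimal Python sketch that pulls the ASPM setting out of the `LnkCtl:` line of `lspci -vv` text, assuming the line has the shape shown in the two excerpts. The function name `aspm_state` is illustrative and not part of any tool.

```python
import re

# Matches e.g. "LnkCtl: ASPM L1 Enabled; RCB 64 bytes ..." and captures
# everything between "ASPM" and the first semicolon. The LnkCap line also
# mentions ASPM but is not matched because of the LnkCtl prefix.
_LNKCTL_ASPM = re.compile(r"LnkCtl:\s*ASPM\s+([^;]+);")


def aspm_state(lspci_text):
    """Return the ASPM setting from the first LnkCtl line, or None if absent."""
    match = _LNKCTL_ASPM.search(lspci_text)
    return match.group(1).strip() if match else None
```

Fed the first excerpt, this returns `"L1 Enabled"`; fed the second, `"Disabled"`. It reports only what the link control register says and does not show why ASPM ended up enabled or disabled.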

Haswell too! (with WiFi adapter enabled)

(Anonymous) 2016-04-14 03:55 pm (UTC)(link)
Hi Matthew,
Same thing happens on older microarchitectures too. My laptop (Acer C720) is able to reach pc7 if I 'rmmod ath9k'. With ath9k loaded it is limited to pc3 (but cc7). That happens even if I enable power savings on the ath9k module (which seems to cause full system crashes from time to time)(!).

Thankfully the battery life is still pretty good (7~10 hours).

Re: 4.6-rc2

(Anonymous) 2016-04-14 04:28 pm (UTC)(link)
Yes it is a 9350 as well.

Interesting though, device and firmware details:
$ cat /sys/class/nvme/nvme0/model
PM951 NVMe SAMSUNG 512GB
$ cat /sys/class/nvme/nvme0/firmware_rev
BXV77D0Q

Maybe they have passed an additional kernel parameter to enable ASPM?
Apparently it doesn't play nicely at the moment.

Phil

(Anonymous) 2016-04-14 05:03 pm (UTC)(link)
To be clear, does this also apply to the Thinkpad P50, which has Intel Skylake HQ and Xeon E3 CPUs? Also, the machine has a dual-fan cooling system, which, to my knowledge, the Linux kernel may not be able to control as well as Windows does. Does this issue have anything to do with the cooling system at all? Thanks!

No issue here

(Anonymous) 2016-04-14 05:49 pm (UTC)(link)
I can't see this on Lenovo T460s, i5-6200u, running Debian testing, kernel 4.4.0.1-amd64

powertop: http://pastebin.com/3ybgW9Sn
lspci: http://pastebin.com/Nym5peHz

Thinkpad T460s could go PC10 after microcode update

(Anonymous) 2016-04-14 06:11 pm (UTC)(link)
Hello! I've updated the firmware on my Thinkpad T460s, and it includes a CPU microcode update. The ucode version is now 0x84, and powertop/turbostat show 70% in PC6; it can even reach PC10 with the display switched off.

Re: No issue here

(Anonymous) 2016-04-14 06:55 pm (UTC)(link)
Here you go: http://pastebin.com/tZwpPnU3

Does this only affect laptops in sleep state?

(Anonymous) 2016-04-14 07:17 pm (UTC)(link)
I bought an HP Pavilion 15 with Intel 6200U (Skylake) processor a few weeks ago and dual-booted Windows 10 with Arch Linux. I almost never put my laptop to sleep and prefer shutting it down when it is not needed instead. PowerTOP tells me that I'm in pc2 state. Is it safe to use Linux (I use my laptop for around 6-8 hours a day), or should I stick with Windows until this is fixed?

Re: Thinkpad T460s could go PC10 after microcode update

(Anonymous) 2016-04-14 07:19 pm (UTC)(link)
No, I was wrong - it seems that it was one of the USB devices I had disabled in the BIOS that was preventing the deeper PC-states. With all of them enabled, I still get PC3 at most. But with the smartcard reader, SD card reader and fingerprint reader disabled, I can get to PC10 with the screen switched off...

Re: EFI updates

[personal profile] mikeymop 2016-04-14 07:27 pm (UTC)(link)
On the XPS 13, how are you dropping into the EFI shell to execute these?
I always imagined they'd have Windows software that would hamper this, even if it can run .exe

Can you write/share a relevant tutorial for this process? I will need to do this until Dell publishes the files for fwupdate.

Re: EFI updates

[personal profile] mikeymop 2016-04-14 07:31 pm (UTC)(link)
I found what he's talking about. It's on the Arch wiki for the curious

http://hgdev.co/install-bios-update-under-linux-on-the-dell-xps-13-9343-2015/

I'll have to try this on my 9350 when I get it tomorrow.
Does anyone know how I can check the microcode version?
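One possible way to check, sketched in Python: on x86 Linux, `/proc/cpuinfo` normally carries a `microcode` line for each logical CPU. The helper below collects the distinct values from that text; the function name is illustrative, and it assumes the usual `key : value` layout.

```python
def microcode_versions(cpuinfo_text):
    """Return the set of distinct 'microcode' values found in cpuinfo text."""
    versions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only accept lines that really have a "key : value" shape.
        if sep and key.strip() == "microcode":
            versions.add(value.strip())
    return versions
```

To use it, read `/proc/cpuinfo` into a string and pass it in. On a healthy system every logical CPU should report the same revision, so the set would normally contain a single entry.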

Re: EFI updates

(Anonymous) 2016-04-14 07:46 pm (UTC)(link)
9350 has updates at LVFS that can be applied as capsules.
https://secure-lvfs.rhcloud.com/lvfs/device/33773727-8ee7-4d81-9fa0-57e8d889e1fa

Re: EFI updates

(Anonymous) 2016-04-14 07:47 pm (UTC)(link)
in this generation:
xps 9350: https://secure-lvfs.rhcloud.com/lvfs/device/33773727-8ee7-4d81-9fa0-57e8d889e1fa
precision 5510: https://secure-lvfs.rhcloud.com/lvfs/device/124c207d-5db8-4d95-bd31-34fd971b34f9

Otherwise put the .EXE from support.dell.com on a FAT32 USB key or on the ESP and select flash BIOS from the F12 POST menu.
