Matthew Garrett ([personal profile] mjg59) wrote 2016-04-13 12:46 pm

Skylake's power management under Linux is dreadful and you shouldn't buy one until it's fixed

(Edit to add: this issue is restricted to the mobile SKUs. Desktop parts have very different power management behaviour)

Linux 4.5 seems to have got Intel's Skylake platform (ie, 6th-generation Core CPUs) to the point where graphics work pretty reliably, which is great progress (4.4 tended to lose all my windows every so often, especially over suspend/resume). I'm even running Wayland happily. Unfortunately one of the reasons I have a laptop is that I want to be able to do things like use it on battery, and power consumption's an important part of that. Skylake continues the trend from Haswell of moving to an SoC-type model where clock and power domains are shared between components that were previously entirely independent, and so you can't enter deep power saving states unless multiple components all have the correct power management configuration. On Haswell/Broadwell this manifested in the form of Serial ATA link power management being involved in preventing the package from going into deep power saving states - setting that up correctly resulted in a reduction in full-system power consumption of about 40%[1].

I've now got a Skylake platform with a nice shiny NVMe device, so Serial ATA policy isn't relevant (the platform doesn't even expose a SATA controller). The deepest power saving state I can get into is PC3, despite Skylake supporting PC8 - so I'm probably consuming about 40% more power than I should be. And nobody seems to know what needs to be done to fix this. I've found no public documentation on the power management dependencies on Skylake. Turning on everything in Powertop doesn't improve anything. My battery life is pretty poor and the system is pretty warm.

The best thing about this is the following statement from page 64 of the 6th Generation Intel® Processor Datasheet for U-Platforms:

Caution: Long term reliability cannot be assured unless all the Low-Power Idle States are enabled.

which is pretty concerning. Without support for states deeper than PC3, Linux is running in a configuration that Intel imply may trigger premature failure. That's obviously not good. Until this situation is improved, you probably shouldn't buy any Skylake systems if you're planning on running Linux.

[1] These patches never went upstream. Someone reported that they resulted in their SSD throwing errors and I couldn't find anybody with deeper levels of SATA experience who was interested in working on the problem. Intel's AHCI drivers for Windows do the right thing, but I couldn't find anybody at Intel who could get any information from their Windows driver team.

4.6 seems to have better support

(Anonymous) 2016-05-30 10:01 pm (UTC)(link)
I have a Thinkpad X1 Carbon (4th gen). I'm running debian sid with 4.5 kernel. It seems that I couldn't get past PC2. I've updated my kernel to the latest in experimental which is 4.6 and I'm reaching PC8.

C2 (pc2) 12.2%
C3 (pc3) 0.2%
C6 (pc6) 39.4%
C7 (pc7) 0.0%
C8 (pc8) 14.1%
C9 (pc9) 0.0%
C10 (pc10) 0.0%
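
A residency listing like the one above can also be read mechanically. The following is a minimal sketch, assuming the text has the `C8 (pc8) 14.1%` shape shown here; the function names are made up for illustration and are not part of PowerTOP.

```python
import re

# Sample text copied from the listing above (PowerTOP-style package residency).
SAMPLE = """\
C2 (pc2) 12.2%
C3 (pc3) 0.2%
C6 (pc6) 39.4%
C7 (pc7) 0.0%
C8 (pc8) 14.1%
C9 (pc9) 0.0%
C10 (pc10) 0.0%
"""

# Matches e.g. "(pc8) 14.1%" and captures the state number and the percentage.
LINE_RE = re.compile(r"\(pc(\d+)\)\s+([\d.]+)%")


def package_residency(text):
    """Return {package_state_number: percent} for every '(pcN) X%' entry found."""
    return {int(n): float(pct) for n, pct in LINE_RE.findall(text)}


def deepest_state(residency):
    """Return the highest-numbered package state with non-zero residency, or None."""
    reached = [n for n, pct in residency.items() if pct > 0.0]
    return max(reached) if reached else None


if __name__ == "__main__":
    res = package_residency(SAMPLE)
    print("deepest package state reached: PC%s" % deepest_state(res))
```

A state listed at 0.0% counts as "not reached" in this sketch, which matches how the numbers are being read in this thread.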

I've also flashed the BIOS to the latest version:

BIOS Revision: 1.14 Firmware Revision: 1.9

microcode revision=0x88.

I'm not sure if the BIOS/firmware/microcode upgrade made any real difference or if it was just the kernel, because it did reach PC8 with just the kernel upgrade.

Hopefully this will improve the battery life a bit.

Re: Acer VN7-592g

(Anonymous) 2016-06-04 07:56 pm (UTC)(link)
Hi, I have exactly the same laptop as you and I have been trying to run a Linux distribution decently on it for months.
Currently I am running Arch Linux with GNOME Shell, and I am not as expert as you guys, so could you tell me what you mean by deeper PC states?
I use bumblebee with the official Nvidia drivers and the discrete card always off (with bbswitch), laptop-mode-tools, the thermald daemon, and powertop for diagnosis. Even though I have really improved my Linux experience over the months, I am still pretty far from an experience comparable with Windows 10, where the laptop temperature is always under 39 degrees and the battery lasts, as you mentioned, 40% longer.
Can you tell me the important things you found out from your experience? Why do you say that you need to use nouveau and not the official Nvidia drivers? Can you explain better how to activate the sleep 1 option? What about the r8168?
Thank you in advance.
Simone

Re: 4.6 seems to have better support

(Anonymous) 2016-06-09 06:55 pm (UTC)(link)
IT WAS THE IMEI FIRMWARE!

So I think I was experiencing the same thing. I'm still running 4.4.0-23 (Ubuntu kernel). I was only able to get to PC2. I ended up booting into Windows to debug a separate, unrelated issue, and ran a few updates. I had already been running the 1.14 BIOS and firmware revision 1.9, so that wasn't the solution. I did upgrade the IMEI firmware, though. After rebooting back into Linux I'm now getting much better behavior.

C2 (pc2) 12.6%
C3 (pc3) 0.1%
C6 (pc6) 55.4%
C7 (pc7) 0.0%
C8 (pc8) 14.8%
C9 (pc9) 0.0%
C10 (pc10) 0.0%

It's worth noting that I am using an NVMe drive so the sata patches here don't apply.

Re: NVMe problems

(Anonymous) 2016-06-10 07:03 pm (UTC)(link)
IT WAS THE IMEI FIRMWARE!

So I think I was experiencing the same thing. I have an X1 Carbon 4th gen with Skylake and an NVMe drive.

I'm still running 4.4.0-23 (Ubuntu kernel). I was only able to get to PC2. I ended up booting into Windows to debug a separate, unrelated issue, and ran a few updates. I had already been running the 1.14 BIOS and firmware revision 1.9, so that wasn't the solution. I did upgrade the IMEI firmware, though. After rebooting back into Linux I'm now getting much better behavior.

C2 (pc2) 12.6%
C3 (pc3) 0.1%
C6 (pc6) 55.4%
C7 (pc7) 0.0%
C8 (pc8) 14.8%
C9 (pc9) 0.0%
C10 (pc10) 0.0%

Re: 4.6 seems to have better support

(Anonymous) 2016-06-22 05:26 am (UTC)(link)
Sorry if this question has an obvious answer, but how can one upgrade the IMEI firmware?

Re: 4.6 seems to have better support

(Anonymous) 2016-06-29 10:33 pm (UTC)(link)
Same question here.

Have you tried the newest Kernel 4.6.3 or 4.7-rc*? Would the performance be better?

Re: 4.6 seems to have better support

(Anonymous) 2016-07-14 10:10 pm (UTC)(link)
I still only get as far as PC2 on my Lenovo P50 laptop with kernel 4.6... Any trick there?

average battery life on T460s

(Anonymous) 2016-07-17 02:04 pm (UTC)(link)
How long does your battery last (on your T460s) after performing the mentioned optimizations?

Re: 4.6 seems to have better support

(Anonymous) 2016-07-17 09:01 pm (UTC)(link)
There's no need to, it's a kernel module.
It would only make sense if the linux version is behind the windows version. And only if it made persistent changes to the cpu/efi/whatever...
https://www.thomas-krenn.com/de/wiki/Intel_Management_Engine_Interface_mei_Linux_Treiber

Kernel 4.7 seems to do PC8 OK

(Anonymous) 2016-07-26 12:29 am (UTC)(link)
Another data point, FWIW: vanilla kernel 4.7 seems to go to PC8 OK. I didn't see PC8 on 4.5 or 4.6.

Dell XPS 13 9350, 1.4.0 BIOS, Intel WiFi. The only change from kernel defaults is i915.enable_rc6=1.

Idle-ish system with normal processes running, screen on:

C2 (pc2) 18.5%
C3 (pc3) 0.3%
C6 (pc6) 2.5%
C7 (pc7) 0.0%
C8 (pc8) 36.6%
C9 (pc9) 0.0%
C10 (pc10) 0.0%

If I quit Firefox then PC8 goes up to around 50%.

Full PowerTOP bits: https://gist.github.com/projectgus/e79923530392517c4e55064bb07b778d

(On the offchance anyone sees this and decides to buy an XPS 13 on this basis - it's still been a trip back to running Linux 10-15 years ago. I get random hard locks coming out of suspend, screen flickers when the laptop gets hot, glitchy touchpad driver, USB Type C is WIP, etc. Have spent way too much time compiling kernels and trawling forums. I wish I'd done my homework beforehand instead of thinking "well, Dell ships Linux on it from factory. How bad can it be?")

Re: Kernel 4.7 seems to do PC8 OK

(Anonymous) 2016-07-27 12:16 pm (UTC)(link)
Although there is nothing specific for Skylake, perhaps this will help (in kernel 4.8) ?
https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.8-Power-Management-ACPI

Re: My mobile part is seeing pc8

(Anonymous) 2016-08-11 01:19 am (UTC)(link)
I have the same chip, i.e. http://ark.intel.com/products/89608/Intel-Xeon-Processor-E3-1505M-v5-8M-Cache-2_80-GHz - which seems to be the same architecture as the 6th Gen U and Y processors. (Despite Matthew's edit, these do count amongst those affected.)

I am currently running a 4.7 kernel on ubuntu 16.10 with some tlp tweaks. So whatever patches were in 4.7 they don't solve the issues for skylake.

With P-states it sits in c2 or c3; disabling intel pstates and using acpi with the conservative governor, I have seen it in c7. But this means you don't get the boost states/higher frequency clocks, i.e. 8k - 2.81k hz only when using ACPI without the pstates module. The machine runs cool and well when using the acpi module rather than pstates, i.e. with the following kernel line:

pcie_aspm=force intel_pstate=disable drm.vblankoffdelay=1 i915.semaphores=1 i915_enable_rc6=1 i915_enable_fbc=1


I am using HP Zbook studio g3 with latest bios N82 Ver. 01.13 from the first of August. NVME and Sata drives.

I've had some weird issues, including sound disappearing from Windows randomly, and squealing beeps during POST after a hard reset due to a crash.

lshw for those interested :

aenertia@hurapaki:~$ sudo lshw
[sudo] password for aenertia:
hurapaki
description: Notebook
product: HP ZBook Studio G3 (M6V81AV)
vendor: HP
serial: #
width: 64 bits
capabilities: smbios-2.7 dmi-2.7 vsyscall32
configuration: administrator_password=disabled boot=normal chassis=notebook family=103C_5336AN frontpanel_password=disabled keyboard_password=disabled power-on_password=disabled sku=M6V81AV uuid=#
*-core
description: Motherboard
product: 80D4
vendor: HP
physical id: 0
version: KBC Version 11.62

Re: Acer VN7-592g

(Anonymous) 2016-08-18 08:43 am (UTC)(link)
Connected with your diagnosis, using my Dell XPS 15 9550 (32GB, 512 GB nvme SSD) and kernel 4.7.0-generic, if I *use* the Intel GPU and have the nvidia driver installed (361.42) then powertop 2.8 reports PC2/PC3 only. If I *use* the Intel GPU and have nouveau installed (1:1.0.12) then powertop shows PC2-PC8 (45%ish time in PC8). Between the two the baseline power estimate drops from 25W (with nvidia installed) to 16W (with nouveau installed) - and to be clear in both cases I'm not using the nvidia chip, just the Intel GPU.

I have some more notes here: https://ubuntuforums.org/showthread.php?t=2301071&page=32&p=13470668#post13470668

Using 4.4.0 with the nvidia driver (over the last month) I saw PC2-PC8. I didn't understand why this got worse when upgrading to 4.7.0 until a colleague suggested the driver swap (cheers Kyran!).

Update?

(Anonymous) 2016-09-14 11:18 pm (UTC)(link)
Any update since you first installed? There have been a few kernel releases since then. I was going to install Fedora 24 or Debian, thoughts?

Interesting news?

(Anonymous) 2016-09-18 08:11 pm (UTC)(link)
Just bought a ThinkPad E460 (i5-6200U, regular SATA drive): enabling everything in the "Tunables" list of powertop, I get up to C10 for all cores, but PC7 only for the package. I'm on Arch, and I've also tried the latest linux-git kernel (4.8.0-rc6).

BTW, just have a look at table 4-4, page 70 of [1]: having a FHD screen that does NOT support PSR ([drm:intel_psr_enable] PSR not supported by this panel in kernel log), without turning off the screen, my max package state is actually PC8!

So, any thoughts on how to level up from PC7 to PC8?


[1] http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/6th-gen-core-family-mobile-u-y-processor-lines-datasheet-vol-1.pdf

Re: Thinkpad T460s could go PC10 after microcode update

(Anonymous) 2016-09-21 01:32 pm (UTC)(link)
Sorry I am new in this. How do you know/test that you reach PC6 or PC10 power states?

Re: Interesting news?

(Anonymous) 2016-09-23 07:21 pm (UTC)(link)
I've found that one of the devices (either the smartcard reader or the card reader) prevents my T460s from entering PC8. You could disable every unneeded device in the BIOS and check whether that helps. Also, you'll probably need to enable ALPM on your SATA controller.

Re: Interesting news?

(Anonymous) 2016-09-24 07:55 pm (UTC)(link)
Well, I *knew* that some devices could be the cause, but I actually *forgot* to check my card reader LOL
...and of course, disabling it in the BIOS solves the problem: thank you! :)
Now, I don't actually use the card reader every day, but it would be nice to have it enabled in some sort of power saving mode...an lspci shows this:
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
Neither forcing ASPM into powersave mode nor echoing "auto" into /sys/bus/pci/devices/0000:02:00.0/power/control seems to solve the issue... maybe the problem is that "Unassigned class" shown in the lspci output?
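
One way to check whether writing "auto" actually had an effect is to read back the device's runtime PM files. This is a minimal sketch, assuming the standard sysfs layout (`power/control` and `power/runtime_status` under the device directory); the helper name is made up for illustration.

```python
from pathlib import Path


def runtime_pm_state(device_dir):
    """Read power/control and power/runtime_status for one device directory.

    Returns a dict; a value is None when the file is missing or unreadable.
    """
    state = {}
    for name in ("control", "runtime_status"):
        try:
            state[name] = (Path(device_dir) / "power" / name).read_text().strip()
        except OSError:
            state[name] = None
    return state


if __name__ == "__main__":
    # Device address taken from the lspci line above.
    print(runtime_pm_state("/sys/bus/pci/devices/0000:02:00.0"))
```

If `control` reads `auto` but `runtime_status` stays `active`, the device is allowed to suspend but is not doing so. That would be consistent with the behaviour described here, though it does not explain the cause.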

Cheers,
"e460_owner"

Lenovo Yoga 900

(Anonymous) 2016-10-09 11:04 am (UTC)(link)
Currently at kernel 4.8.1, there is still no difference for the Lenovo Yoga 900.

Version:
➜  ~ uname -a
Linux V 4.8.1-040801-generic #201610071031 SMP Fri Oct 7 14:34:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


Output of powertop:
          Package   |             Core    |            CPU 0       CPU 2
C2 (pc2)   33.2%    |                     |
C3 (pc3)    0.0%    | C3 (cc3)    0.9%    | C3-SKL      1.0%    0.3 ms  1.0%    0.4 ms
C6 (pc6)    0.0%    | C6 (cc6)    6.0%    | C6-SKL      8.2%    0.6 ms  4.5%    0.7 ms
C7 (pc7)    0.0%    | C7 (cc7)   35.0%    | C7s-SKL     0.1%    1.1 ms  0.1%    1.6 ms
C8 (pc8)    0.0%    |                     | C8-SKL     23.8%    1.7 ms 22.9%    3.2 ms
C9 (pc9)    0.0%    |                     | C9-SKL      0.0%    2.5 ms  0.0%    0.7 ms
C10 (pc10)  0.0%    |                     | C10-SKL    17.4%    3.9 ms 54.8%   10.6 ms


Bits [2:0] of the MSR are non-zero (0x6 in the value below), so the BIOS should not be denying entry to package C-states. The MSR value is not at fault.
➜  ~ sudo rdmsr 0xE2
1e008006
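
The raw value can be split into fields by hand or with a few lines of code. The sketch below assumes the field positions given in Intel's public MSR listings for MSR 0xE2 (package C-state limit in the low bits, configuration lock at bit 15). Treat those positions as an assumption to check against the datasheet for your part, and note that the function name is made up for illustration.

```python
def decode_pkg_cst_config(value):
    """Split a raw MSR 0xE2 value into the two fields discussed here.

    Field positions are an assumption based on Intel's public MSR listings:
    the low bits hold the package C-state limit code, bit 15 is the lock bit.
    """
    return {
        "limit_code": value & 0x7,            # bits [2:0]
        "cfg_lock": bool((value >> 15) & 1),  # bit 15
    }


if __name__ == "__main__":
    raw = int("1e008006", 16)  # value printed by `sudo rdmsr 0xE2` above
    print(decode_pkg_cst_config(raw))
```

For the value above this gives a limit code of 6 with the lock bit set. Which package state a given limit code corresponds to is model-specific, so the code deliberately does not translate it.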


Driver in use: intel_idle
➜  ~ grep . /sys/devices/system/cpu/cpuidle/*
/sys/devices/system/cpu/cpuidle/current_driver:intel_idle
/sys/devices/system/cpu/cpuidle/current_governor_ro:menu


➜  ~ dmesg | grep -i error
[    3.283699] EXT4-fs (sda9): re-mounted. Opts: errors=remount-ro
[    3.487043] iwlwifi 0000:01:00.0: Direct firmware load for iwlwifi-8000C-24.ucode failed with error -2
[    3.487054] iwlwifi 0000:01:00.0: Direct firmware load for iwlwifi-8000C-23.ucode failed with error -2
[    3.487145] iwlwifi 0000:01:00.0: Direct firmware load for iwlwifi-8000C-22.ucode failed with error -2
[    3.852964] i2c_hid i2c-ITE8396:00: error in i2c_hid_init_report size:19 / ret_size:18


Bug opened and hijacked: https://bugzilla.kernel.org/show_bug.cgi?id=116591

Duplicate bug opened: https://bugzilla.kernel.org/show_bug.cgi?id=116671. Just refers to https://github.com/mjg59/linux/tree/sata-lpm-firmware.

I think I have to conclude that nobody seems to be working on it.

Re: Lenovo Yoga 900

(Anonymous) 2016-10-09 11:19 am (UTC)(link)
I stand corrected

   Bad           VM writeback timeout
   Bad           Enable SATA link power management for host0
   Bad           Enable SATA link power management for host1
   Bad           Enable SATA link power management for host2
   Bad           Enable Audio codec power management
   Bad           NMI watchdog should be turned off
   Bad           Runtime PM for I2C Adapter i2c-1 (i915 gmbus dpb)
   Bad           Runtime PM for I2C Adapter i2c-2 (i915 gmbus dpd)
   Bad           Runtime PM for I2C Adapter i2c-0 (i915 gmbus dpc)
   Bad           Autosuspend for unknown USB device 1-7 (8087:0a2b)
   Bad           Runtime PM for PCI Device Intel Corporation Sky Lake Integrated Graphics
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP HD Audio
   Bad           Runtime PM for PCI Device Intel Corporation Sky Lake Host Bridge/DRAM Registers
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP Thermal subsystem
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP SMBus
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP LPC Controller
   Bad           Runtime PM for PCI Device O2 Micro, Inc. Device 8620
   Bad           Runtime PM for PCI Device Intel Corporation Skylake Processor Thermal Subsystem
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP CSME HECI
   Bad           Runtime PM for PCI Device Intel Corporation Wireless 8260
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP PMC
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode]


Changing to:

   Good          VM writeback timeout
   Good          Enable SATA link power management for host0
   Good          Enable SATA link power management for host1
   Good          Enable SATA link power management for host2
   Good          Enable Audio codec power management
   Good          NMI watchdog should be turned off
   Good          Runtime PM for I2C Adapter i2c-1 (i915 gmbus dpb)
   Good          Runtime PM for I2C Adapter i2c-2 (i915 gmbus dpd)
   Good          Runtime PM for I2C Adapter i2c-0 (i915 gmbus dpc)
   Bad           Autosuspend for unknown USB device 1-7 (8087:0a2b)
   Good          Runtime PM for PCI Device Intel Corporation Sky Lake Integrated Graphics
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP HD Audio
   Bad           Runtime PM for PCI Device Intel Corporation Sky Lake Host Bridge/DRAM Registers
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP Thermal subsystem
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP SMBus
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP LPC Controller
   Bad           Runtime PM for PCI Device O2 Micro, Inc. Device 8620
   Bad           Runtime PM for PCI Device Intel Corporation Skylake Processor Thermal Subsystem
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP CSME HECI
   Bad           Runtime PM for PCI Device Intel Corporation Wireless 8260
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP PMC
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode]


Allows the chip (peripherals) to spend time in lower power states than pc2:

          Package   |             Core    |            CPU 0       CPU 2
                    |                     | C0 active   1.7%        1.2%
                    |                     | POLL        0.5%    2.6 ms  0.0%    0.0 ms
                    |                     | C1E-SKL     4.3%    2.0 ms  0.3%    0.3 ms
C2 (pc2)   11.5%    |                     |
C3 (pc3)    0.3%    | C3 (cc3)    0.1%    | C3-SKL      0.1%    0.1 ms  0.1%    0.2 ms
C6 (pc6)    2.4%    | C6 (cc6)    1.9%    | C6-SKL      2.0%    0.5 ms  3.8%    1.9 ms
C7 (pc7)    0.1%    | C7 (cc7)   37.4%    | C7s-SKL     0.0%    0.0 ms  0.0%    0.0 ms
C8 (pc8)   20.7%    |                     | C8-SKL      9.4%    1.3 ms 11.6%    5.1 ms
C9 (pc9)    0.0%    |                     | C9-SKL      0.0%    0.0 ms  0.0%    0.0 ms
C10 (pc10)  0.0%    |                     | C10-SKL    30.0%   12.6 ms 80.8%    9.8 ms


Setting more flags to "Good" stops my laptop from resuming after closing the lid. And I have to restart network-manager, but I can live with that.
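
The two tunables listings above are long enough that it is easy to miss which entries actually changed. Here is a minimal sketch that compares two such listings, assuming each line has the `Good`/`Bad` word followed by the description as shown above. The sample data is a three-line excerpt from those listings and the function names are made up for illustration.

```python
def parse_tunables(text):
    """Map each tunable description to its 'Good'/'Bad' status."""
    result = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # status word, then the description
        if len(parts) == 2 and parts[0] in ("Good", "Bad"):
            result[parts[1].strip()] = parts[0]
    return result


def changed(before, after):
    """Return descriptions whose status differs between two listings, sorted."""
    return sorted(k for k in before if k in after and before[k] != after[k])


# Excerpts copied from the two listings above.
BEFORE = """\
   Bad           VM writeback timeout
   Bad           NMI watchdog should be turned off
   Bad           Runtime PM for PCI Device Intel Corporation Wireless 8260
"""
AFTER = """\
   Good          VM writeback timeout
   Good          NMI watchdog should be turned off
   Bad           Runtime PM for PCI Device Intel Corporation Wireless 8260
"""

if __name__ == "__main__":
    for name in changed(parse_tunables(BEFORE), parse_tunables(AFTER)):
        print("changed:", name)
```

Flipping one tunable at a time and re-running a comparison like this is one way to narrow down which setting breaks resume.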

(Anonymous) 2016-10-20 02:30 pm (UTC)(link)
> These patches never went upstream

Does this mean that using Linux without these patches on a Haswell or Broadwell laptop is as bad as using it on a Skylake laptop?

Skylake PC8 or PC10?

[identity profile] http://openid-provider.appspot.com/e.tomell 2016-11-05 03:41 pm (UTC)(link)
Hello,
I own a Dell XPS 15 9550. I sometimes hit PC8, but it's unreliable; I actually thought it never got below PC3 (with F25 beta).
In the second paragraph you wrote:
"The deepest power saving state I can get into is PC3, despite Skylake supporting PC8"
Where does this come from? I've checked the datasheet you linked; on page 66 (paragraph 4.2.5) there is this: "The processor supports C0, C1/C1E, C3, C6, C7, C8, C9 and C10 power states". It is badly worded; it should say PC{0..10}. On page 69 there is even a description of the Package C10 state. On page 70 there is a nice table: PC10 can be reached if PSR is enabled, otherwise only PC8 is supported.
I checked then the intel datasheet for H-series mobile processors [1] (like the i7-6700hq I have), the relevant pages are 73-76 (but the table was omitted), but it looks the same. I'm also having an email exchange with another xps 15 9550 linux user, he wrote to me that his system does reach down to PC10, so now I'm trying to get to the bottom of this (I think he's running F24 or 25).
[1] http://www.intel.com/content/www/us/en/processors/core/6th-gen-core-family-mobile-h-processor-lines-datasheet-vol-1.html

Re: Skylake PC8 or PC10?

(Anonymous) 2016-12-13 10:42 pm (UTC)(link)
My Thinkpad P50 can only reach PC3 occasionally on Kernel 4.8.12 with Ubuntu 16.04. Most of the time it's on PC2. I guess the distribution of Linux also plays a role in some detailed configurations?

HP zbook studio g3

(Anonymous) 2017-08-08 05:09 pm (UTC)(link)
i7-6820hq, nvidia quadro m1000m

I'm booting with: iwlwifi.d0i3_disable=0 iwlwifi.uapsd_disable=0 iwlwifi.power_save=1 snd_hda_intel.power_save=1 e1000e.SmartPowerDownEnable=1 pcie_aspm=force pcie_aspm.policy=powersupersave

* Having ethernet plugged in (e1000e) makes it only get to pc2

* Having the discrete gpu powered off with bbswitch while xorg starts freezes the machine

* Having the discrete gpu powered off with bbswitch while loading the proprietary nvidia driver freezes the machine

* nouveau seems to freeze the machine no matter what bbswitch state

* Using bbswitch to power off the gpu it only gets pc3, loading the proprietary nvidia driver instead causes it to get into a lower pc-state (pc8? iirc), but more total power usage

* switching the powertop tunable "Runtime PM for PCI Device NVIDIA Corporation GM107GLM [Quadro M1000M]" makes the nvidia driver fail to load until next reboot (even if you switch it back)

Re: HP zbook studio g3

(Anonymous) 2017-08-08 05:13 pm (UTC)(link)
forgot to mention you need to start an instance of xorg with the nvidia driver before it gets past pc3, but you can quit it after starting it
