Matthew Garrett (mjg59) wrote 2016-04-13 12:46 pm

Skylake's power management under Linux is dreadful and you shouldn't buy one until it's fixed

(Edit to add: this issue is restricted to the mobile SKUs. Desktop parts have very different power management behaviour.)

Linux 4.5 seems to have got Intel's Skylake platform (i.e., 6th-generation Core CPUs) to the point where graphics work pretty reliably, which is great progress (4.4 tended to lose all my windows every so often, especially over suspend/resume). I'm even running Wayland happily. Unfortunately one of the reasons I have a laptop is that I want to be able to do things like use it on battery, and power consumption's an important part of that. Skylake continues the trend from Haswell of moving to an SoC-type model where clock and power domains are shared between components that were previously entirely independent, and so you can't enter deep power saving states unless multiple components all have the correct power management configuration. On Haswell/Broadwell this manifested in the form of Serial ATA link power management being involved in preventing the package from going into deep power saving states - setting that up correctly resulted in a reduction in full-system power consumption of about 40%[1].
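For reference, as far as I know the SATA link power management setting that Powertop toggles is, on kernels of this era, the libata sysfs attribute `link_power_management_policy` in each `/sys/class/scsi_host/host*` directory (accepted values include `max_performance`, `medium_power` and `min_power`). That is the stock upstream knob, not the unmerged patches from footnote [1]. A minimal bash sketch that writes one policy to every host; the name `set_sata_lpm` is hypothetical, and the sysfs root is a parameter only so the helper can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a SATA link power management policy to every
# SCSI host that exposes libata's "link_power_management_policy" attribute.
# $1 = policy (default min_power), $2 = sysfs root (default the real one).
# On a real system this must run as root.
set_sata_lpm() {
  local policy=${1:-min_power}
  local root=${2:-/sys/class/scsi_host}
  local attr count=0
  for attr in "$root"/host*/link_power_management_policy; do
    [ -e "$attr" ] || continue          # glob matched nothing
    printf '%s\n' "$policy" > "$attr" || return 1
    count=$((count + 1))
  done
  echo "$count"                         # number of hosts updated
}
```

On a real machine, run `set_sata_lpm min_power` as root. Given the SSD errors reported in footnote [1], it is worth watching `dmesg` for a while after changing the policy.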

I've now got a Skylake platform with a nice shiny NVMe device, so Serial ATA policy isn't relevant (the platform doesn't even expose a SATA controller). The deepest power saving state I can get into is PC3, despite Skylake supporting PC8 - so I'm probably consuming about 40% more power than I should be. And nobody seems to know what needs to be done to fix this. I've found no public documentation on the power management dependencies on Skylake. Turning on everything in Powertop doesn't improve anything. My battery life is pretty poor and the system is pretty warm.
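One way to confirm which package C-states are actually reached, without relying on Powertop's display, is to sample the package C-state residency counters directly, which is also what turbostat reads. Each counter ticks at the TSC rate while the package is in that state, so its delta divided by the TSC delta over the same interval gives the residency. A minimal bash sketch, assuming the `msr` kernel module is loaded, `rdmsr` from msr-tools is installed and reading CPU 0 is enough; `residency_pct` and `sample_pkg_residency` are hypothetical names.

```shell
#!/usr/bin/env bash
# Package C-state residency counters (the MSR addresses turbostat reads).
# PC8-PC10 counters only exist on some parts, e.g. the mobile U-series.
declare -A PKG_CST_MSR=(
  [pc2]=0x60d [pc3]=0x3f8 [pc6]=0x3f9 [pc7]=0x3fa
  [pc8]=0x630 [pc9]=0x631 [pc10]=0x632
)
TSC_MSR=0x10   # IA32_TIME_STAMP_COUNTER

# Pure helper: percentage of the interval spent in a state, given counter
# and TSC readings at start and end (the counters tick at the TSC rate).
residency_pct() {
  local c0=$1 c1=$2 t0=$3 t1=$4
  awk -v c="$((c1 - c0))" -v t="$((t1 - t0))" \
    'BEGIN { printf "%.1f\n", (t > 0 ? 100 * c / t : 0) }'
}

# Hypothetical sampler: read each counter on CPU 0, wait, read again.
# Needs root, the msr kernel module and rdmsr from msr-tools.
sample_pkg_residency() {
  local interval=${1:-5} s v t0 t1
  local -A start=()
  t0=$(rdmsr -p0 -u "$TSC_MSR") || return 1
  for s in "${!PKG_CST_MSR[@]}"; do
    if v=$(rdmsr -p0 -u "${PKG_CST_MSR[$s]}" 2>/dev/null); then
      start[$s]=$v      # counters this CPU lacks are simply skipped
    fi
  done
  sleep "$interval"
  t1=$(rdmsr -p0 -u "$TSC_MSR") || return 1
  for s in pc2 pc3 pc6 pc7 pc8 pc9 pc10; do
    [ -n "${start[$s]:-}" ] || continue
    v=$(rdmsr -p0 -u "${PKG_CST_MSR[$s]}") || continue
    printf '%-5s %s%%\n' "$s" "$(residency_pct "${start[$s]}" "$v" "$t0" "$t1")"
  done
}
```

After sourcing the snippet, run `sample_pkg_residency 10` as root; it prints one line per implemented counter, shallowest state first.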

The best thing about this is the following statement from page 64 of the 6th Generation Intel® Processor Datasheet for U-Platforms:

Caution: Long term reliability cannot be assured unless all the Low-Power Idle States are enabled.

which is pretty concerning. Without support for states deeper than PC3, Linux is running in a configuration that Intel imply may trigger premature failure. That's obviously not good. Until this situation is improved, you probably shouldn't buy any Skylake systems if you're planning on running Linux.

[1] These patches never went upstream. Someone reported that they resulted in their SSD throwing errors and I couldn't find anybody with deeper levels of SATA experience who was interested in working on the problem. Intel's AHCI drivers for Windows do the right thing, but I couldn't find anybody at Intel who could get any information from their Windows driver team.

NVMe problems

(Anonymous) 2016-04-13 08:57 pm (UTC)
The Dell XPS 13 had some problems related to NVMe preventing it from entering lower power states.

Relatedly, the NVMe SSD used in quite a few laptops these days is notorious for its power consumption [1]. If I purchased an XPS 13, I'd likely swap back to a non-NVMe SSD for this reason alone.

[1] http://www.silentpcreview.com/files/images/samsung-950pro/power.gif

edmonds 2016-04-13 09:32 pm (UTC)
By working reliably under 4.5, do you mean you don't have to use any of the i915 module parameter workarounds like enable_rc6=0?

(Anonymous) 2016-04-13 10:06 pm (UTC)
Heck, Bay Trail is still a mess for support too.

Nasty issues with Linux 4.2, Skylake and NVMe

(Anonymous) 2016-04-13 10:06 pm (UTC)
Until Linux Mint 18 is released, I thought it would be OK to go with the current release, 17.3. It's OK most of the time, until it breaks in various ways.

Mainboard: ASUS Z170-DELUXE
CPU: Intel Core i3-6320 3.90 GHz (Skylake)
SSD: Samsung 950 Pro 256GB M.2
Linux kernel: 4.2.x

Trying kernel 4.4 (available from Ubuntu) leaves the system unbootable due to some strange issue with configuring GRUB, I think. I didn't have the time to investigate in detail, although I tried chrooting in and updating GRUB from a live session, without success.

I might as well consider another distro if kernel 4.5 or newer fixes these issues. Sometimes the system freezes so badly that even the (hardware) reset button doesn't work. Other times the NVMe controller simply throws in the towel [1] and I'm left with some partially running programs, everything running from RAM, because the storage disappears until I reset the PC.

It's been two and a half terrible months since I got this new PC, and, lacking the time to find answers, it's frustrating that I have no idea whom to blame. So I guess I might as well blame myself for going with "the latest and greatest" from Intel without properly researching compatibility.

[1] http://pastebin.com/2djDSh3m

SATA PM Patches

(Anonymous) 2016-04-13 10:09 pm (UTC)
I read your blog post about SATA PM when it was fresh, saw your patches on LKML and thought: well, everything is on its way. As you also mentioned, Panel Self Refresh and friends for i915 are slowly getting into mainline, so I thought we might reach a point where power consumption would be in a good state and that I should spend some time optimizing my Haswell notebook again. I tried, but mostly gave up and decided I needed to wait a bit longer. Now I've just learned your SATA patches were never merged, which makes me sad about PM in Linux again :(

I'm just a user, no expert at all, but if there is any possibility of giving those patches another shot at mainlining, I would be absolutely thankful.

Keep up the good work, Matthew, it's really appreciated. You solved a lot of problems for us Linux users! :)

Regards!
Wilken Haase
parttime happy linux user

My mobile part is seeing pc8

(Anonymous) 2016-04-13 10:42 pm (UTC)
This is a Dell Precision 5510
model name : Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz

          Package   |             Core    |            CPU 0       CPU 4
                    |                     | C0 active   2.2%        0.2%
                    |                     | POLL        0.0%    0.0 ms  0.0%    0.0 ms
                    |                     | C1E-SKL     0.4%    0.3 ms  6.4%    2.9 ms
C2 (pc2)   39.7%    |                     |
C3 (pc3)    1.0%    | C3 (cc3)    0.2%    | C3-SKL      0.5%    0.2 ms  0.0%    0.0 ms
C6 (pc6)    8.7%    | C6 (cc6)   17.4%    | C6-SKL     12.2%    0.9 ms  8.6%   20.1 ms
C7 (pc7)    0.0%    | C7 (cc7)   64.2%    | C7s-SKL     0.0%    0.0 ms  0.0%    0.0 ms
C8 (pc8)   21.7%    |                     | C8-SKL     65.6%    1.8 ms  4.1%    2.7 ms
C9 (pc9)    0.0%    |                     | C9-SKL      0.0%    0.0 ms  0.0%    0.0 ms
C10 (pc10)  0.0%    |                     | C10-SKL    13.6%    6.3 ms 78.5%   25.2 ms
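Several readers paste Powertop's idle statistics, and the quickest thing to compare between them is the deepest package state with non-zero residency. A small bash sketch that reads that text on stdin, assuming Powertop's layout as pasted here (package rows of the form `Cn (pcn)  xx.x%`, shallowest first); `deepest_pkg_cstate` is a hypothetical name. It also accepts comma decimal separators, as used in one of the later reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Powertop "Idle stats" text on stdin and print
# the deepest package C-state whose residency is above zero. Rows are
# listed shallowest first, so the last non-zero "Cn (pcn)" row wins.
deepest_pkg_cstate() {
  awk '
    $1 ~ /^C[0-9]+$/ && $2 ~ /^\(pc[0-9]+\)$/ {
      pct = $3
      gsub(/%/, "", pct)      # "21.7%" -> "21.7"
      gsub(/,/, ".", pct)     # "71,6"  -> "71.6"
      if (pct + 0 > 0) deepest = $2
    }
    END {
      gsub(/[()]/, "", deepest)
      print (deepest == "" ? "none" : deepest)
    }
  '
}
```

Fed the table above, it prints `pc8`.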

Firmware?

(Anonymous) 2016-04-13 11:17 pm (UTC)
What firmware are you on? IIRC anything before 1.1.7 incorrectly initialized the PCIe links, which broke ASPM and therefore prevented any deep sleep state from being entered.

NVMe power saving is still unimplemented on Linux, but I might get around to that soon if no one beats me.
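Because a badly initialised PCIe link shows up as ASPM being left disabled, it can help to check what each device's link control register reports. Run as root, `lspci -vv` prints a `LnkCtl:` line such as `ASPM L1 Enabled; ...` for every PCIe device (without root the capability details are hidden). A bash sketch that summarises those lines from lspci output on stdin; `aspm_summary` is a hypothetical name, and the parsing assumes lspci's usual text layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<device>: <ASPM setting>" for every LnkCtl
# line in "lspci -vv" output read from stdin.
aspm_summary() {
  awk '
    # Device header lines start in column 1 with a bus address,
    # e.g. "00:1c.0" or, with a domain, "0000:00:1c.0".
    /^([0-9a-fA-F]+:)+[0-9a-fA-F]+\.[0-9a-fA-F]/ { dev = $1; next }
    # "LnkCtl:" (not "LnkCap:" or "LnkCtl2:") holds the enabled ASPM states.
    /LnkCtl:/ {
      if (match($0, /ASPM [^;]*/))
        print dev ": " substr($0, RSTART, RLENGTH)
    }
  '
}
```

Typical use: `sudo lspci -vv | aspm_summary`.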

Long term reliability

(Anonymous) 2016-04-14 03:55 am (UTC)
The "Caution: Long term reliability cannot be assured [...]" message is also present in the 4th (https://www-ssl.intel.com/content/www/us/en/processors/core/4th-gen-core-family-mobile-u-y-processor-lines-vol-1-datasheet.html) and 5th (https://www-ssl.intel.com/content/www/us/en/processors/core/5th-gen-core-family-datasheet-vol-1.html) generation mobile (Haswell and Broadwell) datasheets.

How do MacBooks deal with this?

(Anonymous) 2016-04-14 04:21 am (UTC)
Apple supposedly releases the source to its (BSD-based) kernel. Do they leave out the interesting parts like this? If not, it may be worth checking.

Not only Skylake...

(Anonymous) 2016-04-14 06:33 am (UTC)
There's also a horrible bug in/for Bay Trail, crashing systems left and right: https://bugzilla.kernel.org/show_bug.cgi?id=109051

"Long term reliability"

(Anonymous) 2016-04-14 08:07 am (UTC)
I'm not quite sure what this means, and I'm a bit scared of what it could mean. Could somebody please clarify this for me:

"Long term reliability cannot be assured unless all the Low-Power Idle States are enabled."

Does it mean hardware life? Does it imply the processor will degrade/wear if Low-Power Idle States are not enabled?

4.6-rc2

(Anonymous) 2016-04-14 08:14 am (UTC)
I'm running 4.6-rc2 on Fedora 23 and it seems to have the GPU in RC6 state for a significant portion of time:

http://pastebin.com/f42EpWV2

No real problems here otherwise, but I can't seem to determine what power state my NVMe SSD is in:
http://pastebin.com/uYWiHAAg

Am I in trouble?

(Anonymous) 2016-04-14 02:39 pm (UTC)
I have a T460s here (NVMe version). It is running Ubuntu 16.04 (kernel version 4.4.0-18). When I run powertop, the Package column shows C2 (pc2) at 98.2%, and all the other values in this column are zero. Does this mean I am in trouble?

Haswell too! (with WiFi adapter enabled)

(Anonymous) 2016-04-14 03:55 pm (UTC)
Hi Matthew,
The same thing happens on older microarchitectures too. My laptop (Acer C720) is able to reach pc7 if I 'rmmod ath9k'. With ath9k loaded it is limited to pc3 (but cc7). That happens even if I enable power saving on the ath9k module (which seems to cause full system crashes from time to time)(!).

Thankfully the battery life is still pretty good (7~10 hours).

(Anonymous) 2016-04-14 05:03 pm (UTC)
To be clear, does this also apply to the ThinkPad P50, which has Skylake HQ and Xeon E3 CPUs? Also, the machine has a dual-fan cooling system, which, to my knowledge, Linux may not be able to control as well as Windows does. Does this issue have anything to do with the cooling system at all? Thanks!

No issue here

(Anonymous) 2016-04-14 05:49 pm (UTC)
I don't see this on a Lenovo T460s, i5-6200U, running Debian testing, kernel 4.4.0.1-amd64

powertop: http://pastebin.com/3ybgW9Sn
lspci: http://pastebin.com/Nym5peHz

ThinkPad T460s can reach PC10 after a microcode update

(Anonymous) 2016-04-14 06:11 pm (UTC)
Hello! I've updated the firmware on my ThinkPad T460s; it includes a CPU microcode update. The microcode version is now 0x84 (I think), and powertop/turbostat show 70% in PC6, and it can even reach PC10 with the display switched off.
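To see which microcode revision is actually loaded (for example the 0x84 mentioned here), x86 Linux reports it per CPU in the `microcode` field of `/proc/cpuinfo`. A bash sketch with hypothetical helper names; the file path is a parameter so it can be pointed at a saved copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct microcode revisions listed in a
# cpuinfo-style file (defaults to /proc/cpuinfo).
microcode_revisions() {
  awk -F': *' '$1 ~ /^microcode[ \t]*$/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}

# Hypothetical helper: succeed only if every CPU reports at least revision $1.
# Returns 2 if no microcode field is found at all.
microcode_at_least() {
  local want=$1 revs rev
  revs=$(microcode_revisions "${2:-/proc/cpuinfo}")
  [ -n "$revs" ] || return 2
  for rev in $revs; do
    (( rev < want )) && return 1   # bash arithmetic understands 0x.. hex
  done
  return 0
}
```

For example: `microcode_at_least 0x84 && echo "microcode is 0x84 or newer"`.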

Does this only affect laptops in sleep state?

(Anonymous) 2016-04-14 07:17 pm (UTC)
I bought an HP Pavilion 15 with an Intel 6200U (Skylake) processor a few weeks ago and dual-booted Windows 10 with Arch Linux. I almost never put my laptop to sleep and prefer to shut it down when it is not needed. PowerTOP tells me that I'm in pc2 state. Is it safe to use Linux (I use my laptop for around 6-8 hours a day), or should I stick with Windows until this is fixed?

Skylake w/ ucode 0x84 + kernel 4.6rc3

(Anonymous) 2016-04-14 09:50 pm (UTC)
I own an MSI GE62 6QD with a 6700HQ and had to disable C-states completely in the BIOS to boot (even with intel_idle.max_cstate=x).

After upgrading to a BIOS with microcode 0x84, kernel 4.4.6 couldn't boot at all.
Kernel 4.6.0rc3 boots, and powertop tells me my system spends most of its time in pc3 and cc7.

          Package   |             Core    |            CPU 0       CPU 4
                    |                     | C0 active   2,2%        0,1%
                    |                     | POLL        0,0%    0,0 ms  0,0%    0,0 ms
                    |                     | C1E-SKL     0,2%    0,2 ms  0,0%    0,0 ms
C2 (pc2)    9,6%    |                     |
C3 (pc3)   71,6%    | C3 (cc3)    0,4%    | C3-SKL      0,7%    0,6 ms  0,0%    0,0 ms
C6 (pc6)    0,0%    | C6 (cc6)    8,6%    | C6-SKL      9,1%    1,4 ms  0,0%    0,0 ms
C7 (pc7)    0,0%    | C7 (cc7)   84,1%    | C7s-SKL    74,9%    1,8 ms  3,5%    6,3 ms
C8 (pc8)    0,0%    |                     |
C9 (pc9)    0,0%    |                     |
C10 (pc10)  0,0%    |                     | C10-SKL    10,5%    2,6 ms 94,9%   35,7 ms

Does anyone know how to go deeper than pc3, or is this a new artificial kernel limitation on Skylake so that Linux can boot?
Thanks :)

Another data point: Dell Latitude E7470 works (up to nvme)

(Anonymous) 2016-04-15 08:10 am (UTC)
My new Latitude E7470 running Debian's 4.4.0-1 kernel enables power saving on everything except NVMe, reaches pc8 & cc7 & rc6 & C10-SKL according to powertop. It's new, so the battery is lasting a relatively long time depending on the load. Maybe almost two hours for heavy development & testing with a busy network, projected 6-7+ hours for light reading. -- jason@lovesgoodfood.com

Skylake - can't even get past PC2

(Anonymous) 2016-04-15 10:39 am (UTC)
On an HP Spectre x360 it can't even go deeper than PC2.


Powertop & lspci: https://paste.xinu.at/m-fM194W/

Dell XPS 13 9350 (2016)

(Anonymous) 2016-04-15 07:05 pm (UTC)
Running Fedora 24 GNOME, with the 4.4.6-301-fc23.x86_64 kernel.
If I understand correctly, the values in this paste are fine? And the SSD controller also has the "LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+" value.

http://pastebin.com/ZZ5bkscn

MSR

(Anonymous) 2016-04-15 08:27 pm (UTC)
Well, the usual reason for not being able to enter PC states is that it is disabled by BIOS.

Check MSR 0xE2 to see whether they are enabled (or disabled):

# rdmsr 0xE2
1e008006

To enable ALL PC states, bits [2:0] need to be set to 1.

So here I am out of luck as bits 1 and 0 are set to 0 (my machine will only enter PC2 max).

Another woe is that in most cases it is impossible to change the MSR: the register is locked by the BIOS (bit 15).

In those cases the only options are hacking the BIOS (good luck with that!) or getting the vendor to update it. So no good options here.
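The raw value printed by `rdmsr 0xE2` can be split with shell arithmetic. A minimal sketch with a hypothetical helper name: it prints the low 3 and low 4 bits, which hold the package C-state limit field, and bit 15, the lock bit mentioned above. The width and encoding of the limit field vary by CPU generation (as far as I know 3 bits on Sandy Bridge and 4 bits on Haswell and later client parts), so check the Intel SDM entry for MSR_PKG_CST_CONFIG_CONTROL for your model before interpreting the number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a raw MSR 0xE2 value as printed by rdmsr
# (hex, with or without a leading 0x). Prints the candidate package
# C-state limit fields and the lock bit; the limit's meaning is model-specific.
decode_pkg_cst_config() {
  local raw=${1#0x}
  local val=$((16#$raw))
  printf 'limit[2:0]=%d limit[3:0]=%d locked=%d\n' \
    $((val & 0x7)) $((val & 0xF)) $(((val >> 15) & 1))
}
```

For the value in this comment, `decode_pkg_cst_config 1e008006` prints `limit[2:0]=6 limit[3:0]=6 locked=1`, i.e. the register is locked.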

Acer VN7-592g

(Anonymous) 2016-04-16 01:32 pm (UTC)
I did some more experiments on what prevents my laptop from entering deep PC states. I'm attaching some notes in the hope that they help others debug why their Skylake (SKL) machines fail to enter deeper PC states.

Kernel: 4.6.0rc3 (4.5 fails to work)

After start: Enters only PC2
After first suspend: goes to PC3 at most (I don't know why yet; the firmware must be changing some configuration)

Factors I found that prevent entering deeper PC states:

1. Kernel version: a 4.6 kernel is needed.
2. The r8169 module for:

08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)

Replacing it with r8168 works flawlessly.

3. The pcie_aspm=force kernel parameter is needed (by default the kernel won't enable ASPM, and ASPM is required).

4. NVIDIA Optimus: the nouveau driver must be loaded first; the card can then be disabled using bbswitch (optional). If no driver is loaded (or the proprietary nvidia one is), the system fails to enter deeper PC states.

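I use:
modprobe nouveau                  # load nouveau first (see item 4)
sleep 1                           # give the driver a moment to initialise
echo OFF > /proc/acpi/bbswitch    # optional: power the card off (requires the bbswitch module)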
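Two of the factors in the list above, the kernel command line and which driver is bound to the Ethernet controller, are easy to check from a script. A bash sketch with hypothetical helper names; the defaults are `/proc/cmdline` and `/sys/bus/pci/devices`, where PCI device directories carry the domain prefix (e.g. `0000:08:00.0` for the `08:00.0` controller above).

```shell
#!/usr/bin/env bash
# Hypothetical helper: is a parameter present on the kernel command line
# (whole-word match)? $2 lets you point it at a saved copy.
has_cmdline_param() {
  local param=$1 file=${2:-/proc/cmdline}
  tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Hypothetical helper: name of the driver bound to a PCI device, or "none".
# The "driver" entry in the device's sysfs directory is a symlink to it.
bound_driver() {
  local dev=$1 root=${2:-/sys/bus/pci/devices}
  local link="$root/$dev/driver"
  [ -L "$link" ] || { echo none; return 0; }
  basename "$(readlink "$link")"
}
```

For example: `has_cmdline_param pcie_aspm=force && echo "ASPM forced"` and `bound_driver 0000:08:00.0`.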

Are you sure it's only Skylake? I seem to hit the same bug on Sandy Bridge

(Anonymous) 2016-04-16 11:37 pm (UTC)
I remember my SNB used to reach C7 the last time I played with powertop (2-3 years ago).
But I rechecked, and it doesn't anymore!

I am on Fedora 23, kernel 4.6-RC3, Lenovo X220 core i7-2620M (SNB)

i7z and powertop report the same: I don't get deeper than C3 anymore...

          Package   |             Core    |            CPU 0       CPU 1
                    |                     | C0 active   3.3%        4.5%
                    |                     | POLL        0.0%    0.6 ms  0.0%    0.3 ms
                    |                     | C1E-SNB     1.0%    0.3 ms  0.4%    0.3 ms
C2 (pc2)    1.1%    |                     |
C3 (pc3)   67.1%    | C3 (cc3)   85.2%    | C3-SNB     92.7%    2.2 ms 91.5%    3.9 ms
C6 (pc6)    0.0%    | C6 (cc6)    0.0%    | C6-SNB      0.0%    0.0 ms  0.0%    0.0 ms
C7 (pc7)    0.0%    | C7 (cc7)    0.0%    | C7-SNB      0.0%    0.0 ms  0.0%    0.0 ms

                    |             Core    |            CPU 2       CPU 3
                    |                     | C0 active   3.9%        9.7%
                    |                     | POLL        0.0%    0.0 ms  0.0%    0.0 ms
                    |                     | C1E-SNB     0.5%    0.4 ms  0.0%    0.1 ms
                    |                     |
                    | C3 (cc3)   77.3%    | C3-SNB     91.9%    3.3 ms 84.3%    8.8 ms
                    | C6 (cc6)    0.0%    | C6-SNB      0.0%    0.0 ms  0.0%    0.0 ms
                    | C7 (cc7)    0.0%    | C7-SNB      0.0%    0.0 ms  0.0%    0.0 ms

                    | GPU                 |
                    |                     |
                    | Powered On  0.7%    |
                    | RC6         0.0%    |
                    | RC6p        0.0%    |
                    | RC6pp      99.3%    |
