Matthew Garrett ([personal profile] mjg59) wrote 2016-04-13 12:46 pm

Skylake's power management under Linux is dreadful and you shouldn't buy one until it's fixed

(Edit to add: this issue is restricted to the mobile SKUs. Desktop parts have very different power management behaviour)

Linux 4.5 seems to have got Intel's Skylake platform (ie, 6th-generation Core CPUs) to the point where graphics work pretty reliably, which is great progress (4.4 tended to lose all my windows every so often, especially over suspend/resume). I'm even running Wayland happily. Unfortunately one of the reasons I have a laptop is that I want to be able to do things like use it on battery, and power consumption's an important part of that. Skylake continues the trend from Haswell of moving to an SoC-type model where clock and power domains are shared between components that were previously entirely independent, and so you can't enter deep power saving states unless multiple components all have the correct power management configuration. On Haswell/Broadwell this manifested in the form of Serial ATA link power management being involved in preventing the package from going into deep power saving states - setting that up correctly resulted in a reduction in full-system power consumption of about 40%[1].

I've now got a Skylake platform with a nice shiny NVMe device, so Serial ATA policy isn't relevant (the platform doesn't even expose a SATA controller). The deepest power saving state I can get into is PC3, despite Skylake supporting PC8 - so I'm probably consuming about 40% more power than I should be. And nobody seems to know what needs to be done to fix this. I've found no public documentation on the power management dependencies on Skylake. Turning on everything in Powertop doesn't improve anything. My battery life is pretty poor and the system is pretty warm.
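For anyone who wants to see which package C-states their own machine actually reaches, here is a minimal sketch, not a definitive tool. It assumes the Linux `msr` driver is loaded (`modprobe msr`), that you are root, and that the residency counters tick at the TSC rate, which is the same assumption turbostat makes when it reports package residency. The MSR addresses are the ones named in the kernel's msr-index.h; the function names are made up for illustration.

```python
import os
import struct

# Package C-state residency counters, using the MSR addresses from the
# Linux kernel's msr-index.h. Not every CPU implements every counter.
PKG_RESIDENCY_MSRS = {
    "PC2": 0x60D,
    "PC3": 0x3F8,
    "PC6": 0x3F9,
    "PC7": 0x3FA,
    "PC8": 0x630,
    "PC9": 0x631,
    "PC10": 0x632,
}
TSC_MSR = 0x10  # IA32_TIME_STAMP_COUNTER
MASK64 = (1 << 64) - 1


def read_msr(address, cpu=0):
    """Read one 64-bit MSR through the msr driver (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


def snapshot(cpu=0):
    """Sample the TSC and every residency counter this CPU lets us read."""
    sample = {"TSC": read_msr(TSC_MSR, cpu)}
    for name, address in PKG_RESIDENCY_MSRS.items():
        try:
            sample[name] = read_msr(address, cpu)
        except OSError:
            pass  # counter not implemented on this CPU
    return sample


def residency_percent(before, after):
    """Convert two snapshots into percent-of-time per package C-state."""
    tsc_delta = (after["TSC"] - before["TSC"]) & MASK64
    if tsc_delta == 0:
        raise ValueError("TSC did not advance between snapshots")
    return {
        name: 100.0 * ((after[name] - before[name]) & MASK64) / tsc_delta
        for name in PKG_RESIDENCY_MSRS
        if name in before and name in after
    }


def deepest_state(percentages):
    """Return the deepest state with non-zero residency, or None."""
    reached = [n for n in PKG_RESIDENCY_MSRS if percentages.get(n, 0.0) > 0.0]
    return reached[-1] if reached else None
```

To use it, take one `snapshot()`, leave the machine idle for ten seconds or so, take a second one, and pass both to `residency_percent()`. If `deepest_state()` never gets past "PC3" on an idle system, you are seeing the same behaviour described here.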

The best thing about this is the following statement from page 64 of the 6th Generation Intel ® Processor Datasheet for U-Platforms:

Caution: Long term reliability cannot be assured unless all the Low-Power Idle States are enabled.

which is pretty concerning. Without support for states deeper than PC3, Linux is running in a configuration that Intel imply may trigger premature failure. That's obviously not good. Until this situation is improved, you probably shouldn't buy any Skylake systems if you're planning on running Linux.

[1] These patches never went upstream. Someone reported that they resulted in their SSD throwing errors and I couldn't find anybody with deeper levels of SATA experience who was interested in working on the problem. Intel's AHCI drivers for Windows do the right thing, but I couldn't find anybody at Intel who could get any information from their Windows driver team.

Re: Actually been fixed.

[personal profile] gourdcaptain 2016-04-13 10:58 pm (UTC)
Huh. Haven't had that issue either (and I've used this system for long stretches of moderate use in the week and a half I've had it). Have had random ACPI-related crashes at boot (~50% of the time) unless I increase the wait time in systemd-boot to ten seconds, weirdly enough, but that's more of a lousy BIOS/UEFI issue (given that on successful boots, it logs a bunch of ACPI table errors in dmesg). (Still trying to figure out how to report that, given the kernel panic messages vary a lot between crashes, scroll mostly off the screen, and the system completely freezes up after one without letting me use any of the stuff I read about online to capture it.)

Re: Actually been fixed.

(Anonymous) 2016-04-14 01:08 am (UTC)
This actually reeks of a bunch of microcode and firmware issues that got fixed in the last few months. Ensure you have microcode 0x73 or later; that's actually a good hint that both the microcode and the PCH firmware are not crash-prone buggy crap.

As far as I am concerned, the kernel should refuse to boot on any Skylake box with a BIOS older than 2016 or running a microcode revision earlier than 0x73. That would certainly be a lot more truthful to everyone involved.

If a UEFI update is not available yet from your vendor, ask for your money back. A properly up-to-date UEFI for Skylake with SGX support will have microcode 0x83 or higher. If it has SGX support permanently disabled by UEFI, 0x76 is enough.
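A rough way to check the running microcode revision on Linux is to read the `microcode` field that x86 kernels print in /proc/cpuinfo. The sketch below assumes that field is present and hex-formatted; the function names are hypothetical, and the 0x73 default is just the threshold suggested in this comment, not anything official.

```python
import re


def parse_microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo-style text."""
    matches = re.findall(
        r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.MULTILINE
    )
    return {int(value, 16) for value in matches}


def meets_minimum(cpuinfo_text, minimum=0x73):
    """True if every logical CPU reports at least the given revision."""
    revisions = parse_microcode_revisions(cpuinfo_text)
    return bool(revisions) and min(revisions) >= minimum


def check_local(minimum=0x73):
    """Convenience wrapper that reads the real /proc/cpuinfo."""
    with open("/proc/cpuinfo") as handle:
        return meets_minimum(handle.read(), minimum)
```

Calling `check_local()` covers the 0x73 case; pass `minimum=0x76` or `minimum=0x83` to test against the other revisions mentioned above.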

Re: Actually been fixed.

[personal profile] gourdcaptain 2016-04-14 03:13 am (UTC)
0x74 microcode, released 3/15/16, and SGX disabled. Unfortunately, I can't flash newer UEFI if they put one out (still the most recent for it as of this posting) because the updater is Windows only (although I did flash the most recent one before wiping the drive, and have a Clonezilla backup of the Windows install if I absolutely have to). At least hopefully there'll eventually be microcode files I can early boot load.

EDIT: At least this is less bad than when I got a Broadwell i7 5700hq laptop last year and the microcode-based TSX issues were so bad I could only boot Fedora 22 for a month stably (it would crash under any load) until MSI (the ones I bought it from) were the first out with a fixed microcode update. (And their updater actually works from the UEFI loading off a USB stick). Intel's just awful anymore, but not like we really have any alternatives, given how bad AMD CPUs are for a lot of things anymore.

EDIT: Seriously, TSX was properly disabled under Haswell for a year at that point! Why was Broadwell shipping with it enabled and faulty? Did they not even check?
Edited 2016-04-14 03:23 (UTC)

EFI updates

(Anonymous) 2016-04-14 07:37 am (UTC)
FYI: Since EFI can directly run Portable Executables (.exe) you can just drop the .exe from your vendor in your EFI system partition and run it from the EFI menu, no need to boot windows. I've done this on my Dell XPS 13 system multiple times now.

Re: EFI updates

[personal profile] gourdcaptain 2016-04-14 07:42 am (UTC)
That's a thing you can do? I've been digging everywhere on ways to install UEFI updates on this thing, and it hasn't come up. Not that I'm disbelieving you, it just seems amazingly poorly documented. And it's the same EXE update files they have for Windows?

Re: EFI updates

[personal profile] gourdcaptain 2016-04-14 09:19 am (UTC)
Ah, then that's probably not going to fly with the Lenovo ones. (In my defense for buying it, finding good 11 inch laptops these days is hard - netbooks have mostly died out in favor of tablets and such. Plus, I needed Skylake for hardware HEVC decoding since nothing in the relatively cheap laptop range is going to have a discrete card that can do that, and 1080p HEVC takes a fair chunk of CPU to decode.)

Re: EFI updates

(Anonymous) 2016-04-15 06:34 am (UTC)
Lenovo provides CD images that boot and flash the firmware without Windows.
https://download.lenovo.com/pccbbs/mobiles/n1gur08w.txt

Re: EFI updates

[personal profile] gourdcaptain 2016-04-15 07:29 am (UTC)
Yeah, that's for a model that cost twice as much as the dinky little thing I'm using right now. I'm not seeing one listed for the Yoga 700 11-inch: http://support.lenovo.com/us/en/products/Laptops-and-netbooks/Yoga-Series/yoga-700-11isk?linkTrack=Homepage:Body_Search%20Products&beta=false

Re: EFI updates

[personal profile] gourdcaptain 2016-05-01 07:43 am (UTC)
Just to make google searches for this a bit more helpful, some experimentation later seems to link these random boot crashes to the i2c bus going funky when a bunch of stuff tries to hit it all at once during boot: https://bugzilla.kernel.org/show_bug.cgi?id=105251

Doing a weird workaround involving delaying loading hid_multitouch seems to cut down the boot failures by a fair amount: https://bugzilla.redhat.com/show_bug.cgi?id=1297188#c13
(NOTE: Still seem to happen a fair amount, did some statistical testing but still might be the placebo effect.)
Edited 2016-05-09 20:36 (UTC)

Re: EFI updates

[personal profile] gourdcaptain 2016-05-09 08:38 pm (UTC)
There was a new UEFI update for the laptop released a few days ago. Windows install only. So I went to the trouble of installing Windows 10 to an external USB drive to boot off of and run the update (I really didn't feel like repartitioning the drive or having to image and restore it), and that actually worked.

But it's still 0x74 microcode. (facepalm)

Re: EFI updates

[personal profile] mikeymop 2016-04-14 07:27 pm (UTC)
On the XPS 13, how are you dropping into the EFI shell to execute these?
I always imagined they'd have Windows software that would hamper this, even if it can run .exe files.

Can you write/share a relevant tutorial for this process? I will need to do this until Dell publishes the files for fwupdate.

Re: EFI updates

(Anonymous) 2016-04-14 07:47 pm (UTC)
in this generation:
xps 9350: https://secure-lvfs.rhcloud.com/lvfs/device/33773727-8ee7-4d81-9fa0-57e8d889e1fa
precision 5510: https://secure-lvfs.rhcloud.com/lvfs/device/124c207d-5db8-4d95-bd31-34fd971b34f9

Otherwise put the .EXE from support.dell.com on a FAT32 USB key or on the ESP and select flash BIOS from the F12 POST menu.

Re: EFI updates

(Anonymous) 2016-04-18 07:56 pm (UTC)
Thank you,

I found instructions for flashing the .exe, but you gave me .cab files. How can I flash .cabs?

Can I still do the .exe method on the 9350?

Re: EFI updates

[personal profile] mikeymop 2016-04-14 07:31 pm (UTC)
I found what he's talking about. It's on the Arch wiki for the curious

http://hgdev.co/install-bios-update-under-linux-on-the-dell-xps-13-9343-2015/

I'll have to try this on my 9350 when I get it tomorrow.
Does anyone know how I can check the microcode version?

Re: EFI updates

(Anonymous) 2016-04-14 07:46 pm (UTC)
9350 has updates at LVFS that can be applied as capsules.
https://secure-lvfs.rhcloud.com/lvfs/device/33773727-8ee7-4d81-9fa0-57e8d889e1fa

Re: Actually been fixed.

[personal profile] kensey 2016-04-15 07:38 pm (UTC)

At least this is less bad than when I got a Broadwell i7 5700hq laptop last year and the microcode-based TSX issues were so bad I could only boot Fedora 22 for a month stably (it would crash under any load) until MSI (the ones I bought it from) were the first out with a fixed microcode update.

Funny you should mention -- my Sager work laptop has an i7-5700HQ, and Fedora 22 runs fine on it, but 23 crashes within seconds to minutes. For now I'm just continuing to run F22, but I also can't run any VMs or containers that contain a libc that tickles the TSX issue or my laptop reboots!

Eventually I'll have to buckle down and figure out how to apply one of the firmware updates floating around out there that supposedly fix this (I think actually the one most commonly used came from MSI's updater), because neither Sager nor Clevo (the hardware OEM) has put out any firmware updates for it, and I don't want to be stuck running F22 past its end-of-support.

Re: Actually been fixed.

[personal profile] gourdcaptain 2016-04-15 09:31 pm (UTC)
Yeah, I had that issue, although luckily I had the MSI laptop those updates came from.
https://github.com/bgw/bdw-ucode-update-tool - Someone's attempt to hack together an updater for those.
Unfortunately, all my experience with messing with microcode packages is on Arch where I can just stick it in my systemd-boot config as another initrd before the main one. Which I know you can do with GRUB as well, it's just GRUB's config files are hilariously complicated, IMHO.
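For the systemd-boot approach, the ordering is the whole trick: the microcode image goes on its own `initrd` line ahead of the main initramfs. Here is a minimal sketch that renders such a loader entry. The function name is hypothetical, the file names are the usual Arch ones (`/intel-ucode.img`, `/vmlinuz-linux`, `/initramfs-linux.img`), and the `options` string is a placeholder you would replace with your own kernel command line.

```python
def render_loader_entry(title, kernel, microcode_image, initramfs, options):
    """Render a systemd-boot entry with the microcode initrd listed first."""
    lines = [
        f"title   {title}",
        f"linux   {kernel}",
        f"initrd  {microcode_image}",  # must come before the main initramfs
        f"initrd  {initramfs}",
        f"options {options}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values; the root= option in particular is only an example.
EXAMPLE_ENTRY = render_loader_entry(
    title="Arch Linux",
    kernel="/vmlinuz-linux",
    microcode_image="/intel-ucode.img",
    initramfs="/initramfs-linux.img",
    options="root=LABEL=root rw",
)
```

The resulting text would normally be saved as a `.conf` file under `loader/entries/` on the EFI system partition.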

Honestly, the lack of updates is a shame upon your hardware vendor, given that it even affects things under Windows - apparently Office 2016's installer, even.

EDIT: Nothing against Fedora, but I had to get off it as soon as possible because while I like a lot of the stuff it does as a distro, nobody'd packaged Bumblebee and CUDA in a way where you could get both on the same system (and I needed both at the time for work urgently) - all the CUDA packages had a hard dependency on a normal NVIDIA driver install. (Primarily because the guy doing it sees Bumblebee as a "dirty hack" that shouldn't be supported. Okay, buddy, you got any other options for making this hardware work in the meantime?) Arch is the only distro I've found which DOESN'T have an NVIDIA driver dependency for the CUDA package, which is pretty handy for being able to run the CUDA debugger on a laptop remotely connected to your system with an NVIDIA card.
Edited 2016-04-15 21:36 (UTC)